[jira] [Commented] (CASSANDRA-9540) Cql query doesn't return right information when using IN on columns for some keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572973#comment-14572973 ]

Mathijs Vogelzang commented on CASSANDRA-9540:
----------------------------------------------

Another observation: when testing this bug from our Java code with the DataStax CQL driver, we noticed that everything works correctly while the written data is still in the memtable. Once it's flushed to an SSTable on disk, the unpredictable behavior starts. Maybe this bug is another argument for https://issues.apache.org/jira/browse/CASSANDRA-9161 ?

Cql query doesn't return right information when using IN on columns for some keys
---------------------------------------------------------------------------------

                 Key: CASSANDRA-9540
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9540
             Project: Cassandra
          Issue Type: Bug
          Components: API
         Environment: Cassandra 2.1.5
            Reporter: Mathijs Vogelzang
            Assignee: Carl Yeksigian
             Fix For: 2.1.x

We are investigating a weird issue where one of our clients doesn't get data on his dashboard. It seems Cassandra is not returning data for a particular key ("brokenkey" from now on).

Some background: we have a row where we store a "metadata" column and data in columns "bucket/0", "bucket/1", "bucket/2", etc. Depending on the date selection of the UI, we know that we only need to retrieve "bucket/0", or "bucket/0" and "bucket/1", etc. (we always need to retrieve "metadata").

A typical query may look like this (using SELECT column1 just to show what is returned; normally we would of course do SELECT value):

{noformat}
cqlsh:AppBrain> select blobAsText(column1) from GroupedSeries where key=textAsBlob('install/workingkey');

 blobAsText(column1)
---------------------
            bucket/0
            metadata

(2 rows)

cqlsh:AppBrain> select blobAsText(column1) from GroupedSeries where key=textAsBlob('install/brokenkey');

 blobAsText(column1)
---------------------
            bucket/0
            metadata

(2 rows)
{noformat}

These two queries work as expected, and return the information that we actually stored.
However, when we filter for certain columns, the brokenkey starts behaving very weirdly:

{noformat}
cqlsh:AppBrain> select blobAsText(column1) from GroupedSeries where key=textAsBlob('install/workingkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));

 blobAsText(column1)
---------------------
            bucket/0
            metadata

(2 rows)

cqlsh:AppBrain> select blobAsText(column1) from GroupedSeries where key=textAsBlob('install/workingkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));

 blobAsText(column1)
---------------------
            bucket/0
            metadata

(2 rows)

*** As expected, querying for more columns doesn't really matter for the working key ***

cqlsh:AppBrain> select blobAsText(column1) from GroupedSeries where key=textAsBlob('install/brokenkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));

 blobAsText(column1)
---------------------
            bucket/0

(1 rows)

*** Cassandra stops giving us the metadata column when asking for a few more columns! ***

cqlsh:AppBrain> select blobAsText(column1) from GroupedSeries where key=textAsBlob('install/brokenkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));

 key | column1 | value
-----+---------+-------

(0 rows)

*** Adding the bogus column name even makes it return nothing from this row anymore! ***
{noformat}

There are at least two rows that malfunction like this in our table (which is quite old already and has gone through a bunch of Cassandra upgrades). I've upgraded our whole cluster to 2.1.5 (we were on 2.1.2 when I discovered this problem) and compacted, repaired and scrubbed this column family, which hasn't helped.
Our table structure is:

{noformat}
cqlsh:AppBrain> describe table GroupedSeries;

CREATE TABLE AppBrain.GroupedSeries (
    key blob,
    column1 blob,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 1.0
    AND speculative_retry = 'NONE';
{noformat}

Let me know if I can give more information that may be helpful.
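The bucket layout described above can be sketched as follows (a minimal illustration of the access pattern; the helper name and the bucket count derived from the date selection are hypothetical, not from the report):

```python
# Illustrative sketch: depending on the UI's date selection we always need
# the 'metadata' column plus the first N 'bucket/<i>' columns, which then
# become the clustering-column values in the IN (...) clause.
def columns_to_fetch(num_buckets):
    return ["metadata"] + [f"bucket/{i}" for i in range(num_buckets)]

print(columns_to_fetch(1))  # ['metadata', 'bucket/0']
print(columns_to_fetch(2))  # ['metadata', 'bucket/0', 'bucket/1']
```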
[jira] [Commented] (CASSANDRA-9540) Cql query doesn't return right information when using IN on columns for some keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573093#comment-14573093 ]

Mathijs Vogelzang commented on CASSANDRA-9540:
----------------------------------------------

And one final observation from our testing: it seems the bug happens as soon as a cell value exceeds 64kb in size. (My earlier comment was wrong; there were 144,000 2-byte hex chars in the json file, so the cell size is roughly 70kb instead of 144kb.)

This bug is very difficult for us to work around (we're using the IN query as a replacement for the former Thrift slice query to get specific column values), as we currently cannot trust Cassandra to return values that we wrote into it earlier, depending on whether our queried columns are present or not. If there's any workaround for querying a set of columns using CQL that works, please let me know.
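A quick sanity check of the corrected size above (a sketch; the 64 KiB comparison assumes the hex-encoded cell in testcase.json decodes at two hex characters per byte, and uses 64 KiB = 65,536 bytes as the boundary the comment observes):

```python
# The test-case JSON stores the cell value as a hex string, so two hex
# characters encode one byte of the actual blob value.
hex_chars = 144_000           # length of the hex string in testcase.json
cell_bytes = hex_chars // 2   # each byte is written as 2 hex chars

print(cell_bytes)             # 72000 bytes, i.e. roughly 70 KiB
print(cell_bytes > 64 * 1024) # True: just over the 64 KiB boundary
```

This is consistent with the retitled summary: the misbehaving rows are exactly those holding a value bigger than 64kb.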
[jira] [Updated] (CASSANDRA-9540) Cql IN query wrong on rows with values bigger than 64kb
[ https://issues.apache.org/jira/browse/CASSANDRA-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathijs Vogelzang updated CASSANDRA-9540:
-----------------------------------------
    Summary: Cql IN query wrong on rows with values bigger than 64kb  (was: Cql query doesn't return right information when using IN on columns for some keys)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9540) Cql query doesn't return right information when using IN on columns for some keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572659#comment-14572659 ]

Mathijs Vogelzang commented on CASSANDRA-9540:
----------------------------------------------

I was able to make this case reproducible by getting the key from our production database and saving it with sstable2json. It seems that the broken keys have one somewhat bigger cell (144 kB in this case).

A testcase json file is available at https://www.dropbox.com/s/kzu2jmqmwz788k8/testcase.json?dl=0 (the SSTable that it creates is available at https://www.dropbox.com/s/ilwpjka5r70n2os/test-testbug-ka-2-Data.db?dl=0 ).

The commands that I executed in cqlsh are:

{noformat}
create keyspace test with replication={'class':'SimpleStrategy','replication_factor':1};
use test;
CREATE TABLE test.testbug (
    key blob,
    column1 blob,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 1.0
    AND speculative_retry = 'NONE';
{noformat}

Then I injected the json as follows on the command line:

{noformat}
json2sstable -K test -c testbug testcase.json DATADIR/test/testbug/test-testbug-ka-1-Data.db
nodetool refresh test testbug
{noformat}

cqlsh then behaves as reported earlier: querying row "test" behaves as expected, but for row "broken" we don't get any data when we ask for columns that don't exist in that row (depending on the exact set of asked-for columns, either
'metadata' is dropped, or no data is returned at all):

{noformat}
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('test') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
            test |            bucket/0
            test |            metadata

(2 rows)

cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('broken') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
          broken |            bucket/0
          broken |            metadata

(2 rows)

cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('test') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdf'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
            test |            bucket/0
            test |            metadata

(2 rows)

cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('broken') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdf'));

 key | column1 | value
-----+---------+-------

(0 rows)

cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('broken') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
          broken |            bucket/0

(1 rows)
{noformat}
[jira] [Created] (CASSANDRA-9540) Cql query doesn't return right information when using IN on columns for some keys
Mathijs Vogelzang created CASSANDRA-9540:
--------------------------------------------

             Summary: Cql query doesn't return right information when using IN on columns for some keys
                 Key: CASSANDRA-9540
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9540
             Project: Cassandra
          Issue Type: Bug
          Components: API
         Environment: Cassandra 2.1.5
            Reporter: Mathijs Vogelzang
[jira] [Updated] (CASSANDRA-9540) Cql query doesn't return right information when using IN on columns for some keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathijs Vogelzang updated CASSANDRA-9540:
-----------------------------------------
    Description: (edited to fix the {{noformat}} markup to {noformat}; the description text is otherwise unchanged)
[jira] [Commented] (CASSANDRA-8786) NullPointerException in ColumnDefinition.hasIndexOption
[ https://issues.apache.org/jira/browse/CASSANDRA-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318416#comment-14318416 ]

Mathijs Vogelzang commented on CASSANDRA-8786:
----------------------------------------------

The tables were probably originally created on version 1.x. We've only tried with cqlsh.

I've just done a DESCRIBE in cassandra-cli on our production table and the new one, and the most suspicious difference is that on the production (non-working) table, one column is listed like this:

{noformat}
(...)
Column Name: account_id
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: idx_accid
  Index Type: KEYS
Column Name: next_column_name
(...)
{noformat}

and on the newly created development table (the one without problems) it is:

{noformat}
(...)
Column Name: account_id
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: idx_accid
  Index Type: KEYS
  Index Options: {}
Column Name: next_column_name
(...)
{noformat}

(Other minor differences are speculative retry NONE vs 99.0PERCENTILE, read repair chance 0 vs 1, and bloom filter FP chance default vs. 0.01, but the missing Index Options seems closer to where the problem probably lies to me.)

NullPointerException in ColumnDefinition.hasIndexOption
-------------------------------------------------------

                 Key: CASSANDRA-8786
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8786
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Cassandra 2.1.2
            Reporter: Mathijs Vogelzang
             Fix For: 2.1.4

We have a Cassandra cluster that we've been using through many upgrades, and thus most of our column families were originally created by Thrift. We are on Cassandra 2.1.2 now. We've now ported most of our code to use CQL, and our code occasionally tries to recreate tables with IF NOT EXISTS to work properly on development / testing environments.
When we issue the CQL statement CREATE INDEX IF NOT EXISTS index ON tableName (accountId) (this index does exist on that table already), we get a {{DriverInternalError: An unexpected error occurred server side on cass_host/xx.xxx.xxx.xxx:9042: java.lang.NullPointerException}}

The error on the server is:

{noformat}
java.lang.NullPointerException: null
    at org.apache.cassandra.config.ColumnDefinition.hasIndexOption(ColumnDefinition.java:489) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.cql3.statements.CreateIndexStatement.validate(CreateIndexStatement.java:87) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[apache-cassandra-2.1.2.jar:2.1.2]
{noformat}

This happens every time we run this CQL statement. We've tried to reproduce it in a test Cassandra cluster by creating the table according to the exact DESCRIBE TABLE specification, but there the NullPointerException doesn't happen upon the CREATE INDEX. So it seems that the tables on our production cluster (originally created through Thrift) are still subtly different schema-wise than a freshly created table made with the same creation statement.
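The cassandra-cli diff in the comment above suggests a plausible failure mode, which can be sketched with a hypothetical Python analog (this is illustrative only, not Cassandra's actual Java code): if a Thrift-era column definition carries no index-options map at all (None rather than an empty {}), a membership check on it blows up the way hasIndexOption does.

```python
# Hypothetical analog of ColumnDefinition.hasIndexOption: a column created
# via Thrift may have index_options missing entirely (modeled as None),
# while a CQL-created column gets an empty dict ("Index Options: {}").
def has_index_option(index_options, name):
    # 'name in None' raises TypeError -- Python's analog of the server NPE.
    return name in index_options

cql_created = {}       # has "Index Options: {}" in the cassandra-cli output
thrift_created = None  # no "Index Options" line at all

print(has_index_option(cql_created, "class_name"))  # False: safe lookup
try:
    has_index_option(thrift_created, "class_name")
except TypeError:
    print("lookup on a missing options map fails")
```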
[jira] [Created] (CASSANDRA-8786) NullPointerException in ColumnDefinition.hasIndexOption
Mathijs Vogelzang created CASSANDRA-8786:
--------------------------------------------

             Summary: NullPointerException in ColumnDefinition.hasIndexOption
                 Key: CASSANDRA-8786
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8786
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Cassandra 2.1.2
            Reporter: Mathijs Vogelzang
[jira] [Created] (CASSANDRA-6478) Importing sstables through sstableloader tombstoned data
Mathijs Vogelzang created CASSANDRA-6478:
--------------------------------------------

             Summary: Importing sstables through sstableloader tombstoned data
                 Key: CASSANDRA-6478
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6478
             Project: Cassandra
          Issue Type: Bug
          Components: Tools
         Environment: Cassandra 2.0.3
            Reporter: Mathijs Vogelzang

We've tried to import sstables from a snapshot of a 1.2.10 cluster into a running 2.0.3 cluster. When using sstableloader, for some reason we couldn't retrieve some of the data. On further investigation, it turned out that tombstones in the far future had been created for some rows. (sstable2json returned the correct data, but with the metadata {deletionInfo: {markedForDeleteAt:1796952039620607, localDeletionTime:0}} added to the rows that seemed missing.) This happened again in exactly the same way when we cleared the new cluster and ran sstableloader again.

The sstables themselves seemed fine: they were working on the old cluster, upgradesstables reported there was nothing to upgrade, and we were finally able to move our data correctly by copying the SSTables with scp into the right directory on the hosts of the new cluster (but naturally this required much more disk space than when sstableloader sends only the relevant parts).

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
[jira] [Updated] (CASSANDRA-6478) Importing sstables through sstableloader tombstoned data
[ https://issues.apache.org/jira/browse/CASSANDRA-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathijs Vogelzang updated CASSANDRA-6478:
-----------------------------------------
    Since Version: 2.0.3
    Fix Version/s: 2.0.3
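The reported markedForDeleteAt value is a microseconds-since-epoch timestamp, and converting it shows the deletion is stamped more than a decade after the 2013 report, which is why the affected rows silently disappear from reads. A quick check (illustrative, plain Python):

```python
from datetime import datetime, timezone

# markedForDeleteAt from the sstable2json output, in microseconds
# since the Unix epoch.
marked_for_delete_at_us = 1796952039620607

deletion_time = datetime.fromtimestamp(
    marked_for_delete_at_us / 1_000_000, tz=timezone.utc
)
print(deletion_time.isoformat())  # a date in late 2026
```

A localDeletionTime of 0 (the epoch) alongside a far-future markedForDeleteAt is a strong hint that the deletion info was fabricated rather than written by a real DELETE.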
[jira] [Commented] (CASSANDRA-5381) java.io.EOFException exception while executing nodetool repair with compression enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13615171#comment-13615171 ]

Mathijs Vogelzang commented on CASSANDRA-5381:
----------------------------------------------

We have the same issue, where all streaming between nodes fails with an EOFException followed by too many retries. This started when we upgraded from 1.1.7 to 1.2.2, and it didn't go away on a subsequent upgrade to 1.2.3. We tried running with and without internode compression and encryption, and found that when encryption is off everything works fine (also WITH compression on). With encryption on it doesn't work, even with internode compression turned off, so for us it definitely has something to do with streaming while internode encryption is enabled.

java.io.EOFException exception while executing nodetool repair with compression enabled
---------------------------------------------------------------------------------------

                 Key: CASSANDRA-5381
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5381
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.2.3
         Environment: Linux virtual machines, Red Hat Enterprise release 6.4, kernel version 2.6.32-358.2.1.el6.x86_64. Each VM has 8 GB memory and 4 vCPUs.
            Reporter: Neil Thomson
            Priority: Minor

Very similar to the issue reported in CASSANDRA-5105. I have 3 nodes configured in a cluster. The nodes are configured with compression enabled. When attempting a nodetool repair on one node, I get exceptions on the other nodes in the cluster. Disabling compression on the column family allows nodetool repair to run without error.
Exception:

{noformat}
 INFO [Streaming to /3.69.211.179:2] 2013-03-25 12:30:27,874 StreamReplyVerbHandler.java (line 50) Need to re-stream file /var/lib/cassandra/data/rt/values/rt-values-ib-1-Data.db to /3.69.211.179
 INFO [Streaming to /3.69.211.179:2] 2013-03-25 12:30:27,991 StreamReplyVerbHandler.java (line 50) Need to re-stream file /var/lib/cassandra/data/rt/values/rt-values-ib-1-Data.db to /3.69.211.179
ERROR [Streaming to /3.69.211.179:2] 2013-03-25 12:30:28,113 CassandraDaemon.java (line 164) Exception in thread Thread[Streaming to /3.69.211.179:2,5,main]
java.lang.RuntimeException: java.io.EOFException
	at com.google.common.base.Throwables.propagate(Throwables.java:160)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(Unknown Source)
	at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:193)
	at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:114)
	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	... 3 more
{noformat}

Keyspace configuration is as follows:

{noformat}
Keyspace: rt:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:3]
  Column Families:
    ColumnFamily: tagname
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Populate IO Cache on flush: false
      Replicate on write: true
      Caching: KEYS_ONLY
      Bloom Filter FP chance: default
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
    ColumnFamily: values
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Populate IO Cache on flush: false
      Replicate on write: true
      Caching: KEYS_ONLY
      Bloom Filter FP chance: default
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA,
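For reference, the internode encryption that the commenter toggled in the experiments above is controlled by the server_encryption_options block in cassandra.yaml. A typical fragment looks like the following (paths and passwords are illustrative placeholders, not from this report):

```yaml
# cassandra.yaml fragment (illustrative values)
server_encryption_options:
    # none | all | dc | rack -- setting this to anything but "none"
    # enables encrypted internode streaming, the case that failed here
    internode_encryption: all
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
```

Setting internode_encryption back to none (and restarting the nodes) is the configuration under which the commenter reported streaming worked, with or without internode compression.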