[jira] [Updated] (CASSANDRA-9540) Cql IN query wrong on rows with values bigger than 64kb

Mathijs Vogelzang (JIRA) Thu, 04 Jun 2015 09:36:24 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mathijs Vogelzang updated CASSANDRA-9540:
-----------------------------------------
    Summary: Cql IN query wrong on rows with values bigger than 64kb  (was: Cql 
query doesn't return right information when using IN on columns for some keys)

> Cql IN query wrong on rows with values bigger than 64kb
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9540
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>         Environment: Cassandra 2.1.5
>            Reporter: Mathijs Vogelzang
>            Assignee: Carl Yeksigian
>             Fix For: 2.1.x
>
>
> We are investigating a weird issue where one of our clients doesn't get data 
> on his dashboard. It seems Cassandra is not returning data for a particular 
> key ("brokenkey" from now on).
> Some background:
> We have a row where we store a "metadata" column and data in columns 
> "bucket/0", "bucket/1", "bucket/2", etc. Depending on the date selection in 
> the UI, we know which buckets we need to retrieve: only bucket/0, or bucket/0 
> and bucket/1, etc. (we always need to retrieve "metadata").
> A typical query may look like this (using SELECT column1 just to show what is 
> returned; normally we would of course do SELECT value):
> {noformat}
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where 
> key=textAsBlob('install/workingkey');
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where 
> key=textAsBlob('install/brokenkey');
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> {noformat}
> These two queries work as expected, and return the information that we 
> actually stored.
> However, when we filter for certain columns, the brokenkey starts behaving 
> very strangely:
> {noformat}
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where 
> key=textAsBlob('install/workingkey') and column1 IN 
> (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where 
> key=textAsBlob('install/workingkey') and column1 IN 
> (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> *** As expected, asking for additional columns that don't exist makes no 
> difference for the working key ***
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where 
> key=textAsBlob('install/brokenkey') and column1 IN 
> (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
> (1 rows)
> *** Cassandra stops giving us the metadata column when asking for a few more 
> columns! ***
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where 
> key=textAsBlob('install/brokenkey') and column1 IN 
> (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));
>  key | column1 | value
> -----+---------+-------
> (0 rows)
> *** Adding the bogus column name even makes it return nothing from this row 
> anymore! ***
> {noformat}
> There are at least two rows that malfunction like this in our table (which is 
> quite old already and has gone through a bunch of Cassandra upgrades). I've 
> upgraded our whole cluster to 2.1.5 (we were on 2.1.2 when I discovered this 
> problem) and compacted, repaired and scrubbed this column family, which 
> hasn't helped.
> Our table structure is:
> {noformat}
> cqlsh:AppBrain> describe table "GroupedSeries";
> CREATE TABLE "AppBrain"."GroupedSeries" (
>     key blob,
>     column1 blob,
>     value blob,
>     PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE
>     AND CLUSTERING ORDER BY (column1 ASC)
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 1.0
>     AND speculative_retry = 'NONE';
> {noformat}
> Let me know if I can give more information that may be helpful.
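To narrow down which column names trigger the wrong result, it may help to compare the IN query against one equality query per column name. The following is a minimal sketch in Python that only generates the CQL text to paste into cqlsh; the helper names (text_as_blob, build_repro_queries) are hypothetical and not part of Cassandra or any driver, and it is an untested assumption that the single-column queries behave differently from the IN query on the affected rows.

```python
def text_as_blob(name):
    """Render a CQL textAsBlob(...) call, doubling single quotes to escape them."""
    escaped = name.replace("'", "''")
    return f"textAsBlob('{escaped}')"


def build_repro_queries(table, key, columns):
    """Return (in_query, single_queries) for one partition key.

    in_query uses a single IN restriction on column1, as in the report.
    single_queries holds one equality query per column name, so the two
    result sets can be compared by hand in cqlsh.
    """
    select = (
        f'select blobAsText(column1) from "{table}" '
        f"where key={text_as_blob(key)}"
    )
    in_list = ",".join(text_as_blob(c) for c in columns)
    in_query = f"{select} and column1 IN ({in_list});"
    single_queries = [
        f"{select} and column1 = {text_as_blob(c)};" for c in columns
    ]
    return in_query, single_queries


if __name__ == "__main__":
    # Table, key, and column names are taken from the transcripts above.
    in_query, singles = build_repro_queries(
        "GroupedSeries",
        "install/brokenkey",
        ["metadata", "bucket/0", "bucket/1", "bucket/2"],
    )
    print(in_query)
    for query in singles:
        print(query)
```

If the union of the single-column results contains "metadata" while the IN result does not, that would point at the IN code path rather than at the stored data.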



