[ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-6137:
-----------------------------------------

      Description: 
I am elevating this to Critical after doing some trace and reproducing in 
several environments. No one has commented on this bug from the cassandra team, 
and I view unreliable/corrupted data a pretty big deal. We are considering 
pulling cassandra and using something else.


------------------------------------------------------------------

We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc driver.

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh> SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';

These three queries each return one row for the requested single column key in 
the IN clause:

SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:all:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:fail:count');

This query returns ONLY ONE ROW (one column key), not three as I would expect 
from the three-column-key IN clause:

cqlsh> SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');

This query does return two rows however for the requested two column keys:

cqlsh> SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (                    
                            
'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');





cqlsh> describe table internal_submission.entity_job;

CREATE TABLE entity_job (
  e_entid text,
  p_prop text,
  describes text,
  dndcondition text,
  e_entlinks text,
  e_entname text,
  e_enttype text,
  ingeststatus text,
  ingeststatusdetail text,
  p_flags text,
  p_propid text,
  p_proplinks text,
  p_storage text,
  p_subents text,
  p_val text,
  p_vallang text,
  p_vallinks text,
  p_valtype text,
  p_valunit text,
  p_vars text,
  partnerid text,
  referenceid text,
  size int,
  sourceip text,
  submitdate bigint,
  submitevent text,
  userid text,
  version text,
  PRIMARY KEY (e_entid, p_prop)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);

CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
(dndcondition);

CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
(ingeststatus);

CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
(ingeststatusdetail);

CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (referenceid);

CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);

CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);

-------------------------------

My suspicion is that the three-column-key IN Clause is translated (improperly 
or not) to a two-column key range with the assumption that the third column key 
is present in that range, but it isn't...

-----------------------------------

We have tried: nodetool cache invalidations, start/stop of cassandra. Those did 
NOT fix the problem. A table dump (COPY TO) and then reload (COPY FROM) does 
fix the rows, but then more corruption creeps in.

We are using the cassandra-jdbc driver, but I don't see anything wrong with the 
issued statements inside the cassandra source code when I step through the 
code. 

With additional writes, it may be possible that some rows get fixed. Compaction 
or other jobs may repair this, but on the timescale of hours done debugging, 
the failures are consistent. 


  was:
We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc driver.

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh> SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';

These three queries each return one row for the requested single column key in 
the IN clause:

SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:all:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:fail:count');

This query returns ONLY ONE ROW (one column key), not three as I would expect 
from the three-column-key IN clause:

cqlsh> SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');

This query does return two rows however for the requested two column keys:

cqlsh> SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (                    
                            
'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');





cqlsh> describe table internal_submission.entity_job;

CREATE TABLE entity_job (
  e_entid text,
  p_prop text,
  describes text,
  dndcondition text,
  e_entlinks text,
  e_entname text,
  e_enttype text,
  ingeststatus text,
  ingeststatusdetail text,
  p_flags text,
  p_propid text,
  p_proplinks text,
  p_storage text,
  p_subents text,
  p_val text,
  p_vallang text,
  p_vallinks text,
  p_valtype text,
  p_valunit text,
  p_vars text,
  partnerid text,
  referenceid text,
  size int,
  sourceip text,
  submitdate bigint,
  submitevent text,
  userid text,
  version text,
  PRIMARY KEY (e_entid, p_prop)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);

CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
(dndcondition);

CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
(ingeststatus);

CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
(ingeststatusdetail);

CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (referenceid);

CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);

CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);

-------------------------------

My suspicion is that the three-column-key IN Clause is translated (improperly 
or not) to a two-column key range with the assumption that the third column key 
is present in that range, but it isn't...


         Priority: Critical  (was: Major)
      Environment: 
Ubuntu AWS Cassandra 2.0.1 SINGLE NODE on EBS RAID storage
OSX Cassandra 1.2.8 on SSD storage

  was:Ubuntu AWS Cassandra 2.0.1 SINGLE NODE

    Fix Version/s: 1.2.8

> CQL3 SELECT IN CLAUSE inconsistent
> ----------------------------------
>
>                 Key: CASSANDRA-6137
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6137
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE on EBS RAID 
> storage
> OSX Cassandra 1.2.8 on SSD storage
>            Reporter: Constance Eustace
>            Priority: Critical
>             Fix For: 1.2.8, 2.0.1
>
>
> I am elevating this to Critical after doing some trace and reproducing in 
> several environments. No one has commented on this bug from the cassandra 
> team, and I view unreliable/corrupted data a pretty big deal. We are 
> considering pulling cassandra and using something else.
> ------------------------------------------------------------------
> We are encountering inconsistent results from CQL3 queries with column keys 
> using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc 
> driver.
> Rowkey is e_entid
> Column key is p_prop
> This returns roughly 21 rows for 21 column keys that match p_prop.
> cqlsh> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';
> These three queries each return one row for the requested single column key 
> in the IN clause:
> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:complete:count');
> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:all:count');
> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:fail:count');
> This query returns ONLY ONE ROW (one column key), not three as I would expect 
> from the three-column-key IN clause:
> cqlsh> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> This query does return two rows however for the requested two column keys:
> cqlsh> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (                  
>                               
> 'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> cqlsh> describe table internal_submission.entity_job;
> CREATE TABLE entity_job (
>   e_entid text,
>   p_prop text,
>   describes text,
>   dndcondition text,
>   e_entlinks text,
>   e_entname text,
>   e_enttype text,
>   ingeststatus text,
>   ingeststatusdetail text,
>   p_flags text,
>   p_propid text,
>   p_proplinks text,
>   p_storage text,
>   p_subents text,
>   p_val text,
>   p_vallang text,
>   p_vallinks text,
>   p_valtype text,
>   p_valunit text,
>   p_vars text,
>   partnerid text,
>   referenceid text,
>   size int,
>   sourceip text,
>   submitdate bigint,
>   submitevent text,
>   userid text,
>   version text,
>   PRIMARY KEY (e_entid, p_prop)
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='NONE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);
> CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
> (dndcondition);
> CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
> (ingeststatus);
> CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
> (ingeststatusdetail);
> CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job 
> (referenceid);
> CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);
> CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);
> -------------------------------
> My suspicion is that the three-column-key IN Clause is translated (improperly 
> or not) to a two-column key range with the assumption that the third column 
> key is present in that range, but it isn't...
> -----------------------------------
> We have tried: nodetool cache invalidations, start/stop of cassandra. Those 
> did NOT fix the problem. A table dump (COPY TO) and then reload (COPY FROM) 
> does fix the rows, but then more corruption creeps in.
> We are using the cassandra-jdbc driver, but I don't see anything wrong with 
> the issued statements inside the cassandra source code when I step through 
> the code. 
> With additional writes, it may be possible that some rows get fixed. 
> Compaction or other jobs may repair this, but on the timescale of hours done 
> debugging, the failures are consistent. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to