[ 
https://issues.apache.org/jira/browse/CASSANDRA-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-6517:
---------------------------------------

    Attachment: 0001-CASSANDRA-6517-Use-column-timestamp-to-check-for-del.patch
                repro.sh

> Loss of secondary index entries if nodetool cleanup called before compaction
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6517
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6517
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>         Environment: Ubuntu 12.0.4 with 8+ GB RAM and 40GB hard disk for data 
> directory.
>            Reporter: Christoph Werres
>            Assignee: Sam Tunnicliffe
>             Fix For: 2.0.5
>
>         Attachments: 
> 0001-CASSANDRA-6517-Use-column-timestamp-to-check-for-del.patch, repro.sh
>
>
> From time to time we had the feeling of not getting all results that should 
> have been returned using secondary indexes. Now we tracked down some 
> situations and found out, it happened:
> 1) To primary keys that were already deleted and have been re-created later on
> 2) After our nightly maintenance scripts were running
> We can reproduce now the following szenario:
> - create a row entry with an indexed column included
> - query it and use the secondary index criteria -> Success
> - delete it, query again -> entry gone as expected
> - re-create it with the same key, query it -> success again
> Now use in exactly that sequence
> nodetool cleanup
> nodetool flush
> nodetool compact
> When issuing the query now, we don't get the result using the index. The 
> entry is indeed available in it's table when I just ask for the key. Below is 
> the exact copy-paste output from CQL when I reproduced the problem with an 
> example entry on on of our tables.
> mwerrch@mstc01401:/opt/cassandra$ current/bin/cqlsh Connected to 
> 14-15-Cluster at localhost:9160.
> [cqlsh 4.1.0 | Cassandra 2.0.3 | CQL spec 3.1.1 | Thrift protocol 19.38.0] 
> Use HELP for help.
> cqlsh> use mwerrch;
> cqlsh:mwerrch> desc tables;
> B4Container_Demo
> cqlsh:mwerrch> desc table "B4Container_Demo";
> CREATE TABLE "B4Container_Demo" (
>   key uuid,
>   archived boolean,
>   bytes int,
>   computer int,
>   deleted boolean,
>   description text,
>   doarchive boolean,
>   filename text,
>   first boolean,
>   frames int,
>   ifversion int,
>   imported boolean,
>   jobid int,
>   keepuntil bigint,
>   nextchunk text,
>   node int,
>   recordingkey blob,
>   recstart bigint,
>   recstop bigint,
>   simulationid bigint,
>   systemstart bigint,
>   systemstop bigint,
>   tapelabel bigint,
>   version blob,
>   PRIMARY KEY (key)
> ) WITH COMPACT STORAGE AND
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='demo' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=604800 AND
>   index_interval=128 AND
>   read_repair_chance=1.000000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='NONE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> CREATE INDEX mwerrch_Demo_computer ON "B4Container_Demo" (computer);
> CREATE INDEX mwerrch_Demo_node ON "B4Container_Demo" (node);
> CREATE INDEX mwerrch_Demo_recordingkey ON "B4Container_Demo" (recordingkey);
> cqlsh:mwerrch> INSERT INTO "B4Container_Demo" (key,computer,node) VALUES 
> (78c70562-1f98-3971-9c28-2c3d8e09c10f, 50, 50); cqlsh:mwerrch> select 
> key,node,computer from "B4Container_Demo" where computer=50;
>  key                                  | node | computer
> --------------------------------------+------+----------
>  78c70562-1f98-3971-9c28-2c3d8e09c10f |   50 |       50
> (1 rows)
> cqlsh:mwerrch> DELETE FROM "B4Container_Demo" WHERE 
> key=78c70562-1f98-3971-9c28-2c3d8e09c10f;
> cqlsh:mwerrch> select key,node,computer from "B4Container_Demo" where 
> computer=50;
> (0 rows)
> cqlsh:mwerrch> INSERT INTO "B4Container_Demo" (key,computer,node) VALUES 
> (78c70562-1f98-3971-9c28-2c3d8e09c10f, 50, 50); cqlsh:mwerrch> select 
> key,node,computer from "B4Container_Demo" where computer=50;
>  key                                  | node | computer
> --------------------------------------+------+----------
>  78c70562-1f98-3971-9c28-2c3d8e09c10f |   50 |       50
> (1 rows)
> **********************************
> Now we execute (maybe from a different shell so we don't have to close this 
> session) from /opt/cassandra/current/bin directory:
> ./nodetool cleanup
> ./nodetool flush
> ./nodetool compact
> Going back to our CQL session the result will no longer be available if 
> queried via the index:
> *********************************
> cqlsh:mwerrch> select key,node,computer from "B4Container_Demo" where 
> computer=50;
> (0 rows)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to