[ https://issues.apache.org/jira/browse/CASSANDRA-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-6517:
---------------------------------------
    Attachment: 0001-CASSANDRA-6517-Use-column-timestamp-to-check-for-del.patch
                repro.sh

> Loss of secondary index entries if nodetool cleanup called before compaction
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6517
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6517
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>         Environment: Ubuntu 12.0.4 with 8+ GB RAM and 40GB hard disk for data directory.
>            Reporter: Christoph Werres
>            Assignee: Sam Tunnicliffe
>             Fix For: 2.0.5
>
>         Attachments: 0001-CASSANDRA-6517-Use-column-timestamp-to-check-for-del.patch, repro.sh
>
>
> From time to time we suspected that queries using secondary indexes were not
> returning all the results they should have. We have now tracked down some of
> these cases and found that it happened:
> 1) To primary keys that had been deleted and were re-created later on
> 2) After our nightly maintenance scripts had run
> We can now reproduce the following scenario:
> - create a row entry with an indexed column included
> - query it using the secondary index criteria -> success
> - delete it, query again -> entry gone as expected
> - re-create it with the same key, query it -> success again
> Now run the following, in exactly this order:
> nodetool cleanup
> nodetool flush
> nodetool compact
> When issuing the query now, we don't get the result using the index. The
> entry is still available in its table when I query by key alone. Below is
> the exact copy-paste output from CQL when I reproduced the problem with an
> example entry in one of our tables.
> mwerrch@mstc01401:/opt/cassandra$ current/bin/cqlsh
> Connected to 14-15-Cluster at localhost:9160.
> [cqlsh 4.1.0 | Cassandra 2.0.3 | CQL spec 3.1.1 | Thrift protocol 19.38.0]
> Use HELP for help.
> cqlsh> use mwerrch;
> cqlsh:mwerrch> desc tables;
> B4Container_Demo
> cqlsh:mwerrch> desc table "B4Container_Demo";
> CREATE TABLE "B4Container_Demo" (
>   key uuid,
>   archived boolean,
>   bytes int,
>   computer int,
>   deleted boolean,
>   description text,
>   doarchive boolean,
>   filename text,
>   first boolean,
>   frames int,
>   ifversion int,
>   imported boolean,
>   jobid int,
>   keepuntil bigint,
>   nextchunk text,
>   node int,
>   recordingkey blob,
>   recstart bigint,
>   recstop bigint,
>   simulationid bigint,
>   systemstart bigint,
>   systemstop bigint,
>   tapelabel bigint,
>   version blob,
>   PRIMARY KEY (key)
> ) WITH COMPACT STORAGE AND
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='demo' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=604800 AND
>   index_interval=128 AND
>   read_repair_chance=1.000000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='NONE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> CREATE INDEX mwerrch_Demo_computer ON "B4Container_Demo" (computer);
> CREATE INDEX mwerrch_Demo_node ON "B4Container_Demo" (node);
> CREATE INDEX mwerrch_Demo_recordingkey ON "B4Container_Demo" (recordingkey);
> cqlsh:mwerrch> INSERT INTO "B4Container_Demo" (key,computer,node) VALUES (78c70562-1f98-3971-9c28-2c3d8e09c10f, 50, 50);
> cqlsh:mwerrch> select key,node,computer from "B4Container_Demo" where computer=50;
>  key                                  | node | computer
> --------------------------------------+------+----------
>  78c70562-1f98-3971-9c28-2c3d8e09c10f |   50 |       50
> (1 rows)
> cqlsh:mwerrch> DELETE FROM "B4Container_Demo" WHERE key=78c70562-1f98-3971-9c28-2c3d8e09c10f;
> cqlsh:mwerrch> select key,node,computer from "B4Container_Demo" where computer=50;
> (0 rows)
> cqlsh:mwerrch> INSERT INTO "B4Container_Demo" (key,computer,node) VALUES (78c70562-1f98-3971-9c28-2c3d8e09c10f, 50, 50);
> cqlsh:mwerrch> select key,node,computer from "B4Container_Demo" where computer=50;
>  key                                  | node | computer
> --------------------------------------+------+----------
>  78c70562-1f98-3971-9c28-2c3d8e09c10f |   50 |       50
> (1 rows)
> **********************************
> Now we execute (maybe from a different shell so we don't have to close this
> session) from /opt/cassandra/current/bin directory:
> ./nodetool cleanup
> ./nodetool flush
> ./nodetool compact
> Going back to our CQL session the result will no longer be available if
> queried via the index:
> *********************************
> cqlsh:mwerrch> select key,node,computer from "B4Container_Demo" where computer=50;
> (0 rows)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
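
The reproduction sequence quoted above can be collected into a single script. The following is only a minimal sketch of those steps and is not the attached repro.sh, whose contents are not shown in this message. It assumes that cqlsh and nodetool live in /opt/cassandra/current/bin (overridable through a hypothetical CASSANDRA_BIN variable), that the mwerrch keyspace, the table, and its indexes already exist as in the report, and that cqlsh accepts statements on standard input. DRY_RUN is a switch invented for this sketch; it defaults to 1 so that the commands are only printed.

```shell
#!/bin/sh
# Sketch of the reported reproduction steps. NOT the attached repro.sh.
# Assumptions: binary location, an existing keyspace/table/indexes as in the
# report, and cqlsh reading statements from stdin. CASSANDRA_BIN and DRY_RUN
# are hypothetical names introduced for this sketch.

CASSANDRA_BIN="${CASSANDRA_BIN:-/opt/cassandra/current/bin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would be run
KEY="78c70562-1f98-3971-9c28-2c3d8e09c10f"

# Send one CQL statement to cqlsh on stdin, or print it in dry-run mode.
run_cql() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'cqlsh: %s\n' "$1"
    else
        printf 'USE mwerrch;\n%s\n' "$1" | "$CASSANDRA_BIN/cqlsh"
    fi
}

# Run one nodetool subcommand, or print it in dry-run mode.
run_nodetool() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'nodetool %s\n' "$1"
    else
        "$CASSANDRA_BIN/nodetool" "$1"
    fi
}

repro() {
    insert="INSERT INTO \"B4Container_Demo\" (key,computer,node) VALUES ($KEY, 50, 50);"
    query="SELECT key,node,computer FROM \"B4Container_Demo\" WHERE computer=50;"

    run_cql "$insert"
    run_cql "$query"        # report shows 1 row
    run_cql "DELETE FROM \"B4Container_Demo\" WHERE key=$KEY;"
    run_cql "$query"        # report shows 0 rows
    run_cql "$insert"       # re-create the row with the same key
    run_cql "$query"        # report shows 1 row
    run_nodetool cleanup    # order matters according to the report
    run_nodetool flush
    run_nodetool compact
    run_cql "$query"        # report shows 0 rows on 2.0.3 (the bug)
}

repro
```

Saved under a name of your choice, it prints the ten steps by default; setting DRY_RUN=0 in the environment would execute them against the local node, which should only be done on a test cluster since cleanup and compact rewrite SSTables.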