[ https://issues.apache.org/jira/browse/CASSANDRA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306835#comment-14306835 ]
mlowicki edited comment on CASSANDRA-8712 at 2/11/15 7:30 AM:
--------------------------------------------------------------

[~slebresne] I don't have repro steps yet. What I've found in our production cluster, though, is that the index always (in 17340 of 17340 cases) returns a superset of what we get by reading the table directly, without the index. After reading www.datastax.com/dev/blog/improving-secondary-index-write-performance-in-1-2 I suspect there is a problem with removing stale entries from the index. What do you think? Should {{rebuild_index}} help with such an issue, or does it only re-add missing entries without removing old ones?

I checked, and {{rebuild_index}} doesn't fix the index. It still returns too much data.

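The superset check described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name {{compare_index_to_scan}} and the sample data are made up, and the real check ran against the cluster rather than against in-memory lists.

```python
def compare_index_to_scan(indexed_ids, scanned_rows, parent_id):
    """Compare ids returned via the index with ids confirmed by a direct read.

    indexed_ids  -- ids returned by the query that filters on parent_id
    scanned_rows -- (id, parent_id) pairs read from the partition without the index
    """
    indexed = set(indexed_ids)
    # Ids whose row, as stored in the table, really has the requested parent.
    confirmed = {row_id for row_id, row_parent in scanned_rows
                 if row_parent == parent_id}
    return {
        "stale": indexed - confirmed,         # reported by the index only
        "missing": confirmed - indexed,       # present in the table only
        "is_superset": indexed >= confirmed,  # index returns everything, maybe more
    }


# Hypothetical sample data: the index still lists "c" although the row now
# points at a different parent.
rows = [("a", "p1"), ("b", "p1"), ("c", "p2")]
report = compare_index_to_scan(["a", "b", "c"], rows, "p1")
```

A non-empty {{stale}} set together with an empty {{missing}} set matches the pattern reported here: the index returns everything the table has, plus entries that should have been removed.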
> Out-of-sync secondary index
> ---------------------------
>
>                 Key: CASSANDRA-8712
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8712
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 2.1.2
>            Reporter: mlowicki
>             Fix For: 2.1.3
>
>
> I have the following table with an index:
> {code}
> CREATE TABLE entity (
>     user_id text,
>     data_type_id int,
>     version bigint,
>     id text,
>     cache_guid text,
>     client_defined_unique_tag text,
>     ctime timestamp,
>     deleted boolean,
>     folder boolean,
>     mtime timestamp,
>     name text,
>     originator_client_item_id text,
>     parent_id text,
>     position blob,
>     server_defined_unique_tag text,
>     specifics blob,
>     PRIMARY KEY (user_id, data_type_id, version, id)
> ) WITH CLUSTERING ORDER BY (data_type_id ASC, version ASC, id ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
>     AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
> CREATE INDEX index_entity_parent_id ON entity (parent_id);
> {code}
> It turned out that the index became out of sync:
> {code}
> >>> Entity.objects.filter(user_id='255824802',
> ...     parent_id=parent_id).consistency(6).count()
> 16
>
> >>> counter = 0
> >>> for e in Entity.objects.filter(user_id='255824802'):
> ...     if e.parent_id and e.parent_id == parent_id:
> ...         counter += 1
> ...
> >>> counter
> 10
> {code}
> After a couple of hours (at night) it was fine, but then, probably when the
> user started to interact with the DB, we got the same problem. As a temporary
> solution we'll try to rebuild the indexes from time to time, as suggested in
> http://dev.nuclearrooster.com/2013/01/20/using-nodetool-to-rebuild-secondary-indexes-in-cassandra/
> I launched a simple script to check for this anomaly: before rebuilding the
> index, 10378 of 4024856 folders had this issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)