[jira] [Commented] (CASSANDRA-8712) Out-of-sync secondary index

2015-02-05 Thread mlowicki (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306835#comment-14306835
 ] 

mlowicki commented on CASSANDRA-8712:
-

[~slebresne] I don't have repro steps yet. What I have found in production, 
though, is that the index always returns (in 17340 of 17340 cases) a superset of 
what we get by querying the table directly, without the index. After reading 
www.datastax.com/dev/blog/improving-secondary-index-write-performance-in-1-2 I 
suspect there is a problem with removing stale entries from the index. 
What do you think? Should {{rebuild_index}} help with such an issue, or does it 
only re-add missing entries without removing old ones?
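To tell whether the index is returning stale entries (rows it still lists under a folder they no longer belong to) or is missing entries, the two result sets can be compared per folder. A minimal sketch, assuming each query has already been reduced to a set of row keys; {{diff_index_vs_scan}} is a hypothetical helper, not part of the {{check_parent_index_consistency}} script:

```python
from typing import Dict, Hashable, Iterable, Set


def diff_index_vs_scan(index_keys: Iterable[Hashable],
                       scan_keys: Iterable[Hashable]) -> Dict[str, Set[Hashable]]:
    """Compare row keys returned via the secondary index with those found by a scan.

    Hypothetical helper:
    - 'stale': returned by the index but not matching in the table (superset symptom)
    - 'missing': matching in the table but not returned by the index
    """
    index_set = set(index_keys)
    scan_set = set(scan_keys)
    return {
        "stale": index_set - scan_set,
        "missing": scan_set - index_set,
    }


if __name__ == "__main__":
    # Illustrative keys only, not data from this report.
    result = diff_index_vs_scan(["a1", "a2", "a3"], ["a1", "a2"])
    print(sorted(result["stale"]), sorted(result["missing"]))
```

If every inconsistent folder shows only "stale" keys and no "missing" ones, that would fit the stale-entry hypothesis above.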

 Out-of-sync secondary index
 ---

 Key: CASSANDRA-8712
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8712
 Project: Cassandra
  Issue Type: Bug
 Environment: 2.1.2
Reporter: mlowicki
 Fix For: 2.1.3


 I have the following table with an index:
 {code}
 CREATE TABLE entity (
 user_id text,
 data_type_id int,
 version bigint,
 id text,
 cache_guid text,
 client_defined_unique_tag text,
 ctime timestamp,
 deleted boolean,
 folder boolean,
 mtime timestamp,
 name text,
 originator_client_item_id text,
 parent_id text,
 position blob,
 server_defined_unique_tag text,
 specifics blob,
 PRIMARY KEY (user_id, data_type_id, version, id)
 ) WITH CLUSTERING ORDER BY (data_type_id ASC, version ASC, id ASC)
 AND bloom_filter_fp_chance = 0.01
 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
 AND comment = ''
 AND compaction = {'min_threshold': '4', 'class': 
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
 'max_threshold': '32'}
 AND compression = {'sstable_compression': 
 'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99.0PERCENTILE';
 CREATE INDEX index_entity_parent_id ON entity (parent_id);
 {code}
 It turned out that the index had become out of sync:
 {code}
 >>> Entity.objects.filter(user_id='255824802', parent_id=parent_id).consistency(6).count()
 16
 >>>
 >>> counter = 0
 >>> for e in Entity.objects.filter(user_id='255824802'):
 ...     if e.parent_id and e.parent_id == parent_id:
 ...         counter += 1
 ...
 >>> counter
 10
 {code}
 After a couple of hours (at night) it was fine, but then, probably when the user 
 started to interact with the DB again, we got the same problem. As a temporary 
 solution we'll try to rebuild the indexes from time to time, as suggested in 
 http://dev.nuclearrooster.com/2013/01/20/using-nodetool-to-rebuild-secondary-indexes-in-cassandra/
 I launched a simple script to check for this anomaly: before rebuilding the 
 index, 10378 out of 4024856 folders had this issue.
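For scale, the two figures reported by the checking script put the affected share at roughly a quarter of a percent of all folders. The calculation below uses only those two numbers:

```python
# Figures reported by the checking script before the index rebuild.
total_folders = 4024856
inconsistent_folders = 10378

# Fraction of folders whose index result disagreed with the table.
ratio = inconsistent_folders / total_folders
print(f"{ratio:.4%}")  # prints "0.2578%"
```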



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8712) Out-of-sync secondary index

2015-02-02 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301096#comment-14301096
 ] 

Sylvain Lebresne commented on CASSANDRA-8712:
-

I'm not familiar with django-cassandra-engine and I'm not sure other Cassandra 
devs are, so it would be much simpler to limit the layers used to reproduce this 
(to limit the possibility that the problem actually comes from one of those layers).
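Following that suggestion, the ORM calls from the description could be replaced by plain CQL sent through the DataStax Python driver ({{cassandra-driver}}). A minimal sketch, assuming the {{entity}} table from the description and a driver {{Session}} already connected to the {{sync}} keyspace; both helper names are hypothetical. Rows are keyed by their clustering columns, since {{id}} alone is not unique within a partition:

```python
def keys_via_index(session, user_id, parent_id):
    """Row keys the secondary index returns for one folder (hypothetical helper)."""
    rows = session.execute(
        "SELECT data_type_id, version, id FROM entity "
        "WHERE user_id = %s AND parent_id = %s",
        (user_id, parent_id),
    )
    return {(row.data_type_id, row.version, row.id) for row in rows}


def keys_via_scan(session, user_id, parent_id):
    """Row keys found by reading the whole partition and filtering client-side."""
    rows = session.execute(
        "SELECT data_type_id, version, id, parent_id FROM entity WHERE user_id = %s",
        (user_id,),
    )
    return {
        (row.data_type_id, row.version, row.id)
        for row in rows
        if row.parent_id == parent_id
    }
```

Against a real cluster the session could come from {{Cluster(['127.0.0.1']).connect('sync')}} (from {{cassandra.cluster}}; the contact point is an assumption). Setting {{session.default_consistency_level = ConsistencyLevel.LOCAL_QUORUM}} would mirror the {{consistency(6)}} used in the description, since 6 is {{LOCAL_QUORUM}} in the driver's {{ConsistencyLevel}}.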



[jira] [Commented] (CASSANDRA-8712) Out-of-sync secondary index

2015-02-02 Thread mlowicki (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301074#comment-14301074
 ] 

mlowicki commented on CASSANDRA-8712:
-

1. Drop the keyspace:
{code}
cqlsh> use sync;
cqlsh:sync> drop keyspace sync;
cqlsh:sync>
{code}

2. Create the keyspace from scratch (I'm using {{sync_cassandra}} from 
django-cassandra-engine):
{code}
./bin/django sync_cassandra
Creating keyspace sync..
Syncing sync.api.models.Entity
Syncing sync.api.models.UserStore
{code}

3. Populate the database using Django's shell:
{code}
>>> from sync.api.models import Entity, UserStore
>>> user = UserStore.objects.create(user_id='foo')
>>> root = Entity.objects.create(user_id='foo', data_type_id=0, version=0, id='-1')
{code}
4. Run the {{check_parent_index_consistency}} script:

{code}
./bin/django check_parent_index_consistency
{
folder: 1, 
user: 1
}
{code}

5. Add entities to the root folder:
{code}
>>> for i in range(1):
...     Entity.objects.create(user_id='foo', data_type_id=0, version=0, id='a' + str(i), parent_id='-1', folder=False)
{code}


6. While the insert is running, run the {{check_parent_index_consistency}} script:

{code}
./bin/django check_parent_index_consistency
{   
folder: 1,
inconsistent_folder: 1,
user: 1
}
{code}

While the insert was running, the number of entities returned directly from 
{{entity}} was 8918, but only 372 were returned via the index.

It seems to be related to the number of entities I'm adding: with fewer than 1 I 
couldn't reproduce the issue. When I ran {{check_parent_index_consistency}} again 
a couple of minutes later, everything was fine, with no inconsistencies. I'm not 
sure this is the same issue, since the number of inconsistencies drops to zero 
after some time, but maybe it'll help.

{{check_parent_index_consistency}} is available at https://cpaste.org/p7zht9rli
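Steps 3 and 5 could also be expressed without Django, taking django-cassandra-engine out of the reproduction. A minimal sketch, assuming the {{entity}} table from the description and a connected {{cassandra-driver}} {{Session}}; {{populate_folder}} is a hypothetical helper, the {{UserStore}} insert is left out because that table's schema isn't shown, and the child count is a parameter because the exact number used above isn't known:

```python
def populate_folder(session, user_id, n_children, root_id="-1"):
    """Insert a root folder and n_children entities under it (hypothetical repro helper).

    Mirrors steps 3 and 5: every row uses data_type_id=0 and version=0, and each
    child gets id 'a<i>' with parent_id pointing at the root.
    """
    # Root entity, as in step 3.
    session.execute(
        "INSERT INTO entity (user_id, data_type_id, version, id) VALUES (%s, %s, %s, %s)",
        (user_id, 0, 0, root_id),
    )
    # Children, as in step 5, using a prepared statement for the repeated insert.
    insert_child = session.prepare(
        "INSERT INTO entity (user_id, data_type_id, version, id, parent_id, folder) "
        "VALUES (?, ?, ?, ?, ?, ?)"
    )
    for i in range(n_children):
        session.execute(insert_child, (user_id, 0, 0, "a" + str(i), root_id, False))
    return n_children + 1  # rows written, including the root
```

Running {{check_parent_index_consistency}} (or an equivalent raw-CQL check) from another shell while this loop runs would correspond to step 6.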



[jira] [Commented] (CASSANDRA-8712) Out-of-sync secondary index

2015-02-02 Thread mlowicki (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301161#comment-14301161
 ] 

mlowicki commented on CASSANDRA-8712:
-

I'll try to provide something soon. We've checked, and {{rebuild_index}} doesn't 
help at all.



[jira] [Commented] (CASSANDRA-8712) Out-of-sync secondary index

2015-02-02 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300987#comment-14300987
 ] 

Sylvain Lebresne commented on CASSANDRA-8712:
-

I think we'd need some kind of reproduction steps/script to make any kind of 
progress on this. 
