[jira] [Commented] (CASSANDRA-5540) Concurrent secondary index updates remove rows from the index

Sam Tunnicliffe (JIRA) Tue, 07 May 2013 05:31:25 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650762#comment-13650762
 ]


Sam Tunnicliffe commented on CASSANDRA-5540:
--------------------------------------------

I don't think this is caused by the index updates in KeysSearcher. There, we 
only compare the values, and since this test always writes the same value, the 
index entry is never deemed stale, so we never write a tombstone. 

The test script does reproduce the issue completely reliably though, so I'll 
dig in and find the actual cause.
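
To illustrate the staleness argument above, here is a minimal Python sketch of
a value-only comparison. It is a simplified model under stated assumptions, not
the actual KeysSearcher code: the names is_stale and read_through_index, and
the plain dict standing in for the base rows, are hypothetical.

```python
def is_stale(indexed_value, current_value):
    """An index entry is stale when the base row no longer holds the indexed value.

    Assumption: only the values are compared, as described in the comment.
    A missing column (None) never equals the indexed value, so it counts as stale.
    """
    return current_value != indexed_value


def read_through_index(index_hits, base_rows, indexed_value):
    """Split index hits into live rows and entries that would be tombstoned.

    index_hits:    row keys listed in the index under indexed_value
    base_rows:     hypothetical stand-in mapping row key -> current column value
    """
    live, tombstoned = [], []
    for key in index_hits:
        if is_stale(indexed_value, base_rows.get(key)):
            tombstoned.append(key)  # the read path would delete this index entry
        else:
            live.append(key)
    return live, tombstoned


# The test script always writes the same value, so nothing is ever stale.
rows = {str(k): "indexedValue" for k in range(4)}
live, tombstoned = read_through_index(sorted(rows), rows, "indexedValue")
print(len(live), len(tombstoned))
```

In this model, rewriting a row with an identical value can never produce a
tombstone, which is why the read-path cleanup looks like an unlikely culprit.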
                
> Concurrent secondary index updates remove rows from the index
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-5540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5540
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.4
>            Reporter: Alexei Bakanov
>
> Existing rows disappear from the secondary index when doing simultaneous 
> updates of a row with the same secondary index value.
> Here is a little pycassa script that reproduces the bug. The script inserts 4 
> rows with the same secondary index value, reads those rows back, and checks 
> that there are 4 of them.
> Please run two instances of the script simultaneously in two separate 
> terminals in order to simulate concurrent updates:
> {code}
> -----script.py START-----
> import pycassa
> from pycassa.index import create_index_clause, create_index_expression
> pool = pycassa.ConnectionPool('ks123')
> cf = pycassa.ColumnFamily(pool, 'cf1')
> while True:
>     for rowKey in xrange(4):
>         cf.insert(str(rowKey), {'indexedColumn': 'indexedValue'})
>     index_expression = create_index_expression('indexedColumn', 'indexedValue')
>     index_clause = create_index_clause([index_expression])
>     rows = cf.get_indexed_slices(index_clause)
>     length = len(list(rows))
>     if length != 4:
>         print 'found just %d rows out of 4' % length
> pool.dispose()
> ---script.py FINISH---
> ---schema cli start---
> create keyspace ks123
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {datacenter1 : 1}
>   and durable_writes = true;
> use ks123;
> create column family cf1
>   with column_type = 'Standard'
>   and comparator = 'AsciiType'
>   and default_validation_class = 'AsciiType'
>   and key_validation_class = 'AsciiType'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and populate_io_cache_on_flush = false
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and column_metadata = [
>     {column_name : 'indexedColumn',
>     validation_class : AsciiType,
>     index_name : 'INDEX1',
>     index_type : 0}]
>   and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
> ---schema cli finish---
> {code}
> Test cluster created with 'ccm create --cassandra-version 1.2.4 --nodes 1 --start testUpdate'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
