[ 
https://issues.apache.org/jira/browse/CASSANDRA-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652893#comment-13652893
 ] 

Sam Tunnicliffe commented on CASSANDRA-5540:
--------------------------------------------

Yes, you're right, that's dumb, sorry.

It took me a while, but there are actually two issues here. The first, as you 
identified, is caused by overwrites with identical timestamps and is fixed by 
making the case where oldColumn.equals(newColumn) a no-op. The second is the 
window of inconsistency that I mentioned earlier. When the two instances of the 
test script are running, it's possible for one to query the index in between 
the old index entry being deleted and the new one being inserted, leading to 
a "missing" result. To address that, I've reversed the order so that the new 
entry is added before the old one is removed. This should be safe for readers 
because the index searcher checks for stale values. 
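
To make the two changes concrete, here is a toy, single-threaded Python model 
of the update path. It is only a sketch and not the code in the patch: the 
class ToyIndexedTable, its methods, and the rule that an index delete only 
removes an entry whose timestamp is not newer than the delete are all 
assumptions made for illustration.

```python
class ToyIndexedTable:
    """Toy model of a table with one indexed column (hypothetical names)."""

    def __init__(self):
        self.rows = {}   # row key -> (value, timestamp)
        self.index = {}  # (value, row key) -> timestamp of the index entry

    def write(self, key, value, timestamp):
        old = self.rows.get(key)
        new = (value, timestamp)
        if old == new:
            # First change: an overwrite with the same value and the same
            # timestamp leaves both the row and the index untouched.
            return "noop"
        if old is not None and old[1] > timestamp:
            # Assumption of this toy: a write older than the stored one loses.
            return "ignored"
        self.rows[key] = new
        # Second change: add the new index entry first ...
        self.index[(value, key)] = timestamp
        # ... and only then remove the old one, so a reader never sees a
        # moment where neither entry exists.
        if old is not None:
            self._remove_entry(old[0], key, old[1])
        return "applied"

    def _remove_entry(self, value, key, timestamp):
        # Assumption: the delete acts like a tombstone carrying the old
        # timestamp, so it cannot remove an entry written with a newer one.
        entry_ts = self.index.get((value, key))
        if entry_ts is not None and entry_ts <= timestamp:
            del self.index[(value, key)]

    def search(self, value):
        # Reader-side stale check: an index entry counts only if the row
        # still holds the indexed value.
        return sorted(
            key
            for (indexed_value, key) in self.index
            if indexed_value == value
            and self.rows.get(key, (None, None))[0] == value
        )


table = ToyIndexedTable()
for row_key in range(4):
    table.write(str(row_key), "indexedValue", 1)
table.write("0", "indexedValue", 1)  # identical overwrite, no-op
table.write("0", "indexedValue", 2)  # newer overwrite, entry is kept
print(table.search("indexedValue"))
```

Because the new entry goes in before the old one comes out, the index can 
briefly hold an entry for a value the row no longer has. The check in search() 
is what keeps that extra entry from showing up in results.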
                
> Concurrent secondary index updates remove rows from the index
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-5540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5540
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.4
>            Reporter: Alexei Bakanov
>            Assignee: Sam Tunnicliffe
>         Attachments: 
> 0001-Use-different-index-updater-for-live-updates-compact.patch, 5540.txt
>
>
> Existing rows disappear from secondary index when doing simultaneous updates 
> of a row with the same secondary index value.
> Here is a little pycassa script that reproduces the bug. The script inserts 4 
> rows with the same secondary index value, reads those rows back and checks 
> that there are 4 of them.
> Please run two instances of the script simultaneously in two separate 
> terminals in order to simulate concurrent updates:
> {code}
> -----script.py START-----
> import pycassa
> from pycassa.index import *
> pool = pycassa.ConnectionPool('ks123')
> cf = pycassa.ColumnFamily(pool, 'cf1')
> while True:
>     for rowKey in xrange(4):
>         cf.insert(str(rowKey), {'indexedColumn': 'indexedValue'})
>     index_expression = create_index_expression('indexedColumn', 'indexedValue')
>     index_clause = create_index_clause([index_expression])
>     rows = cf.get_indexed_slices(index_clause)
>     length = len(list(rows))
>     if length == 4:
>         pass
>     else:
>         print 'found just %d rows out of 4' % length
> pool.dispose()
> ---script.py FINISH---
> ---schema cli start---
> create keyspace ks123
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {datacenter1 : 1}
>   and durable_writes = true;
> use ks123;
> create column family cf1
>   with column_type = 'Standard'
>   and comparator = 'AsciiType'
>   and default_validation_class = 'AsciiType'
>   and key_validation_class = 'AsciiType'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and populate_io_cache_on_flush = false
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and column_metadata = [
>     {column_name : 'indexedColumn',
>     validation_class : AsciiType,
>     index_name : 'INDEX1',
>     index_type : 0}]
>   and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
> ---schema cli finish---
> {code}
> Test cluster created with 'ccm create --cassandra-version 1.2.4 --nodes 1 --start testUpdate'

