[ 
https://issues.apache.org/jira/browse/CASSANDRA-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-5540:
---------------------------------------

    Attachment: 0001-Use-different-index-updater-for-live-updates-compact.patch

Checking oldColumn.equals(column) in SU.update() isn't sufficient. I found that 
even with the short circuit, occasionally the test script would return only 3 
of the 4 expected columns. My suspicion is that this is caused by the delete & 
subsequent insert in SU.update() being non-atomic, though I haven't proved 
this. Rather than go down that rabbit hole, I've split the Updater 
implementation into two subclasses, LiveUpdater (LU) and CompactionUpdater 
(CU). The difference between them is that CU behaves like SU and always purges 
old values, whereas LU just upserts into the index. SIM.updaterFor() now takes a 
second argument to determine whether the updater is for processing live updates 
or for use during compaction.
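
To illustrate the distinction, here is a toy Python model of the two update 
styles. It is only a sketch: the class names echo the ones in this comment, but 
ToyIndex, steps() and visible_counts() are made up for illustration and are not 
the code in the attached patch. It also assumes the unproven suspicion above, 
namely that a reader can observe the index between the delete and the insert.

```python
# Toy model only; none of this is taken from the Cassandra source or the patch.

class ToyIndex:
    """Maps an indexed value to the set of row keys that currently hold it."""

    def __init__(self):
        self.entries = {}

    def insert(self, value, row_key):
        self.entries.setdefault(value, set()).add(row_key)

    def delete(self, value, row_key):
        self.entries.get(value, set()).discard(row_key)

    def lookup(self, value):
        return sorted(self.entries.get(value, set()))


class CompactionUpdater:
    """Purge-then-insert: always removes the old entry before adding the new one."""

    def __init__(self, index):
        self.index = index

    def steps(self, row_key, old_value, new_value):
        # Two separate steps, so the index is briefly missing the row.
        return [
            lambda: self.index.delete(old_value, row_key),
            lambda: self.index.insert(new_value, row_key),
        ]


class LiveUpdater:
    """Upsert only: never removes an entry as part of a live write."""

    def __init__(self, index):
        self.index = index

    def steps(self, row_key, old_value, new_value):
        return [lambda: self.index.insert(new_value, row_key)]


def visible_counts(updater_cls):
    """Re-write row '0' with an unchanged value and record how many rows a
    reader would find in the index after each individual step."""
    index = ToyIndex()
    for key in ("0", "1", "2", "3"):
        index.insert("indexedValue", key)
    updater = updater_cls(index)
    counts = []
    for step in updater.steps("0", "indexedValue", "indexedValue"):
        step()
        counts.append(len(index.lookup("indexedValue")))
    return counts
```

In this model, visible_counts(CompactionUpdater) passes through a state with 
only 3 rows before returning to 4, which matches the symptom the test script 
reports, while visible_counts(LiveUpdater) never drops below 4.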

Unit tests pass & the test script runs without issue.

                
> Concurrent secondary index updates remove rows from the index
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-5540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5540
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.4
>            Reporter: Alexei Bakanov
>            Assignee: Sam Tunnicliffe
>         Attachments: 
> 0001-Use-different-index-updater-for-live-updates-compact.patch
>
>
> Existing rows disappear from the secondary index when doing simultaneous 
> updates of a row with the same secondary index value.
> Here is a little pycassa script that reproduces the bug. The script inserts 4 
> rows with the same secondary index value, reads those rows back and checks 
> that there are 4 of them.
> Please run two instances of the script simultaneously in two separate 
> terminals in order to simulate concurrent updates:
> {code}
> -----script.py START-----
> import pycassa
> from pycassa.index import *
> pool = pycassa.ConnectionPool('ks123')
> cf = pycassa.ColumnFamily(pool, 'cf1')
> while True:
>     for rowKey in xrange(4):
>         cf.insert(str(rowKey), {'indexedColumn': 'indexedValue'})
>     index_expression = create_index_expression('indexedColumn', 'indexedValue')
>     index_clause = create_index_clause([index_expression])
>     rows = cf.get_indexed_slices(index_clause)
>     length = len(list(rows))
>     if length == 4:
>         pass
>     else:
>         print 'found just %d rows out of 4' % length
> pool.dispose()
> ---script.py FINISH---
> ---schema cli start---
> create keyspace ks123
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {datacenter1 : 1}
>   and durable_writes = true;
> use ks123;
> create column family cf1
>   with column_type = 'Standard'
>   and comparator = 'AsciiType'
>   and default_validation_class = 'AsciiType'
>   and key_validation_class = 'AsciiType'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and populate_io_cache_on_flush = false
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and column_metadata = [
>     {column_name : 'indexedColumn',
>     validation_class : AsciiType,
>     index_name : 'INDEX1',
>     index_type : 0}]
>   and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
> ---schema cli finish---
> {code}
> Test cluster created with:
> 'ccm create --cassandra-version 1.2.4 --nodes 1 --start testUpdate'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
