[jira] [Commented] (CASSANDRA-5540) Concurrent secondary index updates remove rows from the index

Jonathan Ellis (JIRA) Mon, 06 May 2013 13:02:18 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650050#comment-13650050
 ]


Jonathan Ellis commented on CASSANDRA-5540:
-------------------------------------------

Hmm.  It looks like this can happen when multiple inserts carry the same 
timestamp: we delete the existing index entry using its own timestamp, so if 
the replacement has that same timestamp, the tombstone wins the tie.

Any clever ideas to fix this [~beobal]?
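
To make the tie concrete, here is a minimal sketch of the reconciliation rule 
described above. The Cell tuple and the reconcile and index_update functions 
are hypothetical stand-ins for illustration, not Cassandra's actual classes; 
the only behaviour assumed is that the higher timestamp wins and a tombstone 
wins an exact tie.

```python
from collections import namedtuple

# Hypothetical, simplified model of an index cell (not Cassandra's real classes).
Cell = namedtuple("Cell", ["timestamp", "is_tombstone"])


def reconcile(a, b):
    """Pick the winning cell: higher timestamp wins, a tombstone wins a tie."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    if a.is_tombstone != b.is_tombstone:
        return a if a.is_tombstone else b
    return a


def index_update(old_entry, new_timestamp):
    """Delete the old index entry with its own timestamp, then insert the replacement."""
    tombstone = Cell(old_entry.timestamp, True)
    replacement = Cell(new_timestamp, False)
    return reconcile(tombstone, replacement)


existing = Cell(timestamp=1000, is_tombstone=False)

# A later write outlives the tombstone; a write at the same timestamp does not.
later = index_update(existing, 1001)
tied = index_update(existing, 1000)

print("later write survives:", not later.is_tombstone)
print("same-timestamp write survives:", not tied.is_tombstone)
```

In this model the index entry is lost only when the replacement reuses the 
exact timestamp of the entry it replaces, which matches the concurrent-update 
scenario in the report.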
                
> Concurrent secondary index updates remove rows from the index
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-5540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5540
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.4
>            Reporter: Alexei Bakanov
>
> Existing rows disappear from the secondary index when a row is updated 
> concurrently with the same secondary index value.
> Here is a little pycassa script that reproduces the bug. The script inserts 4 
> rows with the same secondary index value, reads those rows back, and checks 
> that there are 4 of them.
> Please run two instances of the script simultaneously in two separate 
> terminals in order to simulate concurrent updates:
> {code}
> -----script.py START-----
> import pycassa
> from pycassa.index import *
> pool = pycassa.ConnectionPool('ks123')
> cf = pycassa.ColumnFamily(pool, 'cf1')
> while True:
>     for rowKey in xrange(4):
>         cf.insert(str(rowKey), {'indexedColumn': 'indexedValue'})
>     index_expression = create_index_expression('indexedColumn', 'indexedValue')
>     index_clause = create_index_clause([index_expression])
>     rows = cf.get_indexed_slices(index_clause)
>     length = len(list(rows))
>     if length == 4:
>         pass
>     else:
>         print 'found just %d rows out of 4' % length
> pool.dispose()
> ---script.py FINISH---
> ---schema cli start---
> create keyspace ks123
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {datacenter1 : 1}
>   and durable_writes = true;
> use ks123;
> create column family cf1
>   with column_type = 'Standard'
>   and comparator = 'AsciiType'
>   and default_validation_class = 'AsciiType'
>   and key_validation_class = 'AsciiType'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and populate_io_cache_on_flush = false
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and column_metadata = [
>     {column_name : 'indexedColumn',
>     validation_class : AsciiType,
>     index_name : 'INDEX1',
>     index_type : 0}]
>   and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
> ---schema cli finish---
> {code}
> Test cluster created with 'ccm create --cassandra-version 1.2.4 --nodes 1 
> --start testUpdate'

