[ https://issues.apache.org/jira/browse/CASSANDRA-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865261#action_12865261 ]
Stu Hood edited comment on CASSANDRA-1040 at 5/7/10 2:49 PM:
-------------------------------------------------------------

Independent of (but related to) this issue, ColumnFamilyStore.getRangeRows has a race condition in memtable handling. The order of operations that might trigger the problem is:
# Copy memtablesPendingFlush
# (new memtable becomes pending)
# Copy reference to current Memtable

Swapping 3. with 1. would prevent new memtables from being ignored, but would mean we might scan one memtable twice. Making 1. and 3. atomic would remove the race, but would hold the lock longer than we are used to.

EDIT: this description only applies to trunk
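The interleaving above can be sketched with a toy model. This is a minimal sketch, not the real ColumnFamilyStore code: the class and method names (StoreModel, racyScan, saferScan, switchMemtable) are hypothetical, and memtables are stood in for by plain strings. It shows how a memtable switch between steps 1 and 3 makes the scan miss a memtable entirely, and how swapping the order trades that loss for a possible double scan.

```java
import java.util.ArrayList;
import java.util.List;

public class RaceDemo {
    // Hypothetical simplification of the memtable bookkeeping.
    static class StoreModel {
        List<String> memtablesPendingFlush = new ArrayList<>();
        String currentMemtable = "mt1";

        // A flush racing with the scan: the current memtable becomes
        // pending and a fresh one takes its place.
        void switchMemtable(String fresh) {
            memtablesPendingFlush.add(currentMemtable);
            currentMemtable = fresh;
        }

        // The racy order: step 1 copies the pending list, step 3 copies the
        // current-memtable reference; a switch in between loses a memtable.
        List<String> racyScan(Runnable interleavedFlush) {
            List<String> seen = new ArrayList<>(memtablesPendingFlush); // 1
            interleavedFlush.run();                                     // 2
            seen.add(currentMemtable);                                  // 3
            return seen;
        }

        // Swapped order (3 before 1): nothing is missed, but the memtable
        // that moved from current to pending is scanned twice.
        List<String> saferScan(Runnable interleavedFlush) {
            String cur = currentMemtable;                               // 3
            interleavedFlush.run();                                     // 2
            List<String> seen = new ArrayList<>(memtablesPendingFlush); // 1
            seen.add(cur);
            return seen;
        }
    }

    public static void main(String[] args) {
        StoreModel racy = new StoreModel();
        // "mt1" becomes pending after step 1 and is no longer current at
        // step 3, so the racy scan never sees it.
        System.out.println(racy.racyScan(() -> racy.switchMemtable("mt2")));
        // prints [mt2]

        StoreModel safer = new StoreModel();
        // The swapped order sees "mt1" twice instead of losing it.
        System.out.println(safer.saferScan(() -> safer.switchMemtable("mt2")));
        // prints [mt1, mt1]
    }
}
```

Making steps 1 and 3 atomic (taking the lock around both copies) removes both the loss and the duplicate, at the cost of holding the lock longer.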
> read failure during flush
> -------------------------
>
>                 Key: CASSANDRA-1040
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1040
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Critical
>             Fix For: 0.6.2
>
> Joost Ouwerkerk writes:
>
> On a single-node cassandra cluster with basic config (-Xmx:1G)
> loop {
>     * insert 5,000 records in a single columnfamily with UUID keys and
>       random string values (between 1 and 1000 chars) in 5 different columns
>       spanning two different supercolumns
>     * delete all the data by iterating over the rows with
>       get_range_slices(ONE) and calling remove(QUORUM) on each row id
>       returned (path containing only columnfamily)
>     * count number of non-tombstone rows by iterating over the rows
>       with get_range_slices(ONE) and testing data. Break if not zero.
> }
> while this is running, call "bin/nodetool -h localhost -p 8081 flush
> KeySpace" in the background every minute or so. When the data hits some
> critical size, the loop will break.