[ https://issues.apache.org/jira/browse/CASSANDRA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028462#comment-13028462 ]

Jonathan Ellis edited comment on CASSANDRA-2590 at 5/3/11 10:27 PM:
--------------------------------------------------------------------

... but that's not what we want for RowRepairResolver. (I freely admit that 
dealing with tombstones is subtle and tricky. :)

removeDeleted will give you back a version of the row with any GC-able 
tombstones removed. That's not what we want for read repair; we want to 
preserve tombstones, but we want a "canonical" representation of only the 
minimum tombstones necessary. (Technically, this doesn't matter for the repair 
per se, because repairing obsolete data is harmless. What we're concerned with 
is getting the right result back to the client, and thriftifyColumns & friends 
in CassandraServer assume that canonicalization has been performed previously.)
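
To make the distinction concrete, here is a minimal self-contained sketch; 
ToyRow and its method names are made up for illustration and are not 
Cassandra's actual classes:

    import java.util.Map;
    import java.util.TreeMap;

    // Toy model of a row, contrasting GC with canonicalization. These types
    // and method names are illustrative only, not Cassandra's real API.
    class ToyRow
    {
        long markedForDeleteAt = Long.MIN_VALUE;     // row-level tombstone timestamp
        int localDeletionTime = Integer.MAX_VALUE;   // second the tombstone was written
        Map<String, Long> columns = new TreeMap<>(); // column name -> timestamp

        // Roughly what removeDeleted does: purge the tombstone once it is
        // GC-able. Wrong for read repair, since a replica that missed the
        // delete still needs to receive the tombstone.
        void removeGcableTombstones(int gcBefore)
        {
            if (localDeletionTime < gcBefore)
                markedForDeleteAt = Long.MIN_VALUE;
        }

        // What the client-facing path needs instead: keep the tombstone, but
        // drop every column it shadows, leaving the minimal canonical row.
        void canonicalize()
        {
            columns.values().removeIf(ts -> ts <= markedForDeleteAt);
        }
    }

    public class CanonicalizeDemo
    {
        public static void main(String[] args)
        {
            ToyRow row = new ToyRow();
            row.columns.put("name", 10L); // column written at timestamp 10
            row.markedForDeleteAt = 20L;  // whole row deleted at timestamp 20

            row.canonicalize();
            // The shadowed column is gone, but the tombstone is preserved:
            System.out.println(row.columns + " markedForDeleteAt=" + row.markedForDeleteAt);
        }
    }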

So we do want to do what you were doing with ensureRelevant, but it's a little 
more complex than that, because we have the same problem at the supercolumn 
level as at the row level.

QueryFilter.collectCollatedColumns is responsible for doing this when merging 
different versions from memtables and sstables, so we just need to wire it up 
in RRR. Here's a patch that uses an IdentityQueryFilter to do this.
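
Purely for illustration, here is a self-contained sketch of the 
merge-then-canonicalize shape that wiring gives us; the names here 
(CollateSketch, mergeVersions, ColumnVersion) are invented, and the actual 
patch drives the existing filter code via IdentityQueryFilter rather than 
reimplementing the merge:

    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // Illustrative sketch (made-up names, not Cassandra's API) of the merge
    // QueryFilter.collectCollatedColumns performs for us: collate columns from
    // every version, keep the newest per name, fold the row-level tombstones
    // together, then drop anything the merged tombstone shadows.
    public class CollateSketch
    {
        record ColumnVersion(String name, long timestamp) {}

        static long mergeVersions(List<List<ColumnVersion>> versions,
                                  List<Long> markedForDeleteAts,
                                  Map<String, Long> out)
        {
            // the merged row carries the newest row-level tombstone
            long markedForDeleteAt =
                markedForDeleteAts.stream().max(Long::compare).orElse(Long.MIN_VALUE);

            // collate all versions, reconciling duplicates by timestamp
            for (List<ColumnVersion> version : versions)
                for (ColumnVersion c : version)
                    out.merge(c.name(), c.timestamp(), Math::max);

            // canonicalize: drop shadowed columns, but keep the tombstone
            // itself (the return value), since peers may still need it
            out.values().removeIf(ts -> ts <= markedForDeleteAt);
            return markedForDeleteAt;
        }

        public static void main(String[] args)
        {
            // node 1 still has the old column; node 2 has only the row delete
            List<List<ColumnVersion>> versions =
                List.of(List.of(new ColumnVersion("name", 10L)), List.of());
            Map<String, Long> merged = new TreeMap<>();
            long tombstone = mergeVersions(versions, List.of(Long.MIN_VALUE, 20L), merged);
            System.out.println("columns=" + merged + " markedForDeleteAt=" + tombstone);
        }
    }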

> row delete breaks read repair 
> ------------------------------
>
>                 Key: CASSANDRA-2590
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2590
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.5, 0.8 beta 1
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>            Priority: Minor
>         Attachments: 
> 0001-cf-resolve-test-and-possible-solution-for-read-repai.patch, 2590-v2.txt
>
>
> related to CASSANDRA-2589 
> Reads at CL ALL can return inconsistent results after a row deletion. Reproduced 
> against both the 0.7 and 0.8 source. 
> Steps to reproduce:
> # two node cluster with rf 2 and HH turned off
> # insert rows via cli 
> # flush both nodes 
> # shutdown node 1
> # connect to node 2 via cli and delete one row
> # bring up node 1
> # connect to node 1 via cli and issue get with CL ALL 
> # first get returns the deleted row, second get returns zero rows.
> RowRepairResolver.resolveSuperset() resolves the local CF, which still has the 
> old row columns, with the remote CF, which is marked for deletion. 
> CF.resolve() does not pay attention to the deletion flags, so the resolved CF 
> has both markedForDeleteAt set and a column with a lower timestamp. The result 
> of resolveSuperset() is returned to the read without checking whether the 
> columns are relevant. 
> Also, when RowRepairResolver.maybeScheduleRepairs() runs, it sends two 
> mutations: node 1 is given the row-level deletion, and node 2 is given a 
> mutation to write the old (and now deleted) column from node 1. I have some 
> log traces for this if needed. 
> A quick fix is to check for relevant columns in the RowRepairResolver 
> (sketched below); a patch will be attached shortly.
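
For reference, the quick fix described in the issue above boils down to a 
relevance predicate; here is a minimal, hypothetical version (the names 
isRelevant and ensureRelevant follow the reporter's description, not the 
attached patch, and the name-to-timestamp map stands in for a real CF):

    import java.util.Map;
    import java.util.TreeMap;

    public class RelevanceCheck
    {
        // A column is relevant only if it is newer than the row-level tombstone.
        static boolean isRelevant(long columnTimestamp, long markedForDeleteAt)
        {
            return columnTimestamp > markedForDeleteAt;
        }

        // Hypothetical ensureRelevant: strip irrelevant columns from a resolved
        // row before it is returned to the client.
        static void ensureRelevant(Map<String, Long> columns, long markedForDeleteAt)
        {
            columns.values().removeIf(ts -> !isRelevant(ts, markedForDeleteAt));
        }

        public static void main(String[] args)
        {
            Map<String, Long> columns = new TreeMap<>();
            columns.put("old", 10L);
            columns.put("new", 30L);
            ensureRelevant(columns, 20L); // row tombstone at timestamp 20
            System.out.println(columns);  // only "new" survives
        }
    }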
