[ https://issues.apache.org/jira/browse/CASSANDRA-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390519#comment-14390519 ]

Sylvain Lebresne commented on CASSANDRA-8979:
---------------------------------------------

I'm not sure unconditionally skipping the top-level partition tombstone if it's 
live is a good idea, because that means that if you run a repair between nodes 
both prior to and after this patch, all partitions will mismatch (they will 
include the live partition tombstone before this patch, but not after it). And 
that's probably not something we should do in a minor upgrade.

Now, I don't think including the top-level partition tombstone is really a 
problem in general; it's only when the partition ends up having no live cells 
(when the partition is empty) that we should skip it, so that it's equivalent 
to having no partition at all. If we only skip it in that case, we avoid the 
repair hell in a mixed-version cluster.
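A minimal sketch of that rule (all names here, e.g. PartitionDigestSketch, are illustrative and not Cassandra's actual API): fold the top-level tombstone into the hash unless it is live AND the partition has no live cells, so an empty partition digests identically to a missing one, while non-empty partitions keep hashing the live tombstone exactly as pre-patch nodes do.

```java
import java.util.zip.CRC32;

public class PartitionDigestSketch {
    // Returns the partition's checksum. The top-level tombstone is hashed
    // unless it is live AND the partition has no live cells; in that case
    // the partition is effectively empty and must hash like a missing one.
    static long digest(boolean tombstoneIsLive, boolean hasLiveCells, long markedForDeleteAt) {
        CRC32 crc = new CRC32();
        boolean partitionIsEmpty = tombstoneIsLive && !hasLiveCells;
        if (!partitionIsEmpty) {
            // Mixed-version safety: a live tombstone on a non-empty partition
            // is still hashed, matching what pre-patch nodes compute.
            crc.update(Long.toString(markedForDeleteAt).getBytes());
        }
        return crc.getValue();
    }
}
```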

> MerkleTree mismatch for deleted and non-existing rows
> -----------------------------------------------------
>
>                 Key: CASSANDRA-8979
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8979
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>             Fix For: 2.1.4, 2.0.14
>
>         Attachments: 8979-AvoidBufferAllocation-2.0_patch.txt, 
> cassandra-2.0-8979-lazyrow_patch.txt, cassandra-2.0-8979-validator_patch.txt, 
> cassandra-2.0-8979-validatortest_patch.txt, 
> cassandra-2.1-8979-lazyrow_patch.txt, cassandra-2.1-8979-validator_patch.txt
>
>
> Validation compaction will currently create different hashes for rows that 
> have been deleted compared to nodes that have not seen the rows at all or 
> have already compacted them away. 
> In case this sounds familiar, see CASSANDRA-4905, which was supposed to 
> prevent hashing of expired tombstones. That logic still seems to be in 
> place, but does not address the issue completely; or a change in 2.0 
> rendered the patch ineffective. 
> The problem is that rowHash() in the Validator will return a new hash in any 
> case, whether or not the PrecompactedRow actually updated the digest. This 
> leads to the case where a purged PrecompactedRow will not change the 
> digest, but we still end up with a different tree compared to not having 
> rowHash() called at all (as when the row doesn't exist in the first place).
> As a consequence, repair jobs will constantly detect mismatches between 
> older sstables containing purgeable rows and nodes that have already 
> compacted those rows away. After transferring the reported ranges, the newly 
> created sstables will immediately get deleted again during the following 
> compaction. This will happen on every repair run until the sstable with the 
> purgeable row finally gets compacted. 
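The fix direction the description points at can be sketched like this (class and method names are hypothetical, not the real Validator code): track whether a row actually fed bytes into the digest, and leave the Merkle tree untouched when it did not, so a fully purged row and a never-existing row produce identical trees.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ValidatorSketch {
    // Wraps a MessageDigest and remembers whether any bytes were written,
    // so callers can tell a purged (empty) row from a row with live data.
    static class CountingDigest {
        final MessageDigest md;
        long bytesWritten = 0;

        CountingDigest() {
            try {
                md = MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError(e); // MD5 is guaranteed by the Java spec
            }
        }

        void update(byte[] data) {
            md.update(data);
            bytesWritten += data.length;
        }
    }

    // Returns null when the row contributed nothing to the digest (it was
    // fully purged), so the caller skips the Merkle tree update and the tree
    // ends up identical to one built without the row at all.
    static byte[] rowHash(CountingDigest digest) {
        return digest.bytesWritten == 0 ? null : digest.md.digest();
    }
}
```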



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
