[ https://issues.apache.org/jira/browse/CASSANDRA-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173138#comment-13173138 ]
Sylvain Lebresne commented on CASSANDRA-3641:
---------------------------------------------

Let's open a separate ticket to discuss that. So far we've used the log only for recording errors, so let's keep it that way for this ticket.

> inconsistent/corrupt counters w/ broken shards never converge
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3641
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3641
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>         Attachments: 3641-0.8-internal-not-for-inclusion.txt, 3641-trunk.txt
>
>
> We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had
> counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption
> was that there would exist shards with the *same* node_id, *same* clock id,
> but *different* counts.
> The counter column diffing and reconciliation code assumes that this never
> happens, and ignores the count. The problem with this is that if there is an
> inconsistency, the result of a reconciliation will depend on the order of the
> shards.
> In our case for example, we would see the value of the counter randomly
> fluctuating on a CL.ALL read, but we would get a consistent value (whatever
> the node had) on CL.ONE (submitted to one of the nodes in the replica set for
> the key).
> In addition, read repair would not work despite digest mismatches because the
> diffing algorithm also did not care about the counts when determining the
> differences to send.
> I'm attaching patches that fix this. The first patch is against our 0.8
> branch, which is not terribly useful to people, but I include it because it
> is the well-tested version that we have used on the production cluster which
> was subject to this corruption.
> The other patch is against trunk, and contains the same change.
> What the patch does is:
> * On diffing, treat as DISJOINT if there is a count discrepancy.
> * On reconciliation, look at the count and *deterministically* pick the
> higher one, and:
> ** log the fact that we detected a corrupt counter
> ** increment a JMX observable counter for monitoring purposes
> A cluster which is subject to such corruption and has this patch will fix
> itself with an AES + compact (or just repeated compactions assuming the
> replicate-on-compact is able to deliver correctly).
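
For readers following along, the diffing and reconciliation rules described in the quoted report can be sketched roughly as follows. This is a simplified Python illustration, not the attached patch: a counter context is modelled as a dict of node_id -> (clock, count), and the names diff, reconcile, Relationship and metrics are all hypothetical. Real counter contexts have more to them, which is ignored here.

```python
import logging
from enum import Enum

log = logging.getLogger("counter_sketch")

# Hypothetical stand-in for the JMX-observable counter from the report.
metrics = {"corrupt_shards": 0}


class Relationship(Enum):
    EQUAL = 1
    GREATER_THAN = 2
    LESS_THAN = 3
    DISJOINT = 4


def diff(left, right):
    """Compare two contexts, each a dict of node_id -> (clock, count)."""
    rel = Relationship.EQUAL
    for node_id in set(left) | set(right):
        l, r = left.get(node_id), right.get(node_id)
        if l == r:
            continue
        if l is None:
            step = Relationship.LESS_THAN
        elif r is None:
            step = Relationship.GREATER_THAN
        elif l[0] == r[0]:
            # Same node_id and clock but different counts: a corrupt shard.
            # Report DISJOINT so that both sides get repaired.
            return Relationship.DISJOINT
        else:
            step = (Relationship.GREATER_THAN if l[0] > r[0]
                    else Relationship.LESS_THAN)
        if rel is Relationship.EQUAL:
            rel = step
        elif rel is not step:
            return Relationship.DISJOINT
    return rel


def reconcile(left, right):
    """Merge two contexts; the result does not depend on argument order."""
    merged = {}
    for node_id in set(left) | set(right):
        l, r = left.get(node_id), right.get(node_id)
        if l is None or r is None:
            merged[node_id] = r if l is None else l
        elif l[0] != r[0]:
            # Normal case: the shard with the higher clock wins.
            merged[node_id] = l if l[0] > r[0] else r
        else:
            if l[1] != r[1]:
                # Corrupt case: log it, count it, then pick the higher count.
                metrics["corrupt_shards"] += 1
                log.warning("corrupt counter shard for node %s: %r vs %r",
                            node_id, l, r)
            merged[node_id] = l if l[1] >= r[1] else r
    return merged
```

With a = {"n1": (5, 10)} and b = {"n1": (5, 12)}, diff(a, b) reports DISJOINT, and reconcile(a, b) and reconcile(b, a) both keep the (5, 12) shard, so the merged value no longer depends on the order in which the shards are seen.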