Topology changes can lead to bad counters (at RF=1)
---------------------------------------------------

                 Key: CASSANDRA-4071
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4071
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.0
            Reporter: Sylvain Lebresne


A counter is broken into shards (partitions), each shard being 'owned' by a 
given replica (meaning that only this replica will increment that shard).  On 
a given node A, the resolution of 2 shards having the same owner obeys the 
following rules:
* if the shards are owned by A, then sum the values (in the original patch, 
'owned by A' was based on the machine IP address; in the current code, it's 
based on the shard having a delta flag, but the principle is the same)
* otherwise, keep the value of the shard with the highest clock
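
As a rough illustration of the two rules above, here is a minimal Python 
sketch. It is not the Cassandra code: the Shard class, the resolve() function 
and the local_node parameter are hypothetical names, ownership is decided by 
node id (as in the original patch) rather than by a delta flag, and summing 
the clocks of owned shards is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    owner: str  # id of the replica that owns (increments) this shard
    clock: int  # logical clock of the shard
    value: int  # counter value carried by the shard


def resolve(a: Shard, b: Shard, local_node: str) -> Shard:
    """Resolve two shards with the same owner, as seen from local_node."""
    if a.owner != b.owner:
        raise ValueError("resolve() only applies to shards with the same owner")
    if a.owner == local_node:
        # Rule 1: the shards are owned by the resolving node, so their
        # values are summed (clocks are summed too; sketch assumption).
        return Shard(a.owner, a.clock + b.clock, a.value + b.value)
    # Rule 2: the shards belong to another node, so only the one with
    # the highest clock is kept.
    return a if a.clock >= b.clock else b


owned = resolve(Shard("A", 1, 3), Shard("A", 1, 4), local_node="A")
foreign = resolve(Shard("A", 1, 3), Shard("A", 2, 4), local_node="B")
```

In this sketch, the same pair of shards resolves to a different value 
depending on which node performs the resolution.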

During topology changes (bootstrap/move/decommission), we transfer data from A 
to B, but the shards owned by A are not owned by B. We cannot make them owned 
by B either: during those operations (bootstrap, ...) a given shard would then 
be owned by both A and B, which would break counters. But this means that B 
won't interpret the streamed shards correctly.

Concretely, if A receives a number of counter increments that end up in 
different sstables (the shards should thus be summed) and then those increments 
are streamed to B as part of bootstrap, B will not sum the increments but use 
the clocks to keep the maximum value.
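
The scenario above can be played out with a small, self-contained Python 
sketch. All names here (resolve, read_counter, the (owner, clock, value) 
tuples) and the concrete clocks and values are made up for illustration, and 
the model is simplified compared to the actual counter context code.

```python
from functools import reduce


# A shard is modelled as an (owner, clock, value) tuple (hypothetical layout).
def resolve(a, b, local_node):
    owner_a, clock_a, value_a = a
    owner_b, clock_b, value_b = b
    if owner_a != owner_b:
        raise ValueError("shards must have the same owner")
    if owner_a == local_node:
        # Shards owned by the resolving node are summed.
        return (owner_a, clock_a + clock_b, value_a + value_b)
    # Shards owned by another node: keep the one with the highest clock.
    return a if clock_a >= clock_b else b


def read_counter(shards, local_node):
    """Resolve all shards of one counter and return the resulting value."""
    return reduce(lambda x, y: resolve(x, y, local_node), shards)[2]


# Three increments (+5, +3, +2) received by A, each in a different sstable.
sstables_on_a = [("A", 1, 5), ("A", 2, 3), ("A", 3, 2)]

value_on_a = read_counter(sstables_on_a, "A")  # A owns the shards: summed
value_on_b = read_counter(sstables_on_a, "B")  # same data streamed to B

print(value_on_a, value_on_b)
```

With these made-up inputs, A reads 10 while B, after receiving exactly the 
same shards, reads 2: only the shard with the highest clock survives on B.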

I've pushed a test that shows the breakage at 
https://github.com/riptano/cassandra-dtest/commits/counters_test (the test 
needs CASSANDRA-4070 to work correctly).

Note that in practice, replication will hide this (even though B will have the 
bad value after the bootstrap, a read or read repair from the other replica 
will repair it). This is a problem for RF=1 however.

Another problem is that during repair, a node won't correctly repair other 
nodes on its own shards (unless everything is fully compacted).

