[ https://issues.apache.org/jira/browse/CASSANDRA-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-6023:
----------------------------------------

    Attachment: 0002-Populate-commitsByReplica-in-PrepareCallback.txt
                0001-Distinguish-between-promised-and-accepted-ballots.txt

Attaching a patch for the suggestion above.

The patch also slightly simplifies SK.savePaxosCommit. We used to not erase the update if the commit was older than in-progress. I believe that was a bit buggy, and in any case unnecessary since we write with the commit timestamp (so there was in fact no risk of erasing a more recent update). The other thing we were doing was updating the in-progress ballot if the commit was newer: I'm not sure that has any benefit, and it makes me nervous to update in-progress outside of the prepare phase. Besides, if we remove that, we don't need to read the state to commit, which saves a read and the lock acquisition on every commit.

I'm including a second, trivial patch that adds the population of commitsByReplica in PrepareCallback. It's partly unrelated to the problem of this ticket, but the current code is clearly wrong and I'm not sure that warrants a separate ticket.

> CAS should distinguish promised and accepted ballots
> ----------------------------------------------------
>
>                 Key: CASSANDRA-6023
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6023
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.1
>
>         Attachments: 0001-Distinguish-between-promised-and-accepted-ballots.txt,
>                      0002-Populate-commitsByReplica-in-PrepareCallback.txt
>
>
> Currently, we only keep 1) the most recent promise we've made and 2) the last
> update we've accepted. But we don't keep the ballot at which that last update
> was accepted. And because a node always promises to a newer ballot, this means
> an already committed update can be replayed even after another update has
> been committed. Re-committing a value is fine, but only as long as we've not
> started a new round yet.
> Concretely, we can have the following case (with 3 nodes A, B and C) with the
> current implementation:
> * A proposer P1 prepares and proposes a value X at ballot t1. It is accepted
> by all nodes.
> * A proposer P2 prepares at t2 (wanting to commit a new value Y). If, say, A
> and B receive the commit of P1 before the prepare of P2 but C receives them in
> the reverse order, we currently end up with the following states:
> {noformat}
> A: in-progress = (t2, _), mrc = (t1, X)
> B: in-progress = (t2, _), mrc = (t1, X)
> C: in-progress = (t2, X), mrc = (t1, X)
> {noformat}
> Because C received the t1 commit after promising t2, it did not remove X
> during the t1 commit (but note that the problem is not in the commit itself:
> the example still stands if C never receives any commit message).
> * Now, based on the promises of A and B, P2 proposes Y at t2 (C does not see
> this propose, or at least not before it promises t3 below). A and B accept,
> and P2 sends a commit for Y.
> * In the meantime, a proposer P3 submits a prepare at t3 (for some other,
> irrelevant value), which reaches C before C receives P2's propose and commit.
> That prepare reaches A and B too, but after the P2 commit. At that point the
> state will be:
> {noformat}
> A: in-progress = (t3, _), mrc = (t2, Y)
> B: in-progress = (t3, _), mrc = (t2, Y)
> C: in-progress = (t3, X), mrc = (t2, Y)
> {noformat}
> In particular, C still has X as its update because each time it got a commit,
> it had already promised a more recent ballot and thus skipped the delete. The
> value is still X because C received the P2 propose after having promised t3
> and thus refused it.
> * P3 gets back the promises of, say, C and A. Both responses have t3 as the
> in-progress ballot (which is more recent than any mrc), but C's comes with
> value X. So P3 will replay X. Assuming no more contention, this replay will
> succeed and X will be committed at t3.
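The sequence of messages described above can be replayed with a small model. This is only a sketch under stated assumptions: `Replica`, `run_scenario` and `value_to_replay` are hypothetical names invented for illustration (they are not Cassandra classes or methods), ballots t1, t2, t3 are modelled as the integers 1, 2, 3, and the replica keeps a single promised ballot plus the last accepted update, without the ballot at which that update was accepted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Replica:
    """Hypothetical model of the pre-patch replica state (not Cassandra code)."""
    promised: int = 0                            # most recent promised ballot
    update: Optional[str] = None                 # last accepted update, ballot not stored
    mrc: Tuple[int, Optional[str]] = (0, None)   # most recent commit (ballot, value)

    def prepare(self, ballot):
        if ballot <= self.promised:
            return None
        self.promised = ballot
        # in-progress is reported as (promised ballot, last accepted update)
        return (self.promised, self.update), self.mrc

    def propose(self, ballot, value):
        if ballot < self.promised:
            return False                         # refused: a newer ballot was promised
        self.promised, self.update = ballot, value
        return True

    def commit(self, ballot, value):
        if ballot > self.mrc[0]:
            self.mrc = (ballot, value)
        if ballot >= self.promised:              # delete skipped when commit is older
            self.update = None


def run_scenario():
    t1, t2, t3 = 1, 2, 3
    a, b, c = Replica(), Replica(), Replica()
    for r in (a, b, c):                          # P1 prepares and proposes X at t1
        r.prepare(t1)
        r.propose(t1, "X")
    for r in (a, b):                             # A, B: commit t1, then prepare t2
        r.commit(t1, "X")
        r.prepare(t2)
    c.prepare(t2)                                # C: prepare t2, then commit t1
    c.commit(t1, "X")
    for r in (a, b):                             # P2 proposes and commits Y on A, B
        r.propose(t2, "Y")
        r.commit(t2, "Y")
    promise_c = c.prepare(t3)                    # P3's prepare reaches C first
    c.propose(t2, "Y")                           # refused by C
    c.commit(t2, "Y")
    promise_a = a.prepare(t3)
    b.prepare(t3)
    return a, b, c, [promise_c, promise_a]


def value_to_replay(promises):
    """Return the in-progress value the proposer would replay, or None."""
    in_progress = max((p[0] for p in promises if p[0][1] is not None), default=None)
    latest_commit = max(p[1][0] for p in promises)
    if in_progress and in_progress[0] > latest_commit:
        return in_progress[1]
    return None


a, b, c, promises = run_scenario()
print(c)                          # C still holds X under promised ballot 3
print(value_to_replay(promises))  # the value P3 would replay
```

In this model C reports X under ballot t3, which is newer than the latest commit (t2) seen by P3, so the proposer has no way to tell that X is stale.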
> At the end of that example, we've committed X, Y and then X again, even
> though only P1 has ever proposed X.
> I believe the correct fix is to keep the ballot at which an update was
> accepted (instead of using the most recent promised ballot). That way, in the
> example above, P3 would receive from C a promise on t3, but would know that X
> was accepted at t1. P3 would then be able to ignore X, since the mrc of A
> tells it that X is an obsolete value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
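The fix proposed in the description, keeping the ballot at which an update was accepted separately from the promised ballot, can be sketched as follows. This is an illustrative model only, not the attached patch: `FixedReplica` and `value_to_replay` are hypothetical names, ballots t1, t2, t3 are modelled as the integers 1, 2, 3, and it is assumed here that commit only records the most recent commit and leaves the filtering of stale accepted values to the proposer.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Commit = Tuple[int, Optional[str]]   # (ballot, value)


@dataclass
class FixedReplica:
    """Hypothetical replica that tracks promised and accepted ballots separately."""
    promised: int = 0                # most recent promised ballot
    accepted: Commit = (0, None)     # last accepted update with its own ballot
    mrc: Commit = (0, None)          # most recent commit

    def prepare(self, ballot):
        if ballot <= self.promised:
            return None
        self.promised = ballot
        return self.promised, self.accepted, self.mrc

    def propose(self, ballot, value):
        if ballot < self.promised:
            return False             # refused: a newer ballot was promised
        self.promised = ballot
        self.accepted = (ballot, value)
        return True

    def commit(self, ballot, value):
        # Assumption: no read of in-progress state is needed to commit.
        if ballot > self.mrc[0]:
            self.mrc = (ballot, value)


def value_to_replay(promises):
    """Replay an accepted value only if it is newer than every known commit."""
    latest_commit = max(mrc[0] for _, _, mrc in promises)
    accepted = max((acc for _, acc, _ in promises if acc[1] is not None), default=None)
    if accepted and accepted[0] > latest_commit:
        return accepted[1]
    return None


# Same message ordering as in the example from the description.
t1, t2, t3 = 1, 2, 3
a, b, c = FixedReplica(), FixedReplica(), FixedReplica()
for r in (a, b, c):                  # P1 prepares and proposes X at t1
    r.prepare(t1)
    r.propose(t1, "X")
for r in (a, b):                     # A, B: commit t1, then prepare t2
    r.commit(t1, "X")
    r.prepare(t2)
c.prepare(t2)                        # C: prepare t2, then commit t1
c.commit(t1, "X")
for r in (a, b):                     # P2 proposes and commits Y on A, B
    r.propose(t2, "Y")
    r.commit(t2, "Y")
promise_c = c.prepare(t3)            # P3's prepare reaches C first
c.propose(t2, "Y")                   # refused by C
c.commit(t2, "Y")
promise_a = a.prepare(t3)

print(promise_c)                                # C promises t3 but reports X as accepted at t1
print(value_to_replay([promise_c, promise_a]))  # nothing to replay
```

Because C now reports X together with ballot t1, and A's most recent commit is at t2, the proposer in this sketch treats X as obsolete and is free to propose its own value.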