[ 
https://issues.apache.org/jira/browse/CASSANDRA-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-6023:
----------------------------------------

    Attachment: 0002-Populate-commitsByReplica-in-PrepareCallback.txt
                0001-Distinguish-between-promised-and-accepted-ballots.txt

Attaching a patch for the suggestion above. The patch also slightly simplifies 
SK.savePaxosCommit: we used to not erase the update if the commit was older 
than in-progress. I believe that was a bit buggy and in any case unnecessary, 
since we write with the commit timestamp (so there was in fact no risk of 
erasing a more recent update). The other thing we were doing was updating the 
in-progress ballot if the commit was newer: I'm not sure that has any benefit, 
and it makes me nervous to update in-progress outside of the prepare phase. 
Besides, if we remove that, we don't need to read the state to commit, which 
saves a read and the lock acquisition on every commit.
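
To illustrate why the "older than in-progress" check is not needed, here is a 
minimal sketch in Python (not the actual SystemKeyspace code). It assumes a toy 
last-write-wins cell in which a deletion wins timestamp ties; LwwCell and 
save_commit are hypothetical names used only for this illustration, and ballots 
are modeled as plain integers.

```python
class LwwCell:
    """Toy last-write-wins cell (an assumption, not Cassandra's storage code)."""

    def __init__(self):
        self.timestamp = -1
        self.value = None

    def write(self, timestamp, value):
        # A write applies if it is strictly newer, or if it is a deletion
        # (value None) carrying the same timestamp as the stored value.
        newer = timestamp > self.timestamp
        delete_on_tie = timestamp == self.timestamp and value is None
        if newer or delete_on_tie:
            self.timestamp = timestamp
            self.value = value


def save_commit(proposal_cell, commit_ballot):
    # Clear the stored proposal unconditionally, without reading the state
    # first, using the commit ballot as the write timestamp.
    proposal_cell.write(commit_ballot, None)


# A proposal accepted at ballot 2 survives the late commit of ballot 1.
newer_proposal = LwwCell()
newer_proposal.write(2, "Y")
save_commit(newer_proposal, 1)

# A proposal accepted at ballot 1 is erased by the commit of ballot 1.
same_proposal = LwwCell()
same_proposal.write(1, "X")
save_commit(same_proposal, 1)
```

In this model the unconditional delete cannot remove an update written at a 
newer ballot, so no read of the current state is needed before committing.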

I'm including a second, trivial patch that adds the population of 
commitsByReplica in PrepareCallback. It's partly unrelated to the problem of 
this ticket, but the omission is clearly wrong and I'm not sure it warrants a 
separate ticket.

                
> CAS should distinguish promised and accepted ballots
> ----------------------------------------------------
>
>                 Key: CASSANDRA-6023
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6023
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.1
>
>         Attachments: 
> 0001-Distinguish-between-promised-and-accepted-ballots.txt, 
> 0002-Populate-commitsByReplica-in-PrepareCallback.txt
>
>
> Currently, we only keep 1) the most recent promise we've made and 2) the last 
> update we've accepted. But we don't keep the ballot at which that last update 
> was accepted. And because a node always promises to a newer ballot, this means 
> an already committed update can be replayed even after another update has 
> been committed. Re-committing a value is fine, but only as long as we've not 
> started a new round yet.
> Concretely, we can have the following case (with 3 nodes A, B and C) with the 
> current implementation:
> * A proposer P1 prepares and proposes a value X at ballot t1. It is accepted 
> by all nodes.
> * A proposer P2 prepares at t2 (wanting to commit a new value Y). If, say, A 
> and B receive the commit of P1 before the prepare of P2 but C receives those 
> in the reverse order, we'll currently have the following states:
> {noformat}
> A: in-progress = (t2, _), mrc = (t1, X)
> B: in-progress = (t2, _), mrc = (t1, X)
> C: in-progress = (t2, X), mrc = (t1, X)
> {noformat}
> Because C has received the t1 commit after promising t2, it won't have 
> removed X during the t1 commit (but note that the problem is not in the 
> commit: the example still stands if C never receives any commit message).
> * Now, based on the promises of A and B, P2 will propose Y at t2 (C doesn't 
> see this propose, at least not before it promises t3 below). A and B accept, 
> and P2 sends a commit for Y.
> * In the meantime a proposer P3 submits a prepare at t3 (for some other 
> irrelevant value) which reaches C before it receives P2's propose and commit. 
> That prepare reaches A and B too, but after the P2 commit. At that point the 
> state will be:
> {noformat}
> A: in-progress = (t3, _), mrc = (t2, Y)
> B: in-progress = (t3, _), mrc = (t2, Y)
> C: in-progress = (t3, X), mrc = (t2, Y)
> {noformat}
> In particular, C still has X as its update because each time it got a commit, 
> it had already promised a more recent ballot and thus skipped the delete. The 
> value is still X because C received the P2 propose after having promised t3 
> and thus refused it.
> * P3 gets back the promises of, say, C and A. Both responses have t3 as 
> in-progress ballot (and it is more recent than any mrc) but C's comes with 
> value X. So P3 will replay X. Assuming no more contention this replay will 
> succeed and X will be committed at t3.
> At the end of that example, we've committed X, Y and then X again, even though 
> only P1 has ever proposed X.
> I believe the correct fix is to keep the ballot at which an update was 
> accepted (instead of using the most recent promised ballot). That way, in the 
> example above, P3 would receive from C a promise on t3, but would know that X 
> was accepted at t1. And so P3 would be able to ignore X since the mrc of A 
> will tell it that X is an obsolete value.
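
The difference between the two behaviours can be sketched from the proposer's 
side as follows. This is a toy Python model under stated assumptions, not the 
patch itself: PromiseResponse, value_to_replay_current and 
value_to_replay_fixed are hypothetical names, and the ballots t1, t2 and t3 are 
modeled as the integers 1, 2 and 3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PromiseResponse:
    promised: int                   # newest ballot the replica has promised
    accepted_ballot: Optional[int]  # ballot at which the update was accepted
    accepted_value: Optional[str]   # last accepted update, if any
    mrc_ballot: int                 # ballot of the most recent commit


def value_to_replay_current(responses: List[PromiseResponse]) -> Optional[str]:
    # Current behaviour: an accepted update is tagged with the promised ballot,
    # so it looks newer than the most recent commit once a later promise is made.
    latest_commit = max(r.mrc_ballot for r in responses)
    pending = [(r.promised, r.accepted_value) for r in responses
               if r.accepted_value is not None and r.promised > latest_commit]
    return max(pending)[1] if pending else None


def value_to_replay_fixed(responses: List[PromiseResponse]) -> Optional[str]:
    # Proposed behaviour: compare the ballot at which the update was accepted
    # against the most recent commit, and ignore updates that are older.
    latest_commit = max(r.mrc_ballot for r in responses)
    pending = [(r.accepted_ballot, r.accepted_value) for r in responses
               if r.accepted_value is not None and r.accepted_ballot > latest_commit]
    return max(pending)[1] if pending else None


# Promises that P3 receives from A and C in the final state of the example.
a = PromiseResponse(promised=3, accepted_ballot=None, accepted_value=None, mrc_ballot=2)
c = PromiseResponse(promised=3, accepted_ballot=1, accepted_value="X", mrc_ballot=2)

print(value_to_replay_current([a, c]))  # X
print(value_to_replay_fixed([a, c]))    # None
```

With the promised ballot used as the tag, X appears to be in progress at t3 and 
is replayed; with the accepted ballot kept, X at t1 is older than the commit of 
Y at t2 and is ignored.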

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
