[ 
https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588567#comment-13588567
 ] 

Cristian Opris edited comment on CASSANDRA-5062 at 2/27/13 6:09 PM:
--------------------------------------------------------------------

Note that a proposal may eventually succeed on recovery even if fewer than a 
quorum of replicas managed to ack it before the leader failed (and the client 
timed out). Quorum writes are needed in order to survive F failures out of 
2F+1 replicas. Reads are not quorum reads, just replica-local reads.
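
As a small illustration of the 2F+1 arithmetic above, here is a minimal 
sketch. The function names quorum_size and tolerated_failures are made up for 
this example and are not part of any Cassandra API.

```python
def quorum_size(replicas: int) -> int:
    """Smallest majority of `replicas` nodes."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Largest number of failed nodes that still leaves a quorum alive."""
    return replicas - quorum_size(replicas)


# With 5 replicas (F = 2), a write needs 3 acks and 2 nodes may fail.
five_quorum = quorum_size(5)
five_failures = tolerated_failures(5)
```

For any F, a cluster of 2F+1 replicas gives a quorum of F+1 and tolerates 
exactly F failures under this definition.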

Let's say we have 5 replicas, with F1 as the leader. F4 and F5 are ignored 
here as they don't matter.
{code}
1a F1 -> proposal -> F2
1b F1 <-  ack     <- F2
2a F1 -> proposal -> F3
2b F1 <-  ack     <- F3
3a F1 ->  OK      -> client
3b F1 -> COMMIT   -> F2,F3
{code}

If F1 fails immediately after step 1b, F2 would become the leader since it has 
the latest seq number. Now only F2 has the proposal, but it can continue and 
commit it to the other followers.
If it can't get a quorum (maybe it's partitioned into a minority) then it gives 
up leadership. When it rejoins the majority, it runs another recovery procedure 
that uses epoch numbers to determine whether it needs to throw away that 
proposal. This is fine since no client has actually received confirmation that 
the proposal was committed. This is detailed in the paper.
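
The failure case described above can be sketched as a toy model. Everything 
here is an assumption made for illustration: the Replica class, elect_leader 
and recover are hypothetical names, the election rule is simply "highest seq 
number among reachable replicas", and the epoch-based discard step from the 
paper is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    last_seq: int = 0                               # highest proposal seq seen
    pending: dict = field(default_factory=dict)     # seq -> value, not yet committed
    committed: dict = field(default_factory=dict)   # seq -> value


def elect_leader(reachable):
    """Pick the reachable replica with the latest sequence number."""
    return max(reachable, key=lambda r: r.last_seq)


def recover(reachable, cluster_size):
    """Let a new leader try to commit its pending proposals.

    Returns the leader on success, or None if the reachable set is smaller
    than a quorum (the would-be leader gives up leadership).
    """
    quorum = cluster_size // 2 + 1
    if len(reachable) < quorum:
        return None  # minority partition: nothing is committed
    leader = elect_leader(reachable)
    for seq, value in sorted(leader.pending.items()):
        for r in reachable:
            r.last_seq = max(r.last_seq, seq)
            r.committed[seq] = value
            r.pending.pop(seq, None)
    return leader


# Scenario from the comment: F1 fails right after step 1b, so only F2
# holds the proposal (seq 1). F2..F5 are still reachable.
f2, f3, f4, f5 = (Replica(f"F{i}") for i in range(2, 6))
f2.last_seq = 1
f2.pending[1] = "value"
new_leader = recover([f2, f3, f4, f5], cluster_size=5)

# Same starting state, but F2 is alone in a minority partition.
lonely = Replica("F2", last_seq=1, pending={1: "value"})
no_leader = recover([lonely], cluster_size=5)
```

In the first scenario F2 wins the election and the proposal ends up committed 
on every reachable replica even though only one follower had acked it. In the 
second, F2 cannot reach a quorum, so the proposal stays pending and may later 
be discarded.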

                
> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>
> "Strong" consistency is not enough to prevent race conditions.  The classic 
> example is user account creation: we want to ensure usernames are unique, so 
> we only want to signal account creation success if nobody else has created 
> the account yet.  But naive read-then-write allows clients to race and both 
> think they have a green light to create.
