[ https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586290#comment-13586290 ]

Cristian Opris commented on CASSANDRA-5062:
-------------------------------------------

This shouldn't be too complicated with Paxos leader election, very similar to 
Spinnaker.

I don't think it requires changing the read/write paths at the lower level, at 
least not significantly.

Assume for the sake of simplicity that we use a column prefix to encode the 
version.
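
As a rough illustration of the column-prefix idea, the sketch below packs the 
version as a fixed-width big-endian prefix on the column name, so byte-wise 
ordering of column names follows version ordering. The function names and the 
8-byte width are assumptions made up for this example, not anything Cassandra 
defines.

```python
import struct
from typing import Tuple

VERSION_WIDTH = 8  # assumption: 8-byte unsigned big-endian version prefix


def encode_column(version: int, name: bytes) -> bytes:
    """Prefix a column name with its version (hypothetical layout)."""
    return struct.pack(">Q", version) + name


def decode_column(column: bytes) -> Tuple[int, bytes]:
    """Split a prefixed column name back into (version, name)."""
    (version,) = struct.unpack(">Q", column[:VERSION_WIDTH])
    return version, column[VERSION_WIDTH:]
```

A fixed-width big-endian prefix is chosen here only because it keeps a plain 
byte comparison consistent with numeric version order.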

The leader elected should always be the one that has the latest version.

This allows the leader to perform read-modify-write (conditional update) 
locally and do a simple quorum write to propagate that if successful.

The leader can also increment the version sequentially.
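
A minimal in-memory sketch of that leader-side conditional update, under 
simplifying assumptions: the Replica class, the leader_cas function and the 
(version, value) tuples are hypothetical stand-ins, not Cassandra code, and a 
failed quorum write is not rolled back here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, Optional[str]]  # (version, value); version 0 = absent


@dataclass
class Replica:
    """In-memory stand-in for a replica; `up` simulates availability."""
    up: bool = True
    store: Dict[str, Versioned] = field(default_factory=dict)

    def write(self, key: str, version: int, value: str) -> bool:
        if not self.up:
            return False  # no acknowledgement from a down replica
        # Keep only the newest version of each key.
        if version > self.store.get(key, (0, None))[0]:
            self.store[key] = (version, value)
        return True


def leader_cas(leader: Replica, followers: List[Replica], key: str,
               expected: Optional[str], new_value: str) -> bool:
    """Conditional update as performed by the elected leader."""
    # The leader holds the latest version, so the check is purely local.
    version, current = leader.store.get(key, (0, None))
    if current != expected:
        return False  # condition failed; nothing is written
    next_version = version + 1  # versions are incremented sequentially
    replicas = [leader] + followers
    acks = sum(r.write(key, next_version, new_value) for r in replicas)
    return acks >= len(replicas) // 2 + 1  # simple quorum write
```

With one leader and two followers, one of them down, 
leader_cas(leader, followers, "user:alice", None, "created") succeeds once and 
a second identical call fails the condition check.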

Conflicting writes from other replicas cannot succeed because any node that 
wants to write needs to get itself elected leader first.

Since we do quorum writes, not all replicas will have the full sequence of 
versions, but regular anti-entropy (read-repair) on quorum reads should take 
care of that.
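
To make the catch-up step concrete, here is a small sketch of a quorum read 
that returns the highest version seen and repairs stale replicas in the 
contacted set. Replicas are modelled as plain dicts and the function name is 
invented for this example; real read repair in Cassandra is more involved.

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, Optional[str]]  # (version, value); version 0 = absent


def quorum_read_with_repair(replicas: List[Dict[str, Versioned]],
                            key: str) -> Versioned:
    """Read from a quorum, return the newest value, repair stale copies."""
    quorum = len(replicas) // 2 + 1
    # Assumption: the first `quorum` replicas are the ones that respond.
    contacted = replicas[:quorum]
    latest = max((r.get(key, (0, None)) for r in contacted),
                 key=lambda v: v[0])
    for r in contacted:
        if r.get(key, (0, None))[0] < latest[0]:
            r[key] = latest  # push the newest version to the stale replica
    return latest
```

Replicas outside the contacted quorum stay stale until a later read or repair 
reaches them.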
  
If the leader fails, the newly elected leader will necessarily be the one that 
has the latest write, so it can continue to do CAS locally.
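
The selection rule can be sketched as below. Any two majorities overlap, so a 
majority of live replicas includes at least one that accepted the last quorum 
write. This is only the "highest version wins" rule, not Paxos or Zab itself; 
the function name and the map of node id to latest version are assumptions for 
illustration.

```python
from typing import Dict, Optional


def elect_leader(latest_versions: Dict[str, int],
                 cluster_size: int) -> Optional[str]:
    """Pick the live replica holding the highest version.

    `latest_versions` maps each live node id to the newest version it holds.
    Returns None when fewer than a majority of replicas are live.
    """
    quorum = cluster_size // 2 + 1
    if len(latest_versions) < quorum:
        return None  # no majority, so no leader can be elected
    # Ties on version are broken by node id to keep the choice deterministic.
    return max(latest_versions,
               key=lambda node: (latest_versions[node], node))
```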

Anti-entropy should also take care of recovery and catch-up of a replica just 
like now.

I believe this can all be done on top of existing functionality without major 
changes to read/write paths.

You could also reuse the Zab algorithm from ZK for expediency without having 
to use the entire ZK codebase.



                
> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>
> "Strong" consistency is not enough to prevent race conditions.  The classic 
> example is user account creation: we want to ensure usernames are unique, so 
> we only want to signal account creation success if nobody else has created 
> the account yet.  But naive read-then-write allows clients to race and both 
> think they have a green light to create.
