[ https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588303#comment-13588303 ]
Sergio Bossa commented on CASSANDRA-5062:
-----------------------------------------

[~jbellis]

{quote}This is not correct for Paxos. (Not sufficiently familiar with ZAB to comment there){quote}

Right, I was talking about Zab, which does exactly that to improve liveness and performance.

{quote}What does this 2PC-that-avoids-lost-acks look like?{quote}

Well, given my lack of familiarity with Cassandra internals, I may be missing something here, so let's be clear about the lost-ack problem: my understanding is that a lost ack is what happens when the coordinator node sends a QUORUM request and fails before getting the ack back, causing uncertainty about the request status. Please correct me if I'm wrong here.

Stated this way, the problem can be overcome with Zab-like 2PC: once the coordinator gets the acks from the prepare phase, it can commit without having to wait for all acks, because only committed values with the highest "commit id" will be (QUORUM) read. Then:

1) If the coordinator fails during the prepare phase (lost ack), nothing will be committed, hence the previously committed value will be read; if the prepared value is later hinted/repaired, it will still be just a tentative value.
2) If the coordinator fails after sending commits, the coordinator with the highest commit id will take over and "realign" followers.
3) If a partition happens, the coordinator with the minority of followers will refuse to operate CAS (Paxos would behave exactly the same here).

Does this make sense to you?

Obviously I may be missing some corner case, and above all, I'm not sure how comfortably this could be implemented in Cassandra (lack of knowledge again), so take my comments just as food for thought.
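
To make cases 1) and 3) a bit more concrete, here is a toy Python sketch. It only illustrates the idea and has nothing to do with Cassandra's actual code: `Replica`, `write`, `quorum_read` and the `crash_before_commit` flag are all hypothetical names, the "commit id" is a plain integer, and the takeover/realign step of case 2) is not modeled at all.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    # Last committed (commit_id, value); the only state a read may return.
    committed: Tuple[int, Optional[str]] = (0, None)
    # Prepared but not yet committed (commit_id, value): a tentative value.
    tentative: Optional[Tuple[int, str]] = None

    def prepare(self, commit_id: int, value: str) -> bool:
        # Refuse proposals that are not newer than the committed state.
        if commit_id <= self.committed[0]:
            return False
        self.tentative = (commit_id, value)
        return True

    def commit(self, commit_id: int) -> None:
        # Promote the tentative value only if it matches this commit id.
        if self.tentative is not None and self.tentative[0] == commit_id:
            self.committed = self.tentative
            self.tentative = None


def quorum_size(cluster_size: int) -> int:
    return cluster_size // 2 + 1


def write(reachable: List[Replica], cluster_size: int, commit_id: int,
          value: str, crash_before_commit: bool = False) -> bool:
    # Phase 1: prepare on every replica the coordinator can reach.
    acks = sum(r.prepare(commit_id, value) for r in reachable)
    if acks < quorum_size(cluster_size):
        return False  # case 3: minority side of a partition refuses
    if crash_before_commit:
        return False  # case 1: coordinator dies, no commit is ever sent
    # Phase 2: commit; the coordinator does not wait for commit acks.
    for r in reachable:
        r.commit(commit_id)
    return True


def quorum_read(reachable: List[Replica], cluster_size: int) -> Optional[str]:
    if len(reachable) < quorum_size(cluster_size):
        raise RuntimeError("not enough replicas for a QUORUM read")
    # Only committed values count; the highest commit id wins.
    return max((r.committed for r in reachable), key=lambda c: c[0])[1]
```

With three replicas, a write that crashes after the prepare phase leaves only tentative values behind, so a subsequent QUORUM read still returns the previously committed value.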
> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>
> "Strong" consistency is not enough to prevent race conditions.  The classic
> example is user account creation: we want to ensure usernames are unique, so
> we only want to signal account creation success if nobody else has created
> the account yet.  But naive read-then-write allows clients to race and both
> think they have a green light to create.