[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227449#comment-17227449 ]
Benedict Elliott Smith commented on CASSANDRA-12126:
----------------------------------------------------

Yes, that sounds like a great idea, and I really appreciate you offering to take that to the list. I'll chime in with any necessary details to help inform the decision, but will try not to influence it otherwise.

I don't have a strong opinion about which of those four options we select, except that my experiments do suggest (3) is perhaps dangerous for some of our users. It's probably a trade-off that each end user should make with careful business consideration and experimentation.

As far as delaying 4.0 is concerned, that's probably also a matter of community decision-making. We could have a patch, reviewed by multiple committers, posted in fairly short order, perhaps before we exit beta. This work will have had much greater validation than the current implementation, but publishing all of this validation work will take longer. That is likely also achievable before GA, but we might have to invert our process a little. Perhaps this is acceptable, given the balance of correctness and regression we're considering as an alternative, but given my proximity to the work (and that I also don't have a strong position either way), I would prefer to let others make that call.

> CAS Reads Inconsistencies
> --------------------------
>
>                 Key: CASSANDRA-12126
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Lightweight Transactions, Legacy/Coordination
>            Reporter: Sankalp Kohli
>            Assignee: Sylvain Lebresne
>            Priority: Normal
>              Labels: LWT, pull-request-available
>             Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase.
> Machine A replies true to a propose and saves the commit in the accepted
> field. The other two machines, B and C, do not get to the accept phase.
> The current state is that machine A has this commit in the paxos table as
> accepted but not committed, and B and C do not.
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read
> the value written in step 1. This step behaves as if nothing is in flight.
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that
> there is something in flight from A and will propose and commit it with the
> current ballot. Now we can read the value written in step 1 as part of this
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the
> value written in step 1.
> 4) Issue a CAS Write and it involves only B and C. This will succeed and
> commit a different value than step 1. The step 1 value will never be seen
> again and was never seen before.
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue,
> which is how learners can find out whether a majority of the acceptors have
> accepted the proposal.
> In step 3, it is correct that we propose the value again, since we don't
> know if it was accepted by a majority of acceptors. When we ask a majority
> of acceptors, and one or more acceptors but not a majority have something
> in flight, we have no way of knowing if it was accepted by a majority of
> acceptors. So this behavior is correct.
> However, we need to fix step 2, since it causes reads to not be
> linearizable with respect to writes and other reads. In this case, we know
> that a majority of acceptors have no in-flight commit, which means a
> majority confirms that nothing was accepted by a majority. I think we
> should run a propose step here with an empty commit, and that will cause
> the write from step 1 to never be visible afterwards.
> With this fix, we will either see the data written in step 1 on the next
> serial read or will never see it, which is what we want.
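
The scenario in the description can be sketched as a toy model. This is only an illustration under stated assumptions, not Cassandra's actual Paxos code: the `Replica` class, the `serial_read` function, the `EMPTY` marker and the integer ballots are all hypothetical names introduced here. It replays steps 1 to 3 with and without the proposed empty-commit step.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

EMPTY = "<empty>"  # hypothetical marker for an empty proposal


@dataclass
class Replica:
    """Toy acceptor; not Cassandra's real paxos state."""
    name: str
    promised: int = 0
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value), not yet committed
    committed: Optional[str] = None

    def prepare(self, ballot: int):
        # Promise a newer ballot and report any in-flight proposal.
        if ballot <= self.promised:
            return False, None
        self.promised = ballot
        return True, self.accepted

    def propose(self, ballot: int, value: str) -> bool:
        # Accept unless a newer ballot has already been promised.
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True

    def commit(self, ballot: int, value: str) -> None:
        # An empty proposal commits nothing; it only clears in-flight state.
        if value != EMPTY:
            self.committed = value
        if self.accepted is not None and self.accepted[0] <= ballot:
            self.accepted = None


def serial_read(quorum: List[Replica], ballot: int, propose_empty: bool):
    in_flight = []
    for r in quorum:
        ok, acc = r.prepare(ballot)
        if not ok:
            raise RuntimeError("prepare rejected by " + r.name)
        if acc is not None:
            in_flight.append(acc)
    if in_flight:
        # Finish the newest in-flight proposal under our own ballot (step 3).
        _, value = max(in_flight, key=lambda bv: bv[0])
        if all(r.propose(ballot, value) for r in quorum):
            for r in quorum:
                r.commit(ballot, value)
    elif propose_empty:
        # Suggested fix for step 2: propose an empty commit to the quorum.
        for r in quorum:
            r.propose(ballot, EMPTY)
    return next((r.committed for r in quorum if r.committed is not None), None)


def run(propose_empty: bool):
    a, b, c = Replica("A"), Replica("B"), Replica("C")
    # Step 1: the write prepares on all replicas, but only A accepts it.
    for r in (a, b, c):
        r.prepare(1)
    a.propose(1, "v1")
    first = serial_read([b, c], 2, propose_empty)   # step 2
    second = serial_read([a, b], 3, propose_empty)  # step 3
    return first, second


if __name__ == "__main__":
    print("without fix:", run(False))
    print("with fix:   ", run(True))
```

In this model, without the empty proposal the second read resurrects the step 1 value that the first read could not see. With it, the empty proposal accepted by B carries a higher ballot than A's stale proposal, so the step 1 value is superseded and is not returned by either read.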
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org