[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117615#comment-17117615 ]
Sylvain Lebresne commented on CASSANDRA-12126:
----------------------------------------------

{quote}I think we should include a flag to disable the fix{quote}

The option of having a flag occurred to me, but I rejected it initially because I continue to believe the current behavior is wrong (a moral judgment, I guess) and, in principle, having a "please, make my database broken" flag does not feel like a good idea. But I reckon that there _may_ exist advanced users who did notice the lack of linearizability for reads and knowingly built around it, for whom the performance impact may be considered a regression with no upside (but if you sense skepticism on my part when reading that sentence, your radar is not completely off). And as we're talking about a minor upgrade here, I'm amenable to such a flag, though I'd prefer making it clear somehow that it is unsafe/risky and something we may remove in the future with no particular warning.

{quote}It would be good to have a test for that as well.{quote}

Certainly, good point, I can add the 2 missing interleavings.

{quote}do we actually claim our consistency properties are for SERIAL?{quote}

While our official doc on the matter is certainly lacking (not spelling out much of a guarantee at all afaict, and I'm happy to piggy-back on this ticket to correct that), we've always implied linearizability. I have, at least, and I'm sure I can dig up others doing so as well on the mailing list if necessary. We did this both by throwing the word "linearizable" around from time to time, and by repeatedly recommending that when a write times out, one needs to issue a SERIAL read to 'observe' whether that write went through or not (and as an aside, if you can't rely on either reads or non-applying CAS for that, I'm not even sure how to use LWTs, except maybe for excessively specific cases).

{quote}perhaps we should instead introduce a new STRICT_SERIAL consistency level{quote}

I'm rather cold on that because, tbh,
I think non-strict serializability is a theoretical notion that is useless in practice and something we should not offer. And I'd rather avoid one more "feature" for which we spend our time saying "don't use it".

{quote}I've pushed various test cases{quote}

Awesome, thanks. I'll look at integrating those in the branch if you don't mind.

> CAS Reads Inconsistencies
> --------------------------
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Lightweight Transactions, Legacy/Coordination
> Reporter: Sankalp Kohli
> Assignee: Sylvain Lebresne
> Priority: Normal
> Labels: LWT, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies true to a propose and saves the commit in the accepted field. The other two machines, B and C, do not get to the accept phase.
> The current state is that machine A has this commit in the paxos table as accepted but not committed, and B and C do not.
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the value written in step 1. This step is as if nothing is in flight.
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that there is something in flight from A and will propose and commit it with the current ballot. Now we can read the value written in step 1 as part of this CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value written in step 1.
> 4) Issue a CAS Write and it involves only B and C. This will succeed and commit a different value than step 1. The step 1 value will never be seen again and was never seen before.
> If you read the Lamport “Paxos Made Simple” paper and read section 2.3.
> It talks about this issue, which is how learners can find out whether a majority of the acceptors have accepted the proposal.
> In step 3, it is correct that we propose the value again since we don't know whether it was accepted by a majority of acceptors. When we ask a majority of acceptors, and more than one acceptor but not a majority has something in flight, we have no way of knowing whether it was accepted by a majority of acceptors. So this behavior is correct.
> However, we need to fix step 2, since it causes reads to not be linearizable with respect to writes and other reads. In this case, we know that a majority of acceptors have no in-flight commit, which means a majority attests that nothing was accepted by a majority. I think we should run a propose step here with an empty commit, and that will cause the write from step 1 to never be visible afterwards.
> With this fix, we will either see the data written in step 1 on the next serial read or will never see it, which is what we want.
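
The interleaving in the ticket description (steps 1 to 3) can be replayed with a toy model. The sketch below is not Cassandra's code: `Replica`, `failed_cas_write`, `serial_read`, `run_scenario` and the `propose_empty` switch are hypothetical names, ballots are plain integers, and an empty update is represented as `None`. It also assumes the prepare of step 1 reached all three replicas, which the description does not state. It only illustrates why proposing an empty commit during the read of step 2 stops the step 1 value from being revived later.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    """Toy per-partition Paxos state on one replica (hypothetical model)."""
    promised: int = 0                                      # highest ballot promised
    accepted: Optional[Tuple[int, Optional[str]]] = None   # accepted, not yet committed
    committed: Optional[str] = None                        # last committed value

    def prepare(self, ballot: int) -> bool:
        if ballot <= self.promised:
            return False
        self.promised = ballot
        return True

    def propose(self, ballot: int, value: Optional[str]) -> bool:
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True

    def commit(self, ballot: int, value: Optional[str]) -> None:
        if value is not None:          # an empty update changes no data
            self.committed = value
        if self.accepted is not None and self.accepted[0] <= ballot:
            self.accepted = None


def failed_cas_write(a: Replica, b: Replica, c: Replica, ballot: int, value: str) -> None:
    """Step 1: the proposal only reaches A, then the write fails."""
    for replica in (a, b, c):
        replica.prepare(ballot)        # assumption: prepare reached everyone
    a.propose(ballot, value)


def serial_read(quorum: List[Replica], ballot: int, propose_empty: bool) -> Optional[str]:
    """SERIAL read against a quorum; propose_empty toggles the suggested fix."""
    promises = [replica.prepare(ballot) for replica in quorum]
    if not all(promises):
        raise RuntimeError("ballot rejected")
    in_flight = [r.accepted for r in quorum if r.accepted is not None]
    if in_flight:
        # Finish the in-flight proposal with the highest ballot.
        _, value = max(in_flight, key=lambda acc: acc[0])
        for replica in quorum:
            replica.propose(ballot, value)
        for replica in quorum:
            replica.commit(ballot, value)
    elif propose_empty:
        # Suggested fix: nothing in flight on this quorum, so propose an
        # empty update to supersede anything accepted by a minority.
        for replica in quorum:
            replica.propose(ballot, None)
    return next((r.committed for r in quorum if r.committed is not None), None)


def run_scenario(propose_empty: bool) -> Tuple[Optional[str], Optional[str]]:
    a, b, c = Replica(), Replica(), Replica()
    failed_cas_write(a, b, c, ballot=1, value="v1")                      # step 1
    first = serial_read([b, c], ballot=2, propose_empty=propose_empty)   # step 2
    second = serial_read([a, b], ballot=3, propose_empty=propose_empty)  # step 3
    return first, second


if __name__ == "__main__":
    print(run_scenario(propose_empty=False))  # (None, 'v1')
    print(run_scenario(propose_empty=True))   # (None, None)
```

Without the empty proposal, the read on B and C sees nothing and the later read on A and B revives "v1". With it, the read on A and B finds the newer empty proposal on B, which wins over A's older one, so "v1" never becomes visible.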