[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126937#comment-17126937
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
----------------------------------------------

bq. The test cases I provided demonstrate several consistency violations during 
range movements.

Yes, sorry, I hadn't read them before commenting. And I certainly agree those 
are problematic (I was about to open a ticket so they're tracked, but I'd say 
CASSANDRA-15745 more or less covers them).

bq. There are also (more debatably) issues with TTL on system.paxos

Agreed, this has always been a weak point. It does feel somewhat separate from 
the other consistency issues, though, and maybe in the short term we can just 
offer a way to override the TTL (with documentation on the trade-offs involved)?

bq. Also, mixing LOCAL_SERIAL and SERIAL is entirely unsafe

Yeah. I'm not sure how to fix that one without a breaking API change, though 
(namely, restricting how they can be used together). It's not "that" different 
from the fact that we allow unrestricted mixing of serial and non-serial 
operations, which is something I don't like and am happy to discuss going 
forward, but imo that is post-3.X material in the best of cases.
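For intuition on why the mix is unsafe: Paxos relies on any two quorums sharing at least one replica. Below is a minimal sketch, assuming two datacenters with RF=3 each, that SERIAL contacts a simple majority of all six replicas (4), and that LOCAL_SERIAL contacts a majority of the local datacenter's three (2). The replica names and helper functions are hypothetical. The code searches for a SERIAL quorum and a LOCAL_SERIAL quorum that have no replica in common.

```python
from itertools import combinations

# Hypothetical topology: two datacenters with RF=3 each.
DC1 = ["dc1-a", "dc1-b", "dc1-c"]
DC2 = ["dc2-a", "dc2-b", "dc2-c"]
ALL = DC1 + DC2


def quorums(replicas):
    """Every minimal majority of the given replica set."""
    size = len(replicas) // 2 + 1
    return [set(q) for q in combinations(replicas, size)]


def disjoint_pair(left, right):
    """Return a quorum from `left` and one from `right` sharing no replica, or None."""
    for q1 in left:
        for q2 in right:
            if not (q1 & q2):
                return q1, q2
    return None


# SERIAL vs SERIAL: any two majorities of the same 6 replicas overlap.
print(disjoint_pair(quorums(ALL), quorums(ALL)))  # None

# SERIAL vs LOCAL_SERIAL in DC2: a non-overlapping pair exists.
serial_q, local_q = disjoint_pair(quorums(ALL), quorums(DC2))
print(sorted(serial_q), sorted(local_q))
# ['dc1-a', 'dc1-b', 'dc1-c', 'dc2-a'] ['dc2-b', 'dc2-c']
```

Two SERIAL quorums, or two LOCAL_SERIAL quorums in the same datacenter, always intersect in this model; the gap only appears when the two levels are mixed, so a LOCAL_SERIAL round can miss a value accepted by a SERIAL round.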

bq. I think it is worth considering if we should instead aggressively try to 
remedy all of the known issues, have a strong verification push, and then roll 
out all of the changes at-once - including a fix for this that does not regress 
performance.

It is certainly an option worth raising, and thank you for that. I'm not sure 
how to really know which is the best option, though, so I can only offer my 
current opinion.

My opinion is that this is a very serious issue. I don't mean that in a way 
that diminishes the seriousness of the other problems you mentioned; I mean it 
in absolute terms (the range movement issues are also fairly bad imo, for 
instance). But leaving fewer of our known serious issues unaddressed feels 
better than not, so I'd personally prefer fixing this one ASAP. Basically, I'm 
worried that waiting for a more all-encompassing fix might take us quite some 
time, with no absolute guarantee that we'll collectively be at ease with 
pushing it to 3.X.

Anyway, I'd personally like to move this forward. How do we decide whether we do?


> CAS Reads Inconsistencies 
> --------------------------
>
>                 Key: CASSANDRA-12126
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Lightweight Transactions, Legacy/Coordination
>            Reporter: Sankalp Kohli
>            Assignee: Sylvain Lebresne
>            Priority: Normal
>              Labels: LWT, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS reads. Here is how it can happen with RF=3:
> 1) You issue a CAS write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS read and it goes only to B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again, and was never seen before. 
> If you read section 2.3 of Lamport's “Paxos Made Simple” paper, it discusses 
> this issue, namely how learners can find out whether a majority of the 
> acceptors have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't 
> know whether it was accepted by a majority of acceptors. When we ask a 
> majority of acceptors and at least one, but not a majority, has something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a 
> majority of acceptors have no in-flight commit, which means a majority can 
> vouch that nothing was accepted by a majority. I think we should run a 
> propose step here with an empty commit, which will cause the value written 
> in step 1 to never become visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want.
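The steps in the issue description can be replayed with a toy, single-partition model of Paxos. This is a minimal sketch, not Cassandra's implementation: the `Replica` class, `serial_read`, `scenario`, the integer ballots, and the rule that an accepted proposal counts as in flight only if its ballot is newer than the most recent commit are all assumptions made for illustration. The `fix=True` branch models the change proposed in the issue (a read that finds nothing in flight proposes and commits an empty value), not any patch that was eventually merged.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (ballot, value); a value of None stands for an empty (no-op) proposal.
Proposal = Tuple[int, Optional[str]]


@dataclass
class Replica:
    name: str
    promised: int = 0                    # highest ballot promised in prepare
    accepted: Optional[Proposal] = None  # last accepted, possibly uncommitted
    committed_ballot: int = 0            # ballot of the most recent commit
    data: Optional[Proposal] = None      # latest non-empty committed write

    def prepare(self, ballot: int) -> bool:
        if ballot <= self.promised:
            return False
        self.promised = ballot
        return True

    def accept(self, proposal: Proposal) -> bool:
        if proposal[0] < self.promised:
            return False
        self.promised = proposal[0]
        self.accepted = proposal
        return True

    def commit(self, proposal: Proposal) -> None:
        self.committed_ballot = max(self.committed_ballot, proposal[0])
        # An empty proposal closes the round but changes no data.
        if proposal[1] is not None and (
            self.data is None or proposal[0] > self.data[0]
        ):
            self.data = proposal


def propose_and_commit(quorum: List[Replica], proposal: Proposal) -> None:
    if not all([r.accept(proposal) for r in quorum]):
        raise RuntimeError("proposal rejected")
    for r in quorum:
        r.commit(proposal)


def serial_read(quorum: List[Replica], ballot: int, fix: bool) -> Optional[str]:
    if not all([r.prepare(ballot) for r in quorum]):
        raise RuntimeError("prepare preempted")
    latest_commit = max(r.committed_ballot for r in quorum)
    in_flight = [r.accepted for r in quorum
                 if r.accepted is not None and r.accepted[0] > latest_commit]
    if in_flight:
        # Complete the newest in-flight proposal, re-proposed under our ballot.
        _, value = max(in_flight, key=lambda p: p[0])
        propose_and_commit(quorum, (ballot, value))
    elif fix:
        # Proposed fix: commit an empty proposal so a value accepted only by a
        # minority (here, A) is superseded and can never be resurrected.
        propose_and_commit(quorum, (ballot, None))
    writes = [r.data for r in quorum if r.data is not None]
    return max(writes, key=lambda p: p[0])[1] if writes else None


def scenario(fix: bool) -> Tuple[Optional[str], Optional[str]]:
    a, b, c = Replica("A"), Replica("B"), Replica("C")
    # Step 1: a CAS write at ballot 1 is accepted by A only, then times out.
    a.prepare(1)
    a.accept((1, "v1"))
    first = serial_read([b, c], ballot=2, fix=fix)   # step 2: quorum {B, C}
    second = serial_read([a, b], ballot=3, fix=fix)  # step 3: quorum {A, B}
    return first, second


print(scenario(fix=False))  # (None, 'v1'): v1 appears after a read saw nothing
print(scenario(fix=True))   # (None, None): the empty commit settles the outcome
```

Without the fix, the step 2 read returns nothing and the step 3 read then resurrects `v1`; with it, the empty commit at ballot 2 is newer than A's ballot 1 proposal, so `v1` is never treated as in flight again. The model ignores timeouts, retries, and contention, so it only illustrates the read anomaly, not a full correctness argument.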



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
