[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117652#comment-17117652
 ] 

Benedict Elliott Smith commented on CASSANDRA-12126:
----------------------------------------------------

bq. I'm rather cold on that because, tbh, I think non-strict serializability is 
a theoretical notion that is useless in practice and that it is something we 
should not offer. And I'd rather avoid one more "feature" for which we spend 
our time saying "don't use it".

Yeah, I'm very sympathetic to this view, and have always assumed 
linearizability with partitions as the object.  I'm just really trying to 
morally justify providing some time to fix this without any negative 
repercussions.  

Either way, we should definitely clarify what we mean by SERIAL in some 
official project documentation somewhere though.  We probably need to do so in 
terms of strict serializability as opposed to linearizability, so that it can 
be consistent with a future in which we support multi-partition transactions 
(which as a project we really need to deliver in the not-too-distant future).

bq. non-applying CAS for that

FWIW, I think this particular case is a no-brainer; there's no real cost to 
strengthening the semantics of non-applying CAS IMO, since users should 
anticipate their CAS operations will ordinarily take this long.  Whatever the 
conclusion of our discussion, I think we should apply a fix at least for the 
non-applying case immediately, and I do not believe any flag to disable this 
part of the fix is necessary.

Reads are trickier, because the user will see a significant performance penalty 
on patch version upgrade.  I'm sympathetic to the view that we should just fix 
the read part immediately, performance regressions be damned.  But we do have 
other serious consistency violations that should also be fixed.  I think it is 
worth _considering_ whether we should instead aggressively try to remedy all of 
the known issues, have a strong verification push, and then roll out all of the 
changes at once, including a fix for this that does not regress performance.  It 
might seem a lot for a patch version, but I'm not sure risk is a concern when we 
know there are several serious problems today, and have been for years.

I'm not going to advocate super strongly for either approach, as I don't think 
there's a clear answer; I just want to raise the alternative as an option to 
expressly consider.

bq. Awesome, thanks. I'll look at integrating those in the branch if you don't 
mind.

Absolutely, that was my intention.

> CAS Reads Inconsistencies 
> --------------------------
>
>                 Key: CASSANDRA-12126
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Lightweight Transactions, Legacy/Coordination
>            Reporter: Sankalp Kohli
>            Assignee: Sylvain Lebresne
>            Priority: Normal
>              Labels: LWT, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue, 
> namely how learners can find out whether a majority of the acceptors have 
> accepted the proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> if it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and one or more acceptors, but not a majority, have something in 
> flight, we have no way of knowing if it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority confirms that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will cause the write from step 1 to never be 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 
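
The scenario in the description can be sketched as a toy simulation. This is a minimal model under stated assumptions, not Cassandra's implementation: the `Replica` class, the `serial_read` and `run` functions, and the `propose_empty` flag are all hypothetical names invented for illustration, ballots are plain integers, and an "empty commit" is represented by `None`. It only replays steps 1 to 3 with RF=3, with and without the empty propose suggested for step 2.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

EMPTY = None  # stand-in for an "empty commit" that changes no data


@dataclass
class Replica:
    promised: int = 0                                     # highest ballot promised
    accepted: Optional[Tuple[int, Optional[str]]] = None  # (ballot, value) accepted, not committed
    committed_ballot: int = 0                             # ballot of the last commit
    value: Optional[str] = None                           # last committed value

    def prepare(self, ballot: int):
        assert ballot > self.promised, "stale ballot"
        self.promised = ballot
        return self.accepted, self.committed_ballot

    def propose(self, ballot: int, value: Optional[str]) -> None:
        assert ballot >= self.promised, "stale ballot"
        self.promised = ballot
        self.accepted = (ballot, value)

    def commit(self, ballot: int, value: Optional[str]) -> None:
        if value is not EMPTY:
            self.value = value
        self.committed_ballot = ballot
        self.accepted = None


def serial_read(quorum: List[Replica], ballot: int, propose_empty: bool) -> Optional[str]:
    replies = [r.prepare(ballot) for r in quorum]
    latest_commit = max(committed for _, committed in replies)
    in_flight = [acc for acc, _ in replies if acc is not None and acc[0] > latest_commit]
    if in_flight:
        # Finish the newest in-flight proposal under the current ballot.
        _, value = max(in_flight, key=lambda acc: acc[0])
        for r in quorum:
            r.propose(ballot, value)
        for r in quorum:
            r.commit(ballot, value)
    elif propose_empty:
        # Suggested fix: nothing is in flight on this quorum, so have it accept
        # an empty proposal that supersedes any older minority proposal.
        for r in quorum:
            r.propose(ballot, EMPTY)
    newest = max(quorum, key=lambda r: r.committed_ballot)
    return newest.value


def run(propose_empty: bool):
    a, b, c = Replica(), Replica(), Replica()
    # Step 1: a CAS write at ballot 1 is accepted by A only, then fails.
    for r in (a, b, c):
        r.prepare(1)
    a.propose(1, "v1")
    first = serial_read([b, c], ballot=2, propose_empty=propose_empty)   # step 2
    second = serial_read([a, b], ballot=3, propose_empty=propose_empty)  # step 3
    return first, second
```

In this model, `run(False)` returns `(None, "v1")`: the step 2 read sees nothing and the step 3 read then surfaces the step 1 value. `run(True)` returns `(None, None)`, because the empty proposal accepted by B in step 2 carries a higher ballot than the proposal held by A, so the step 1 value is never completed.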



