[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630850#comment-16630850
 ] 

Jeffrey F. Lukman commented on CASSANDRA-12126:
-----------------------------------------------

Thank you for your responses, [~jjordan] and [~kohlisankalp].
You have cleared up a misunderstanding for me (and our team): a timeout is a 
"gray area" in which the client cannot determine whether a request has been 
successfully processed.

One thing we would like to point out, based on the early discussion in this 
bug's description, quoted here:
{quote}However we need to fix step 2, since it caused reads to not be 
linearizable with respect to writes and other reads. In this case, we know that 
majority of acceptors have no inflight commit which means we have majority that 
nothing was accepted by majority. I think we should run a propose step here 
with empty commit and that will cause write written in step 1 to not be visible 
ever after.
{quote}
What we originally tried to mimic with our model checker was exactly this 
scenario: node Y saw that a majority of nodes had no inProgress value, but 
then node Z saw an inProgress value from node X and tried to repair and 
commit it.
So we can confirm that we also see this behavior:
{quote}2: Read -> Nothing
3: Read -> Something
{quote}
Node Y read nothing, yet node Z read something in the next request.



To sum up, our scenario explains at least this behavior: node Y does not try 
to repair the in-progress Paxos round because node X's prepare response 
arrives last. Node Y therefore ignores node X's prepare response and decides 
not to repair based on the other responses.
In node Z's client request, however, node Z decides to repair the Paxos round 
based on node X's existing inProgress value_1="A", because node X's prepare 
response arrives early (1st or 2nd). This causes node Y and node Z to react 
inconsistently (although each reaction is correct according to the original 
Paxos algorithm).
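
The arrival-order dependence can be sketched with a toy model in Python. This 
is not Cassandra's code: PrepareResponse, decide_on_first_quorum, and the 
quorum size of 2 (for RF=3) are illustrative assumptions, and ballots are left 
out entirely. The coordinator looks only at the first quorum of prepare 
responses to arrive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PrepareResponse:
    """Hypothetical prepare reply: replica name plus its in-progress value, if any."""
    node: str
    in_progress: Optional[str]


def decide_on_first_quorum(responses: List[PrepareResponse], quorum: int) -> Optional[str]:
    """Return the value to repair, looking only at the first `quorum` replies.

    `responses` is ordered by arrival time. Replies that arrive after the
    quorum is reached are ignored, as in the scenario described above.
    """
    for response in responses[:quorum]:
        if response.in_progress is not None:
            return response.in_progress
    return None


# Node X holds the accepted-but-uncommitted value "A"; the others hold nothing.
x = PrepareResponse("X", "A")
y = PrepareResponse("Y", None)
z = PrepareResponse("Z", None)

# Node Y's request: X's reply arrives last, so it falls outside the quorum.
seen_by_y = decide_on_first_quorum([y, z, x], quorum=2)
# Node Z's request: X's reply arrives first, so it is inside the quorum.
seen_by_z = decide_on_first_quorum([x, z, y], quorum=2)

print(seen_by_y)  # None: node Y does not repair
print(seen_by_z)  # A: node Z repairs and commits "A"
```

The same replica state yields two different decisions, depending only on 
where node X's reply falls in the arrival order.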


One possible way to avoid these inconsistent reactions from the two nodes is 
for each node to decide whether to repair based on a complete view of the 
alive nodes. Then, even if node X's response arrives last with an inProgress 
value, node Y will still repair the Paxos round.
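
A minimal Python sketch of this idea, under the same toy assumptions (no 
ballots, at most one in-progress value). The function name 
decide_on_complete_view and the way alive nodes are passed in are 
hypothetical, not part of Cassandra.

```python
from typing import Iterable, Optional, Set, Tuple


def decide_on_complete_view(
    arrivals: Iterable[Tuple[str, Optional[str]]],
    alive_nodes: Iterable[str],
) -> Optional[str]:
    """Return the in-progress value to repair after hearing from every alive node.

    `arrivals` yields (node, in_progress_value) pairs in arrival order.
    The decision is made only once all alive nodes have replied, so the
    arrival order cannot change the outcome.
    """
    pending: Set[str] = set(alive_nodes)
    found: Optional[str] = None
    for node, value in arrivals:
        if node not in pending:
            continue  # reply from an unknown node, or a duplicate
        pending.remove(node)
        if value is not None:
            found = value
    if pending:
        # Assumption of this sketch: a missing reply is treated as an error.
        raise RuntimeError(f"no reply from alive nodes: {sorted(pending)}")
    return found


alive = ["X", "Y", "Z"]
# X's reply arrives last (node Y's request) or first (node Z's request).
late = decide_on_complete_view([("Y", None), ("Z", None), ("X", "A")], alive)
early = decide_on_complete_view([("X", "A"), ("Z", None), ("Y", None)], alive)

print(late, early)  # A A
```

Both arrival orders now lead to the same repair decision. The cost in this 
sketch is that the coordinator waits for every alive node instead of stopping 
at the first quorum.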

> CAS Reads Inconsistencies 
> --------------------------
>
>                 Key: CASSANDRA-12126
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: sankalp kohli
>            Priority: Major
>              Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> Current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue, 
> which is how learners can find out whether a majority of the acceptors have 
> accepted the proposal. 
> In step 3, it is correct that we propose the value again since we don't know 
> if it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and more than one acceptor but not a majority has something in 
> flight, we have no way of knowing if it is accepted by a majority of 
> acceptors. So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



