[ 
https://issues.apache.org/jira/browse/CASSANDRA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053578#comment-17053578
 ] 

Kevin Gallardo edited comment on CASSANDRA-15543 at 3/6/20, 4:36 PM:
---------------------------------------------------------------------

I have been looking into this issue and trying to reproduce it locally. I did not 
have much luck: I re-executed the test about 2500 times (let it run overnight) 
and only managed to reproduce something inconsistent in one case.

The test setup is: execute a schema alteration request on one node, prevent that 
node from propagating the schema change to the other nodes, then execute a 
SELECT at consistency ALL on that same node and expect a failure, because the 
two other nodes did not receive the schema alteration.

So the message we expect from the SELECT request would be the following:
{code:java}
Operation failed - received 1 responses and 2 failures: INCOMPATIBLE_SCHEMA 
from 127.0.0.3:7012, INCOMPATIBLE_SCHEMA from 127.0.0.2:7012{code}
Node1's response is successful (because it has the schema changes), but all the 
others fail because they have a stale schema.

The closest I got to reproducing something inconsistent with this was:
{code:java}
Operation failed - received 0 responses and 2 failures: INCOMPATIBLE_SCHEMA 
from 127.0.0.3:7012, INCOMPATIBLE_SCHEMA from 127.0.0.2:7012{code}
We should also consider the messages Brandon had observed.

I have looked into some of the places where a {{ReadFailureException}} is 
generated, and landed in the 
[{{ReadCallback}}|https://github.com/apache/cassandra/blob/aed15bfb01e753f9bac8b149b382c7a7c8d33183/src/java/org/apache/cassandra/service/reads/ReadCallback.java#L162]
 class.

The logic of the code is:
- call {{awaitResults()}}
- in {{awaitResults()}}:
-- block on {{await()}}
-- when {{await()}} exits, check what responses we got and whether errors came 
back

I also looked into the {{onFailure()}} method; its logic is the following:
{code:java}
int n = waitingFor(from)
      ? failuresUpdater.incrementAndGet(this)
      : failures;

failureReasonByEndpoint.put(from, failureReason);

if (blockFor + n > replicaPlan().contacts().size()) {
    condition.signalAll(); // this signals as soon as 1 failure happens
}
{code}
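To make the threshold concrete, here is a hypothetical standalone sketch of the signaling condition (the names mirror the {{ReadCallback}} fields, but this is not the real class). With {{blockFor}} and {{contacts().size()}} both {{3}}, the condition becomes true as soon as a single failure is counted:
{code:java}
// Hypothetical sketch of the onFailure() signaling threshold; the names mirror
// ReadCallback's fields, but this is not the real class.
public class SignalThresholdSketch
{
    // condition.signalAll() fires when blockFor + failures > contacts
    static boolean shouldSignal(int blockFor, int failures, int contacts)
    {
        return blockFor + failures > contacts;
    }

    public static void main(String[] args)
    {
        int blockFor = 3, contacts = 3; // the values observed in this test
        System.out.println(shouldSignal(blockFor, 0, contacts)); // false: no failures yet
        System.out.println(shouldSignal(blockFor, 1, contacts)); // true: one failure already signals
    }
}
{code}
Note that at a lower consistency level the threshold allows some slack, e.g. with {{blockFor}} = 2 and 3 contacts, one failure does not signal but a second one does.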
{{waitingFor(from)}} seems intended to filter out errors coming from non-local 
nodes; in this test it always returns true, so for simplicity we will assume it 
does. Similarly, {{blockFor}} and {{replicaPlan().contacts().size()}} are both 
{{3}} in this test, though I suppose they may differ in other use cases.

As we can see above, once one failure happens, the {{condition}} is signaled to 
unblock waiting callers. When {{awaitResults()}} unblocks, it checks the value 
of {{failures}} to see how many errors happened and reports them in a 
{{ReadFailureException}}.

The code is designed so that the {{condition}} is freed when either all 
responses are successful, or one failure happened.

My understanding is that, under certain circumstances, we end up returning an 
error message as soon as one failure happens, without waiting for all responses 
to come back. The values of {{failures}} and {{received}} are therefore not 
guaranteed to be at any particular value; we only know that, in case of a 
failure, {{failures}} will be > 0.

I am unclear on what the expected behavior should be here. From my point of 
view, the most reasonable logic would be that {{await()}} only exits once all 
responses have come back, failed or successful, or it times out. Therefore I 
would replace the {{SimpleCondition condition}} with a {{CountDownLatch}} 
initialized to the number of expected responses, call {{countDown()}} in both 
{{onReceived()}} and {{onFailure()}}, and have {{await()}} use that latch.
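As a rough sketch of what that change could look like (a hypothetical standalone class, not a patch against {{ReadCallback}}, and assuming the number of expected responses is known when the callback is created):
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed change, not the actual ReadCallback:
// a latch sized to the number of expected responses, decremented on every
// response, whether it succeeded or failed.
public class LatchedReadCallbackSketch
{
    private final CountDownLatch responses;
    final AtomicInteger received = new AtomicInteger();
    final AtomicInteger failures = new AtomicInteger();

    LatchedReadCallbackSketch(int expectedResponses)
    {
        responses = new CountDownLatch(expectedResponses);
    }

    void onReceived() { received.incrementAndGet(); responses.countDown(); }
    void onFailure()  { failures.incrementAndGet(); responses.countDown(); }

    // Unblocks only once every replica has answered one way or the other,
    // or the timeout elapses (mirroring the existing timeout behavior).
    boolean await(long timeout, TimeUnit unit) throws InterruptedException
    {
        return responses.await(timeout, unit);
    }
}
{code}
With this shape, when {{await()}} returns true the caller always observes the final counts, so in this test the message would deterministically report 1 response and 2 failures.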

If the behavior of failing as soon as one failure happens is preferred, then the 
logic can stay as it is, but the test needs to be modified to account for the 
possibility that only one failure and no successful responses are reported in 
the {{ReadFailureException}}.

[~brandon.williams] [~e.dimitrova] got any insights?


> flaky test 
> org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest
>            Reporter: David Capwell
>            Assignee: Kevin Gallardo
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>
> This fails infrequently, last seen failure was on java 8
> {code}
> junit.framework.AssertionFailedError
>       at 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest.readWithSchemaDisagreement(DistributedReadWritePathTest.java:276)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
