[jira] [Commented] (CASSANDRA-16807) Weak visibility guarantees of Accumulator lead to failed assertions during digest comparison

Benedict Elliott Smith (Jira) Tue, 20 Jul 2021 01:05:15 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383866#comment-17383866
 ]


Benedict Elliott Smith commented on CASSANDRA-16807:
----------------------------------------------------

So, the only thing that has me worried is the code in {{waitingFor}}, which 
implies we might receive responses we don't require for our consistency level. 
I _think_ this is dead code and should be removed, and we should assert that we 
have only contacted relevant hosts. If, however, it isn't dead code then this 
fix would be insufficient, as we could have inserted these first, and may only 
have those visible. Only writes should contact replicas not involved in 
consistency, however, and I did perform a cursory check this is the case here.

So, I'd suggest:

1. Verify we do not send queries to replicas we don't want responses from
2. Do not pre-process responses we aren't {{waitingFor}}; or, assert we are 
{{waitingFor}} the commands 
3. Only test that {{resolver.getMessages().size() >= blockFor}}
4. Maybe remove {{received}} altogether, in favour of 
{{resolver.getMessages().size()}}

> Weak visibility guarantees of Accumulator lead to failed assertions during 
> digest comparison
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16807
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16807
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 4.0-rc, 4.0.x, 4.x
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This problem could manifest on all versions, beginning on at least 3.0, but 
> I’ll focus on the way it manifests in 4.0 here.
> In what now seems like a wise move, CASSANDRA-16097 added an assertion to 
> {{DigestResolver#responseMatch()}} that ensures the responses snapshot has at 
> least one visible elements to compare (although of course only one element 
> trivially cannot generate a mismatch and short-circuits immediately). 
> However, at the point {{ReadCallback#onResponse()}} signals the waiting 
> resolver, there is no guarantee that the size of the generated snapshot of 
> the responses {{Accumulator}} is non-zero, or perhaps more worryingly, at 
> least equal to the number of blocked-for responses. This seems to be a 
> consequence of the documented weak visibility guarantees on 
> {{Accumulator#add()}}. In short, if there are concurrent invocations of 
> add(), is it not guaranteed that there is any visible size change after any 
> one of them return, but only after all complete.
> The particular exception looks something like this:
> {noformat}
> java.lang.AssertionError: Attempted response match comparison while no 
> responses have been received.
>       at 
> org.apache.cassandra.service.reads.DigestResolver.responsesMatch(DigestResolver.java:110)
>       at 
> org.apache.cassandra.service.reads.AbstractReadExecutor.awaitResponses(AbstractReadExecutor.java:393)
>       at 
> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:2150)
>       at 
> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1979)
>       at 
> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1882)
>       at 
> org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1121)
>       at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:296)
>       at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:248)
>       at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:90)
> {noformat}
> It’s possible to reproduce this on simple single-partition reads without any 
> short-read protection or replica filtering protection. I’ve also been able to 
> reproduce this synthetically with [a unit 
> test|https://github.com/apache/cassandra/pull/1110] on {{ReadCallback}}.
> It seems like the most straightforward way to fix this would be to avoid 
> signaling in {{ReadCallback#onResponse()}} until the visible size of the 
> accumulator is at least the number of received responses. In most cases, this 
> is trivially true, and our signaling behavior won’t change at all. In the 
> very rare case that there are two (or more) concurrent calls to 
> {{onResponse()}}, the second (or last) will signal, and having one more 
> response than we strictly need should have no negative side effects. (We 
> don’t seem to make any strict assertions about having exactly the number of 
> required responses, only that we have enough.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-16807) Weak visibility guarantees of Accumulator lead to failed assertions during digest comparison

Reply via email to