[ https://issues.apache.org/jira/browse/CASSANDRA-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383303#comment-17383303 ]
Andres de la Peña commented on CASSANDRA-16807: ----------------------------------------------- The patch looks good to me, I have left some minor suggestions on the PR. > Weak visibility guarantees of Accumulator lead to failed assertions during > digest comparison > -------------------------------------------------------------------------------------------- > > Key: CASSANDRA-16807 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16807 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination > Reporter: Caleb Rackliffe > Assignee: Caleb Rackliffe > Priority: Normal > Fix For: 4.0.x, 4.x > > Time Spent: 1h 10m > Remaining Estimate: 0h > > This problem could manifest on all versions, beginning on at least 3.0, but > I’ll focus on the way it manifests in 4.0 here. > In what now seems like a wise move, CASSANDRA-16097 added an assertion to > {{DigestResolver#responseMatch()}} that ensures the responses snapshot has at > least one visible elements to compare (although of course only one element > trivially cannot generate a mismatch and short-circuits immediately). > However, at the point {{ReadCallback#onResponse()}} signals the waiting > resolver, there is no guarantee that the size of the generated snapshot of > the responses {{Accumulator}} is non-zero, or perhaps more worryingly, at > least equal to the number of blocked-for responses. This seems to be a > consequence of the documented weak visibility guarantees on > {{Accumulator#add()}}. In short, if there are concurrent invocations of > add(), is it not guaranteed that there is any visible size change after any > one of them return, but only after all complete. > The particular exception looks something like this: > {noformat} > java.lang.AssertionError: Attempted response match comparison while no > responses have been received. > at > org.apache.cassandra.service.reads.DigestResolver.responsesMatch(DigestResolver.java:110) > at > org.apache.cassandra.service.reads.AbstractReadExecutor.awaitResponses(AbstractReadExecutor.java:393) > at > org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:2150) > at > org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1979) > at > org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1882) > at > org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1121) > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:296) > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:248) > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:90) > {noformat} > It’s possible to reproduce this on simple single-partition reads without any > short-read protection or replica filtering protection. I’ve also been able to > reproduce this synthetically with [a unit > test|https://github.com/apache/cassandra/pull/1110] on {{ReadCallback}}. > It seems like the most straightforward way to fix this would be to avoid > signaling in {{ReadCallback#onResponse()}} until the visible size of the > accumulator is at least the number of received responses. In most cases, this > is trivially true, and our signaling behavior won’t change at all. In the > very rare case that there are two (or more) concurrent calls to > {{onResponse()}}, the second (or last) will signal, and having one more > response than we strictly need should have no negative side effects. (We > don’t seem to make any strict assertions about having exactly the number of > required responses, only that we have enough.) -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org