Stefan Miklosovic created CASSANDRA-20135:
---------------------------------------------
Summary: Assertion errors on CheckForAbort /
QueryCancellationChecker on multiple calls of applyToPartition
Key: CASSANDRA-20135
URL: https://issues.apache.org/jira/browse/CASSANDRA-20135
Project: Apache Cassandra
Issue Type: Bug
Components: Legacy/Core
Reporter: Stefan Miklosovic
We see there are assertion errors thrown in 4.1 at least in
StoppingTransformation like these:
{code}
java.lang.RuntimeException: java.lang.AssertionError
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
at
org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
at
org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.AssertionError: null
at
org.apache.cassandra.db.transform.StoppingTransformation.attachTo(StoppingTransformation.java:72)
at org.apache.cassandra.db.transform.BaseRows.add(BaseRows.java:104)
at
org.apache.cassandra.db.transform.UnfilteredRows.add(UnfilteredRows.java:49)
at
org.apache.cassandra.db.transform.Transformation.add(Transformation.java:198)
at
org.apache.cassandra.db.transform.Transformation.apply(Transformation.java:140)
at
org.apache.cassandra.db.ReadCommand$CheckForAbort.applyToPartition(ReadCommand.java:616)
at
org.apache.cassandra.db.ReadCommand$CheckForAbort.applyToPartition(ReadCommand.java:604)
at
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:97)
at
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:303)
at
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
at
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
at
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
at
org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:337)
at
org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:63)
at
org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
... 6 common frames omitted
{code}
This does not make sense at first sight and it is quite a rabbit hole to go
throught. If you follow the stacktrace, you see that
{code}
Caused by: java.lang.AssertionError: null
at
org.apache.cassandra.db.transform.StoppingTransformation.attachTo(StoppingTransformation.java:72)
{code}
but ... why? It means that this (1) was called twice because that is the only
place where "this.rows" are ever updated in that class (and this.rows is
private) which means that _something_ has to call this twice in a row. Once it
sets it just fine and another time it goes to set it again but it fails as it
is not null anymore. Hence, the question is why is that set twice?
The reason is quite elaborative. "attachTo" which throws is ever called in
BaseRows#add(Transformation) (2) and just on the next line it calls
"super.add(transformation);" which adds that transformation at the end of a
stack in Stack class which BaseRows extends.
{code}
void add(Transformation add)
{
if (length == stack.length)
stack = resize(stack);
stack[length++] = add;
}
{code}
Next thing we see from the stacktrace is that CheckForAbort.applyToPartition is
calling Transformation.apply (3) and what that ultimately does is that it will
add itself, again, at the end of the stack (4).
When we look at that stacktrace as a whole, what it does is that it is
iterating over Unfiliteredpartition while building a local data response on a
read and as it does so, it calls "BasePartitions.hasNext". Now we are getting
to that ... (5). What "hasNext" is doing is that while this.next is null, it
will take the stack and it loops in while by taking "next" from "input" and it
applies all the transformations by calling "fs[i].applyToParition(next)".
So, there is a stack of transformations and they are called just one after
another until some result of "applyToPartition" does not return null. The chain
of transformations also include "CheckForAbort" transformation which we added
here (6) so what happens is that when we call "applyToPartitions" for the first
time on CheckForAbort, it will run just fine, but when that while loop / for
loop in BasePartitions is called _again_ (e.g. we are calling "hasNext" upon
iterating in UnfilteredPartitionIterators), then "applyPartitions" for
"CheckForAbort" will be called again as well. But CheckForAbort is doing this
(7).
{code}
protected UnfilteredRowIterator applyToPartition(UnfilteredRowIterator
partition)
{
if (maybeAbort())
{
partition.close();
return null;
}
return Transformation.apply(partition, this);
}
{code}
Check the last line where it applies itself when it is not aborted:
{code}
Transformation.apply(partition, this)
{code}
The application of this stopping transformation to given partition means that
it will add that transformation at the end of the stack as we already showed.
Then, we will iterate over that stack again upon iterating in BaseRows, which
eventually call "attachTo" for the second time, hence the assertion error.
I also see that CheckForAbort was replaced by 17810 (8) by
QueryCancellationChecker but except some "cosmetic" changes, the logic remains
the same. That stopping transformation is applying itself in applyToPartition
so I think that this problem is present in 5.0+ too. No transformation is
applying itself like that but this one.
I am not completely sure what we should do about this but two ideas are obvious:
1) We apply _new instance_ of QueryCancellationChecker / CheckForAbort so we
will never call attachTo on the _same_ instance
2) We will remove "assert" in attachTo in StoppingTransformation so we will
enable this to be called twice.
(1)
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/transform/StoppingTransformation.java#L72
(2)
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/BaseRows.java#L104
(3)
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L627
(4)
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/Transformation.java#L198
(5)
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/BasePartitions.java#L87-L109
(6)
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L436
(7)
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L619-L628
(8)
https://github.com/apache/cassandra/commit/f4b69ba0e82bb051e56a92d792142034d9f617f0#diff-554e7dff38b500f5eaed0b9b651c7098c3f8a1bd4f6aca12063eab352e685b9fR690
cc [~jmckenzie] [~marcuse] [~aleksey]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]