[ https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821316#comment-17821316 ]
Abe Ratnofsky commented on CASSANDRA-19427:
-------------------------------------------

Added comment, updated patch on all branches:

3.11: [https://github.com/apache/cassandra/pull/3142]
4.0: [https://github.com/apache/cassandra/pull/3143]
4.1: [https://github.com/apache/cassandra/pull/3144]
5.0: [https://github.com/apache/cassandra/pull/3145]
trunk: [https://github.com/apache/cassandra/pull/3129]

The 3.11 through 5.0 patches are identical; trunk differs since the introduction of CASSANDRA-18330 / TCM.

> Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19427
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Legacy/Local Write-Read Paths
>            Reporter: Abe Ratnofsky
>            Assignee: Abe Ratnofsky
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> On one of our clusters, we noticed rare but periodic ArrayIndexOutOfBoundsExceptions:
>
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
> exception="java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException"{code}
>
> The error was thrown inside a Runnable, so the stack trace didn't directly indicate where it originated. We enabled JFR to log the underlying exception that was thrown:
>
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-2,5,main]"
> exception="java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
>     at java.base/java.util.ArrayList.add(ArrayList.java:487)
>     at java.base/java.util.ArrayList.add(ArrayList.java:499)
>     at org.apache.cassandra.service.ClientWarn$State.add(ClientWarn.java:84)
>     at org.apache.cassandra.service.ClientWarn$State.access$000(ClientWarn.java:77)
>     at org.apache.cassandra.service.ClientWarn.warn(ClientWarn.java:51)
>     at org.apache.cassandra.db.ReadCommand$1MetricRecording.onClose(ReadCommand.java:596)
>     at org.apache.cassandra.db.transform.BasePartitions.runOnClose(BasePartitions.java:70)
>     at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
>     at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2260)
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2575)
>     ... 6 more"{code}
>
> An AIOBE on ArrayList.add(E) should only be possible when multiple threads attempt to call the method at the same time.
>
> This was seen while executing a SELECT WHERE IN query with multiple partition keys. The exception can occur when the coordinator dispatches multiple local reads in org.apache.cassandra.service.reads.AbstractReadExecutor#makeRequests. In this case, multiple local reads exceed the tombstone warning threshold, so multiple tombstone warnings are added to the same ClientWarn.State reference. Currently, org.apache.cassandra.service.ClientWarn.State#warnings is an ArrayList, which isn't safe for concurrent modification, causing the AIOBE to be thrown.
>
> I have a patch available for this, and I'm preparing it now. The patch is simple: it changes org.apache.cassandra.service.ClientWarn.State#warnings to a thread-safe CopyOnWriteArrayList. I also have a jvm-dtest that demonstrates the issue but doesn't need to be merged. It shows how a SELECT WHERE IN query with local reads that add client warnings can add to the same ClientWarn.State from different threads. I'll push that in a separate branch just for demonstration purposes.
>
> Demonstration branch: [https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19427-aiobe-clientwarn-demo]
> Fix branch: [https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19427-aiobe-clientwarn-fix] (PR linked below)
>
> This appears to have been an issue since at least 3.11, which was the earliest release I checked.
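
The idea behind the fix described above can be sketched in isolation. The following is a minimal illustration, not the actual Cassandra patch: `ClientWarnSketch`, `WarningState`, and `collectWarnings` are hypothetical names invented for this example, and the thread pool only stands in for the read stage threads that run the local reads. It assumes nothing about Cassandra beyond what the issue states, namely that several threads add warnings to one shared list and that the list becomes a `CopyOnWriteArrayList`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ClientWarnSketch {
    // Hypothetical stand-in for ClientWarn.State; not the real Cassandra class.
    static final class WarningState {
        // CopyOnWriteArrayList makes each add() atomic, so concurrent writers
        // cannot corrupt the backing array the way a plain ArrayList can.
        private final List<String> warnings = new CopyOnWriteArrayList<>();

        void add(String warning) {
            warnings.add(warning);
        }

        List<String> snapshot() {
            return new ArrayList<>(warnings);
        }
    }

    // Starts `threads` workers that each add `perThread` warnings to one shared state.
    static List<String> collectWarnings(int threads, int perThread) {
        WarningState state = new WarningState();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                try {
                    start.await(); // release all workers together to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    state.add("tombstone warning " + id + "/" + i);
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return state.snapshot();
    }

    public static void main(String[] args) {
        // 4 threads x 500 warnings each; every add is retained.
        System.out.println(collectWarnings(4, 500).size()); // prints 2000
    }
}
```

With a plain `ArrayList` in place of the `CopyOnWriteArrayList`, the same harness may lose elements or throw an ArrayIndexOutOfBoundsException, though not deterministically. `CopyOnWriteArrayList` copies the backing array on every write, which is a reasonable trade-off only when writes are rare and the list stays small, as is presumably the case for per-request client warnings.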