[ https://issues.apache.org/jira/browse/CASSANDRA-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464183#comment-17464183 ]
Francisco Guerrero commented on CASSANDRA-17116:
------------------------------------------------

[~djoshi] and I took another look at this. The race appears to be in {{org.apache.cassandra.streaming.StreamSession#maybeCompleted()}}, most likely in the code block below.

{code:java}
channel.sendControlMessage(new CompleteMessage());
closeSession(State.COMPLETE);
{code}

The {{channel.sendControlMessage}} call returns a future, and we immediately close the session without waiting for that future to complete. In the majority of cases the message is delivered in time, but network delays, system load, or thread scheduling can cause the {{CompleteMessage}} to be sent or received after the session has been closed, triggering the {{java.nio.channels.ClosedChannelException}}.

A potential solution is to add a listener to the future and close the session only from that listener.

{code:java}
Future<?> messageFuture = channel.sendControlMessage(new CompleteMessage());
messageFuture.addListener(f -> closeSession(State.COMPLETE));
{code}

> When zero-copy-streaming sees a channel close this triggers the disk failure policy
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17116
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17116
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Streaming
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.x
>
>
> Found in CASSANDRA-17085.
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1069/workflows/26b7b83a-686f-4516-a56a-0709d428d4f2/jobs/7264
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1069/workflows/26b7b83a-686f-4516-a56a-0709d428d4f2/jobs/7256
> {code}
> ERROR [Stream-Deserializer-/127.0.0.1:7000-f2eb1a15] 2021-11-02 21:35:40,983 DefaultFSErrorHandler.java:104 - Exiting forcefully due to file system exception on startup, disk failure policy "stop"
> org.apache.cassandra.io.FSWriteError: java.nio.channels.ClosedChannelException
> 	at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.write(BigTableZeroCopyWriter.java:227)
> 	at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.writeComponent(BigTableZeroCopyWriter.java:206)
> 	at org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:125)
> 	at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:84)
> 	at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:51)
> 	at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:37)
> 	at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
> 	at org.apache.cassandra.streaming.StreamDeserializingTask.run(StreamDeserializingTask.java:62)
> 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.channels.ClosedChannelException: null
> 	at org.apache.cassandra.net.AsyncStreamingInputPlus.reBuffer(AsyncStreamingInputPlus.java:136)
> 	at org.apache.cassandra.net.AsyncStreamingInputPlus.consume(AsyncStreamingInputPlus.java:155)
> 	at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.write(BigTableZeroCopyWriter.java:217)
> 	... 9 common frames omitted
> {code}
> When bootstrap fails and streaming is closed, this triggers the disk failure policy, which causes the JVM to halt by default (if this happens outside of bootstrap, then we stop transports and keep the JVM up).
> org.apache.cassandra.streaming.StreamDeserializingTask attempts to handle this by ignoring this exception, but the call to org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize does a try/catch and inspects the exception, which triggers this condition.
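
The fix proposed in the comment, closing the session only once the send future has completed, can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not Cassandra code: {{FakeChannel}}, {{completeThenClose}}, and the event strings are hypothetical, and {{CompletableFuture.whenComplete}} from the JDK stands in for {{addListener}} on the future returned by {{channel.sendControlMessage}}.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CompleteThenClose {
    /** Hypothetical stand-in for the streaming channel; not Cassandra's class. */
    static final class FakeChannel {
        private final ExecutorService io = Executors.newSingleThreadExecutor();
        private final List<String> events;

        FakeChannel(List<String> events) {
            this.events = events;
        }

        /** Sends asynchronously; the future completes once the write has finished. */
        CompletableFuture<Void> sendControlMessage(String message) {
            return CompletableFuture.runAsync(() -> {
                sleep(50); // simulated network delay or scheduling lag
                events.add("sent:" + message);
            }, io);
        }

        void close() {
            events.add("closed");
            io.shutdown();
        }
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Closes the channel only from the completion callback of the send future. */
    static List<String> completeThenClose() throws Exception {
        List<String> events = new CopyOnWriteArrayList<>();
        FakeChannel channel = new FakeChannel(events);
        // whenComplete runs on success and on failure, so the channel is
        // always closed, but never before the send has finished.
        CompletableFuture<Void> closed = channel.sendControlMessage("COMPLETE")
                .whenComplete((ignored, error) -> channel.close());
        closed.get(5, TimeUnit.SECONDS);
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(completeThenClose()); // prints [sent:COMPLETE, closed]
    }
}
```

Calling {{channel.close()}} directly after {{sendControlMessage}} in this sketch would instead record {{closed}} before the send finished, which is the ordering described in the comment. Whether a failed send should still close the session with {{State.COMPLETE}} is a design decision this sketch does not address.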