[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070970#comment-17070970 ]
ZhaoYang edited comment on CASSANDRA-15666 at 3/30/20, 6:47 PM: ---------------------------------------------------------------- {quote}1) Only the "follower" is allowed to send the CompleteMessage. 2) Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. {quote} At the end, I reverted the above implementation, as I think it doesn't fix the racy cases here. Improving synchronization on "maybeCompleted()" should be sufficient to avoid the StreamSession racy cases and less invasive for 4.0.. [Patch|https://github.com/apache/cassandra/pull/497]: * Make sure {{maybeCompleted()}} is executed at most once as it may be triggered by different threads asynchronously. * Synchronize {{maybeCompleted()}} so that it won't race with itself or {{complete()}}. * Added {{MessageStateSink}} in {{StreamSession}} to capture received stream messages and state transitions to improve testing. * Unified terminologies to use {{Initiator}} and {{Follower}}. * Ported some simplifications from CASSANDRA-15754 here: ** Simplify StreamSession with State.isFinalState ** Remove unused StreamSession from StreamMessage.Serializer.deserialize ** Change QueryState to use ClientState instead of clientState [~blerer] [~sbtourist] WDYT? was (Author: jasonstack): {quote}1) Only the "follower" is allowed to send the CompleteMessage. 2) Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. {quote} At the end, I reverted the above implementation, as I think it may cause backward compatibility issue on cross-version streaming and doesn't fix the racy cases here. Improving synchronization on "maybeCompleted()" should be sufficient to avoid the StreamSession racy cases and less invasive for 4.0.. [Patch|https://github.com/apache/cassandra/pull/497]: * Make sure {{maybeCompleted()}} is executed at most once as it may be triggered by different threads asynchronously. * Synchronize {{maybeCompleted()}} so that it won't race with itself or {{complete()}}. * Added {{MessageStateSink}} in {{StreamSession}} to capture received stream messages and state transitions to improve testing. * Unified terminologies to use {{Initiator}} and {{Follower}}. * Ported some simplifications from CASSANDRA-15754 here: ** Simplify StreamSession with State.isFinalState ** Remove unused StreamSession from StreamMessage.Serializer.deserialize ** Change QueryState to use ClientState instead of clientState [~blerer] [~sbtourist] WDYT? > Race condition when completing stream sessions > ---------------------------------------------- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging > Reporter: Sergio Bossa > Assignee: ZhaoYang > Priority: Normal > Fix For: 4.0 > > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream, both nodes are ready > to close the session via {{maybeCompleted()}}, but: > a) Node A will call it twice from both the IO thread and the thread at #1, > closing the session and its channels. > b) Node B will attempt to send a {{CompleteMessage}}, but will fail because > the session has been closed in the meantime. > There are other subtle variations of the pattern above, depending on the > order of concurrently sent/received messages. > I believe the best fix would be to modify the message exchange so that: > 1) Only the "follower" is allowed to send the {{CompleteMessage}}. > 2) Only the "initiator" is allowed to close the session and its channels > after receiving the {{CompleteMessage}}. > By doing so, the message exchange logic would be easier to reason about, > which is overall a win anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org