[jira] [Updated] (CASSANDRA-15229) BufferPool Regression
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15229: - Attachment: 15229-unsafe.png 15229-direct.png > BufferPool Regression > - > > Key: CASSANDRA-15229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15229 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Benedict Elliott Smith >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0, 4.0-beta > > Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, > 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, > 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, > 15229-unsafe.png > > > The BufferPool was never intended to be used for a {{ChunkCache}}, and we > need to either change our behaviour to handle uncorrelated lifetimes or use > something else. This is particularly important with the default chunk size > for compressed sstables being reduced. If we address the problem, we should > also utilise the BufferPool for native transport connections like we do for > internode messaging, and reduce the number of pooling solutions we employ. > Probably the best thing to do is to improve BufferPool’s behaviour when used > for things with uncorrelated lifetimes, which essentially boils down to > tracking those chunks that have not been freed and re-circulating them when > we run out of completely free blocks. We should probably also permit > instantiating separate {{BufferPool}}, so that we can insulate internode > messaging from the {{ChunkCache}}, or at least have separate memory bounds > for each, and only share fully-freed chunks. > With these improvements we can also safely increase the {{BufferPool}} chunk > size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce > the amount of global coordination and per-allocation overhead. We don’t need > 1KiB granularity for allocations, nor 16 byte granularity for tiny > allocations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858 ] ZhaoYang edited comment on CASSANDRA-15666 at 4/13/20, 5:29 AM: |[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]| Previous changes: * Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion. * Only the "follower" is allowed to send the CompleteMessage. * Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. New changes to fix dtest failures: * NettyStreamingMessageSender: ** don't close channels in NettyStreamingMessageSender, as they will be closed by initator on "closeSession()" ** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException ** only include inbound handler for initiator's control channel * ChannelProxy: ** Use new "ChannelProxy" instance instead of shared copy in stream writer to prevent interruped thread closing backing channel * StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler" * OutboundConnection: fix OutboundConnection.id() to use proper remote/local address * Preview repair: ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" because initiator will close connection via "completePreview()" on "prepareSynAck()". Without this change, preview_repair_test.py will fail because initiator closes channels before follower moving to final state and follower "StreamSession" will log some error upon EOF. ** In case of preview, do not send "PrepareAckMessage" to follower, as follower has already closed stream session on "prepareAsync()" * Dtest: include "Socket closed before session completion" into {{ignore_log_patterns in "repair_test.py::TestRepair::test_dead_sync_participant"}}, it's logged when stream session was closed upon EOF due to node down. was (Author: jasonstack): |[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]| Previous changes: * Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion. * Only the "follower" is allowed to send the CompleteMessage. * Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. New changes to fix dtest failures: * NettyStreamingMessageSender: ** don't close channels in NettyStreamingMessageSender, as they will be closed by initator on "closeSession()" ** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException ** only include inbound handler for initiator's control channel * ChannelProxy: ** Use new "ChannelProxy" instance instead of shared copy in stream writer to prevent interruped thread closing backing channel * StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler" * OutboundConnection: fix OutboundConnection.id() to use proper remote/local address * Preview repair: ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" because initiator will close connection via "completePreview()" on "prepareSynAck()". Without this change, preview_repair_test.py will fail because initiator closes channels before follower moving to final state and follower "StreamSession" will log some error upon EOF. ** In case of preview, do not send "PrepareAckMessage" to follower, as follower has already closed stream session on "prepareAsync()" * Dtest: include "Socket closed before session completion" into {{ignore_log_patterns in "{{repair_test.py::TestRepair::test_dead_sync_participant", it's logged when stream session was closed upon EOF due to node down. > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends
[jira] [Comment Edited] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858 ] ZhaoYang edited comment on CASSANDRA-15666 at 4/13/20, 5:28 AM: |[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]| Previous changes: * Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion. * Only the "follower" is allowed to send the CompleteMessage. * Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. New changes to fix dtest failures: * NettyStreamingMessageSender: ** don't close channels in NettyStreamingMessageSender, as they will be closed by initator on "closeSession()" ** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException ** only include inbound handler for initiator's control channel * ChannelProxy: ** Use new "ChannelProxy" instance instead of shared copy in stream writer to prevent interruped thread closing backing channel * StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler" * OutboundConnection: fix OutboundConnection.id() to use proper remote/local address * Preview repair: ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" because initiator will close connection via "completePreview()" on "prepareSynAck()". Without this change, preview_repair_test.py will fail because initiator closes channels before follower moving to final state and follower "StreamSession" will log some error upon EOF. ** In case of preview, do not send "PrepareAckMessage" to follower, as follower has already closed stream session on "prepareAsync()" * Dtest: include "Socket closed before session completion" into {{ignore_log_patterns in "{{repair_test.py::TestRepair::test_dead_sync_participant", it's logged when stream session was closed upon EOF due to node down. was (Author: jasonstack): |[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]| Previous changes: * Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion. * Only the "follower" is allowed to send the CompleteMessage. * Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. New changes to fix dtest failures: * NettyStreamingMessageSender: ** don't close channels in NettyStreamingMessageSender, as they will be closed by initator on "closeSession()" ** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException ** only include inbound handler for initiator's control channel * ChannelProxy: ** Use new "ChannelProxy" instance instead of shared copy in stream writer to prevent interruped thread closing backing channel * StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler" * OutboundConnection: fix OutboundConnection.id() to use proper remote/local address * Preview repair: ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" because initiator will close connection via "completePreview()" on "prepareSynAck()". Without this change, preview_repair_test.py will fail because initiator closes channels before follower moving to final state and follower "StreamSession" will log some error upon EOF. ** In case of preview, do not send "PrepareAckMessage" to follower, as follower has already closed stream session on "prepareAsync()" * Dtest: include "Socket closed before session completion" into {{ignore_log_patterns}}, it's logged when stream session was closed upon EOF due to node down. > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream,
[jira] [Comment Edited] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858 ] ZhaoYang edited comment on CASSANDRA-15666 at 4/13/20, 5:27 AM: |[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]| Previous changes: * Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion. * Only the "follower" is allowed to send the CompleteMessage. * Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. New changes to fix dtest failures: * NettyStreamingMessageSender: ** don't close channels in NettyStreamingMessageSender, as they will be closed by initator on "closeSession()" ** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException ** only include inbound handler for initiator's control channel * ChannelProxy: ** Use new "ChannelProxy" instance instead of shared copy in stream writer to prevent interruped thread closing backing channel * StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler" * OutboundConnection: fix OutboundConnection.id() to use proper remote/local address * Preview repair: ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" because initiator will close connection via "completePreview()" on "prepareSynAck()". Without this change, preview_repair_test.py will fail because initiator closes channels before follower moving to final state and follower "StreamSession" will log some error upon EOF. ** In case of preview, do not send "PrepareAckMessage" to follower, as follower has already closed stream session on "prepareAsync()" * Dtest: include "Socket closed before session completion" into {{ignore_log_patterns}}, it's logged when stream session was closed upon EOF due to node down. was (Author: jasonstack): | [patch|https://github.com/apache/cassandra/pull/497] | [dtest|https://github.com/apache/cassandra-dtest/pull/63] | Previous changes: * Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion. * Only the "follower" is allowed to send the CompleteMessage. * Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. New changes to fix dtest failures: * NettyStreamingMessageSender: ** don't close channels in NettyStreamingMessageSender, as they will be closed by initator on "closeSession()" ** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException ** only include inbound handler for initiator's control channel * ChannelProxy: ** Use new "ChannelProxy" instance instead of shared copy in stream writer to prevent interruped thread closing backing channel * StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler" * OutboundConnection: fix OutboundConnection.id() to use proper remote/local address * Preview repair: ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" because initiator will close connection via "completePreview()" on "prepareSynAck()". Without this change, preview_repair_test.py will fail because initiator closes channels before follower moving to final state and follower "StreamSession" will log some error upon EOF. ** In case of preview, do not send "PrepareAckMessage" to follower, as follower has already closed connection on "prepareAsync()" * Dtest: include "Socket closed before session completion" into {{ignore_log_patterns}}, it's logged when stream session was closed upon EOF due to node down. > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream, both nodes are ready > to close the session via {{maybeCompleted()}}, but: >
[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15666: - Status: Patch Available (was: In Progress) > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream, both nodes are ready > to close the session via {{maybeCompleted()}}, but: > a) Node A will call it twice from both the IO thread and the thread at #1, > closing the session and its channels. > b) Node B will attempt to send a {{CompleteMessage}}, but will fail because > the session has been closed in the meantime. > There are other subtle variations of the pattern above, depending on the > order of concurrently sent/received messages. > I believe the best fix would be to modify the message exchange so that: > 1) Only the "follower" is allowed to send the {{CompleteMessage}}. > 2) Only the "initiator" is allowed to close the session and its channels > after receiving the {{CompleteMessage}}. > By doing so, the message exchange logic would be easier to reason about, > which is overall a win anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858 ] ZhaoYang edited comment on CASSANDRA-15666 at 4/12/20, 5:23 PM: | [patch|https://github.com/apache/cassandra/pull/497] | [dtest|https://github.com/apache/cassandra-dtest/pull/63] | Previous changes: * Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion. * Only the "follower" is allowed to send the CompleteMessage. * Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. New changes to fix dtest failures: * NettyStreamingMessageSender: ** don't close channels in NettyStreamingMessageSender, as they will be closed by initator on "closeSession()" ** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException ** only include inbound handler for initiator's control channel * ChannelProxy: ** Use new "ChannelProxy" instance instead of shared copy in stream writer to prevent interruped thread closing backing channel * StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler" * OutboundConnection: fix OutboundConnection.id() to use proper remote/local address * Preview repair: ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" because initiator will close connection via "completePreview()" on "prepareSynAck()". Without this change, preview_repair_test.py will fail because initiator closes channels before follower moving to final state and follower "StreamSession" will log some error upon EOF. ** In case of preview, do not send "PrepareAckMessage" to follower, as follower has already closed connection on "prepareAsync()" * Dtest: include "Socket closed before session completion" into {{ignore_log_patterns}}, it's logged when stream session was closed upon EOF due to node down. was (Author: jasonstack): | [patch|https://github.com/apache/cassandra/pull/497] | [dtest|https://github.com/apache/cassandra-dtest/pull/63] | Previous changes: * Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion. * Only the "follower" is allowed to send the CompleteMessage. * Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. New changes to fix dtest failures: * NettyStreamingMessageSender: ** don't close channels in NettyStreamingMessageSender, as they will be closed by initator on "closeSession()" ** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException ** only include inbound handler for initiator's control channel * ChannelProxy: ** Use new "ChannelProxy" instance instead of shared copy in stream writer to prevent interruped thread closing backing channel * StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler" * OutboundConnection: fix OutboundConnection.id() to use proper remote/local address * Preview repair: ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" because initiator will close connection via "completePreview" on "prepareSynAck()" ** In case of preview, do not send "PrepareAckMessage" to follower, as follower has already closed connection on "prepareAck()" * Dtest: include "Socket closed before session completion" into {{ignore_log_patterns}}, it's logged when stream session was closed upon EOF due to node down. > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream, both nodes are ready > to close the session via {{maybeCompleted()}}, but: > a) Node A will call it twice from both the IO thread and the thread at #1, > closing the session and its channels. > b) Node B will attempt to send a {{CompleteMessage}}, but will fail because > the
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858 ] ZhaoYang commented on CASSANDRA-15666: -- | [patch|https://github.com/apache/cassandra/pull/497] | [dtest|https://github.com/apache/cassandra-dtest/pull/63] | Previous changes: * Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion. * Only the "follower" is allowed to send the CompleteMessage. * Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage. New changes to fix dtest failures: * NettyStreamingMessageSender: ** don't close channels in NettyStreamingMessageSender, as they will be closed by initator on "closeSession()" ** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException ** only include inbound handler for initiator's control channel * ChannelProxy: ** Use new "ChannelProxy" instance instead of shared copy in stream writer to prevent interruped thread closing backing channel * StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler" * OutboundConnection: fix OutboundConnection.id() to use proper remote/local address * Preview repair: ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" because initiator will close connection via "completePreview" on "prepareSynAck()" ** In case of preview, do not send "PrepareAckMessage" to follower, as follower has already closed connection on "prepareAck()" * Dtest: include "Socket closed before session completion" into {{ignore_log_patterns}}, it's logged when stream session was closed upon EOF due to node down. > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream, both nodes are ready > to close the session via {{maybeCompleted()}}, but: > a) Node A will call it twice from both the IO thread and the thread at #1, > closing the session and its channels. > b) Node B will attempt to send a {{CompleteMessage}}, but will fail because > the session has been closed in the meantime. > There are other subtle variations of the pattern above, depending on the > order of concurrently sent/received messages. > I believe the best fix would be to modify the message exchange so that: > 1) Only the "follower" is allowed to send the {{CompleteMessage}}. > 2) Only the "initiator" is allowed to close the session and its channels > after receiving the {{CompleteMessage}}. > By doing so, the message exchange logic would be easier to reason about, > which is overall a win anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15666: - Test and Documentation Plan: Added interceptor to verify stream messages and state transition. CI: [https://circleci.com/workflow-run/deca618c-1f9c-4e3b-b4f1-f0fb1a8aac7f] dtest failure "repair_test.py" is fixed in [https://github.com/apache/cassandra-dtest/pull/63] was: Added interceptor to verify stream messages and state transition. CI: [https://circleci.com/workflow-run/deca618c-1f9c-4e3b-b4f1-f0fb1a8aac7f] dtest failure "repair_test.py" is fixed in > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream, both nodes are ready > to close the session via {{maybeCompleted()}}, but: > a) Node A will call it twice from both the IO thread and the thread at #1, > closing the session and its channels. > b) Node B will attempt to send a {{CompleteMessage}}, but will fail because > the session has been closed in the meantime. > There are other subtle variations of the pattern above, depending on the > order of concurrently sent/received messages. > I believe the best fix would be to modify the message exchange so that: > 1) Only the "follower" is allowed to send the {{CompleteMessage}}. > 2) Only the "initiator" is allowed to close the session and its channels > after receiving the {{CompleteMessage}}. > By doing so, the message exchange logic would be easier to reason about, > which is overall a win anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15666: - Test and Documentation Plan: Added interceptor to verify stream messages and state transition. CI: [https://circleci.com/workflow-run/deca618c-1f9c-4e3b-b4f1-f0fb1a8aac7f] dtest failure "repair_test.py" is fixed in was: Added interceptor to verify stream messages and state transition. CI pending: [https://circleci.com/workflow-run/54fa1412-65ec-4580-9f62-a81eaa0e6ec2] > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream, both nodes are ready > to close the session via {{maybeCompleted()}}, but: > a) Node A will call it twice from both the IO thread and the thread at #1, > closing the session and its channels. > b) Node B will attempt to send a {{CompleteMessage}}, but will fail because > the session has been closed in the meantime. > There are other subtle variations of the pattern above, depending on the > order of concurrently sent/received messages. > I believe the best fix would be to modify the message exchange so that: > 1) Only the "follower" is allowed to send the {{CompleteMessage}}. > 2) Only the "initiator" is allowed to close the session and its channels > after receiving the {{CompleteMessage}}. > By doing so, the message exchange logic would be easier to reason about, > which is overall a win anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-15666: --- Labels: pull-request-available (was: ) > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream, both nodes are ready > to close the session via {{maybeCompleted()}}, but: > a) Node A will call it twice from both the IO thread and the thread at #1, > closing the session and its channels. > b) Node B will attempt to send a {{CompleteMessage}}, but will fail because > the session has been closed in the meantime. > There are other subtle variations of the pattern above, depending on the > order of concurrently sent/received messages. > I believe the best fix would be to modify the message exchange so that: > 1) Only the "follower" is allowed to send the {{CompleteMessage}}. > 2) Only the "initiator" is allowed to close the session and its channels > after receiving the {{CompleteMessage}}. > By doing so, the message exchange logic would be easier to reason about, > which is overall a win anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15666: - Source Control Link: [patch|https://github.com/apache/cassandra/pull/497], [dtest|https://github.com/apache/cassandra-dtest/pull/63] (was: https://github.com/apache/cassandra/pull/497) > Race condition when completing stream sessions > -- > > Key: CASSANDRA-15666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15666 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{StreamSession#prepareAsync()}} executes, as the name implies, > asynchronously from the IO thread: this opens up for race conditions between > the sending of the {{PrepareSynAckMessage}} and the call to > {{StreamSession#maybeCompleted()}}. I.e., the following could happen: > 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread. > 2) Node B receives it and starts streaming. > 3) Node A receives the streamed file and sends {{ReceivedMessage}}. > 4) At this point, if this was the only file to stream, both nodes are ready > to close the session via {{maybeCompleted()}}, but: > a) Node A will call it twice from both the IO thread and the thread at #1, > closing the session and its channels. > b) Node B will attempt to send a {{CompleteMessage}}, but will fail because > the session has been closed in the meantime. > There are other subtle variations of the pattern above, depending on the > order of concurrently sent/received messages. > I believe the best fix would be to modify the message exchange so that: > 1) Only the "follower" is allowed to send the {{CompleteMessage}}. > 2) Only the "initiator" is allowed to close the session and its channels > after receiving the {{CompleteMessage}}. > By doing so, the message exchange logic would be easier to reason about, > which is overall a win anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org