[jira] [Updated] (CASSANDRA-15229) BufferPool Regression

2020-04-12 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15229:
-
Attachment: 15229-unsafe.png
15229-direct.png

> BufferPool Regression
> -
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858
 ] 

ZhaoYang edited comment on CASSANDRA-15666 at 4/13/20, 5:29 AM:


|[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]|

Previous changes:
 * Synchronization on "StreamSession#maybeComplete()" to avoid race condition 
on streaming completion.
 * Only the "follower" is allowed to send the CompleteMessage.
 * Only the "initiator" is allowed to close the session and its channels after 
receiving the CompleteMessage.

New changes to fix dtest failures:
 * NettyStreamingMessageSender:
 ** don't close channels in NettyStreamingMessageSender, as they will be closed 
by initator on "closeSession()"
 ** handle fileTransferExecutor pool shutdown gracefully in case of 
ClosedByInterruptException
 ** only include inbound handler for initiator's control channel

 * ChannelProxy:
 ** Use new "ChannelProxy" instance instead of shared copy in stream writer to 
prevent interruped thread closing backing channel

 * StreamSession: close session if channel is closed due to EOF from 
"StreamingInboundHandler"

 * OutboundConnection: fix OutboundConnection.id() to use proper remote/local 
address

 * Preview repair:
 ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" 
because initiator will close connection via "completePreview()" on 
"prepareSynAck()". Without this change, preview_repair_test.py will fail 
because initiator closes channels before follower moving to final state and 
follower "StreamSession" will log some error upon EOF.
 ** In case of preview, do not send "PrepareAckMessage" to follower, as 
follower has already closed stream session on "prepareAsync()"

 * Dtest: include "Socket closed before session completion" into 
{{ignore_log_patterns in 
"repair_test.py::TestRepair::test_dead_sync_participant"}}, it's logged when 
stream session was closed upon EOF due to node down.


was (Author: jasonstack):
|[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]|

Previous changes:
 * Synchronization on "StreamSession#maybeComplete()" to avoid race condition 
on streaming completion.
 * Only the "follower" is allowed to send the CompleteMessage.
 * Only the "initiator" is allowed to close the session and its channels after 
receiving the CompleteMessage.

New changes to fix dtest failures:
 * NettyStreamingMessageSender:
 ** don't close channels in NettyStreamingMessageSender, as they will be closed 
by initator on "closeSession()"
 ** handle fileTransferExecutor pool shutdown gracefully in case of 
ClosedByInterruptException
 ** only include inbound handler for initiator's control channel

 * ChannelProxy:
 ** Use new "ChannelProxy" instance instead of shared copy in stream writer to 
prevent interruped thread closing backing channel

 * StreamSession: close session if channel is closed due to EOF from 
"StreamingInboundHandler"

 * OutboundConnection: fix OutboundConnection.id() to use proper remote/local 
address

 * Preview repair:
 ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" 
because initiator will close connection via "completePreview()" on 
"prepareSynAck()". Without this change, preview_repair_test.py will fail 
because initiator closes channels before follower moving to final state and 
follower "StreamSession" will log some error upon EOF.
 ** In case of preview, do not send "PrepareAckMessage" to follower, as 
follower has already closed stream session on "prepareAsync()"

 * Dtest: include "Socket closed before session completion" into 
{{ignore_log_patterns in 
"{{repair_test.py::TestRepair::test_dead_sync_participant", it's logged 
when stream session was closed upon EOF due to node down.

> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends 

[jira] [Comment Edited] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858
 ] 

ZhaoYang edited comment on CASSANDRA-15666 at 4/13/20, 5:28 AM:


|[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]|

Previous changes:
 * Synchronization on "StreamSession#maybeComplete()" to avoid race condition 
on streaming completion.
 * Only the "follower" is allowed to send the CompleteMessage.
 * Only the "initiator" is allowed to close the session and its channels after 
receiving the CompleteMessage.

New changes to fix dtest failures:
 * NettyStreamingMessageSender:
 ** don't close channels in NettyStreamingMessageSender, as they will be closed 
by initator on "closeSession()"
 ** handle fileTransferExecutor pool shutdown gracefully in case of 
ClosedByInterruptException
 ** only include inbound handler for initiator's control channel

 * ChannelProxy:
 ** Use new "ChannelProxy" instance instead of shared copy in stream writer to 
prevent interruped thread closing backing channel

 * StreamSession: close session if channel is closed due to EOF from 
"StreamingInboundHandler"

 * OutboundConnection: fix OutboundConnection.id() to use proper remote/local 
address

 * Preview repair:
 ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" 
because initiator will close connection via "completePreview()" on 
"prepareSynAck()". Without this change, preview_repair_test.py will fail 
because initiator closes channels before follower moving to final state and 
follower "StreamSession" will log some error upon EOF.
 ** In case of preview, do not send "PrepareAckMessage" to follower, as 
follower has already closed stream session on "prepareAsync()"

 * Dtest: include "Socket closed before session completion" into 
{{ignore_log_patterns in 
"{{repair_test.py::TestRepair::test_dead_sync_participant", it's logged 
when stream session was closed upon EOF due to node down.


was (Author: jasonstack):
|[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]|

Previous changes:
 * Synchronization on "StreamSession#maybeComplete()" to avoid race condition 
on streaming completion.
 * Only the "follower" is allowed to send the CompleteMessage.
 * Only the "initiator" is allowed to close the session and its channels after 
receiving the CompleteMessage.

New changes to fix dtest failures:
 * NettyStreamingMessageSender:
 ** don't close channels in NettyStreamingMessageSender, as they will be closed 
by initator on "closeSession()"
 ** handle fileTransferExecutor pool shutdown gracefully in case of 
ClosedByInterruptException
 ** only include inbound handler for initiator's control channel

 * ChannelProxy:
 ** Use new "ChannelProxy" instance instead of shared copy in stream writer to 
prevent interruped thread closing backing channel

 * StreamSession: close session if channel is closed due to EOF from 
"StreamingInboundHandler"

 * OutboundConnection: fix OutboundConnection.id() to use proper remote/local 
address

 * Preview repair:
 ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" 
because initiator will close connection via "completePreview()" on 
"prepareSynAck()". Without this change, preview_repair_test.py will fail 
because initiator closes channels before follower moving to final state and 
follower "StreamSession" will log some error upon EOF.
 ** In case of preview, do not send "PrepareAckMessage" to follower, as 
follower has already closed stream session on "prepareAsync()"

 * Dtest: include "Socket closed before session completion" into 
{{ignore_log_patterns}}, it's logged when stream session was closed upon EOF 
due to node down.

> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, 

[jira] [Comment Edited] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858
 ] 

ZhaoYang edited comment on CASSANDRA-15666 at 4/13/20, 5:27 AM:


|[patch|https://github.com/apache/cassandra/pull/497]|[dtest|https://github.com/apache/cassandra-dtest/pull/63]|

Previous changes:
 * Synchronization on "StreamSession#maybeComplete()" to avoid race condition 
on streaming completion.
 * Only the "follower" is allowed to send the CompleteMessage.
 * Only the "initiator" is allowed to close the session and its channels after 
receiving the CompleteMessage.

New changes to fix dtest failures:
 * NettyStreamingMessageSender:
 ** don't close channels in NettyStreamingMessageSender, as they will be closed 
by initator on "closeSession()"
 ** handle fileTransferExecutor pool shutdown gracefully in case of 
ClosedByInterruptException
 ** only include inbound handler for initiator's control channel

 * ChannelProxy:
 ** Use new "ChannelProxy" instance instead of shared copy in stream writer to 
prevent interruped thread closing backing channel

 * StreamSession: close session if channel is closed due to EOF from 
"StreamingInboundHandler"

 * OutboundConnection: fix OutboundConnection.id() to use proper remote/local 
address

 * Preview repair:
 ** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" 
because initiator will close connection via "completePreview()" on 
"prepareSynAck()". Without this change, preview_repair_test.py will fail 
because initiator closes channels before follower moving to final state and 
follower "StreamSession" will log some error upon EOF.
 ** In case of preview, do not send "PrepareAckMessage" to follower, as 
follower has already closed stream session on "prepareAsync()"

 * Dtest: include "Socket closed before session completion" into 
{{ignore_log_patterns}}, it's logged when stream session was closed upon EOF 
due to node down.


was (Author: jasonstack):
| [patch|https://github.com/apache/cassandra/pull/497] | 
[dtest|https://github.com/apache/cassandra-dtest/pull/63] |

Previous changes:
* Synchronization on "StreamSession#maybeComplete()" to avoid race condition on 
streaming completion.
* Only the "follower" is allowed to send the CompleteMessage.
* Only the "initiator" is allowed to close the session and its channels after 
receiving the CompleteMessage.

New changes to fix dtest failures:
* NettyStreamingMessageSender: 
** don't close channels in NettyStreamingMessageSender, as they will be closed 
by initator on "closeSession()"
** handle fileTransferExecutor pool shutdown gracefully in case of 
ClosedByInterruptException
** only include inbound handler for initiator's control channel

* ChannelProxy:
** Use new "ChannelProxy" instance instead of shared copy in stream writer to 
prevent interruped thread closing backing channel

* StreamSession: close session if channel is closed due to EOF from 
"StreamingInboundHandler"

* OutboundConnection: fix OutboundConnection.id() to use proper remote/local 
address

* Preview repair:
** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" 
because initiator will close connection via "completePreview()" on 
"prepareSynAck()". Without this change, preview_repair_test.py will fail 
because initiator closes channels before follower moving to final state and 
follower "StreamSession" will log some error upon EOF.
** In case of preview, do not send "PrepareAckMessage" to follower, as follower 
has already closed connection on "prepareAsync()"

* Dtest: include "Socket closed before session completion" into 
{{ignore_log_patterns}}, it's logged when stream session was closed upon EOF 
due to node down.

> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> 

[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15666:
-
Status: Patch Available  (was: In Progress)

> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1, 
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because 
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the 
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels 
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about, 
> which is overall a win anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858
 ] 

ZhaoYang edited comment on CASSANDRA-15666 at 4/12/20, 5:23 PM:


| [patch|https://github.com/apache/cassandra/pull/497] | 
[dtest|https://github.com/apache/cassandra-dtest/pull/63] |

Previous changes:
* Synchronization on "StreamSession#maybeComplete()" to avoid race condition on 
streaming completion.
* Only the "follower" is allowed to send the CompleteMessage.
* Only the "initiator" is allowed to close the session and its channels after 
receiving the CompleteMessage.

New changes to fix dtest failures:
* NettyStreamingMessageSender: 
** don't close channels in NettyStreamingMessageSender, as they will be closed 
by initator on "closeSession()"
** handle fileTransferExecutor pool shutdown gracefully in case of 
ClosedByInterruptException
** only include inbound handler for initiator's control channel

* ChannelProxy:
** Use new "ChannelProxy" instance instead of shared copy in stream writer to 
prevent interruped thread closing backing channel

* StreamSession: close session if channel is closed due to EOF from 
"StreamingInboundHandler"

* OutboundConnection: fix OutboundConnection.id() to use proper remote/local 
address

* Preview repair:
** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" 
because initiator will close connection via "completePreview()" on 
"prepareSynAck()". Without this change, preview_repair_test.py will fail 
because initiator closes channels before follower moving to final state and 
follower "StreamSession" will log some error upon EOF.
** In case of preview, do not send "PrepareAckMessage" to follower, as follower 
has already closed connection on "prepareAsync()"

* Dtest: include "Socket closed before session completion" into 
{{ignore_log_patterns}}, it's logged when stream session was closed upon EOF 
due to node down.


was (Author: jasonstack):
| [patch|https://github.com/apache/cassandra/pull/497] | 
[dtest|https://github.com/apache/cassandra-dtest/pull/63] |

Previous changes:
* Synchronization on "StreamSession#maybeComplete()" to avoid race condition on 
streaming completion.
* Only the "follower" is allowed to send the CompleteMessage.
* Only the "initiator" is allowed to close the session and its channels after 
receiving the CompleteMessage.

New changes to fix dtest failures:
* NettyStreamingMessageSender: 
** don't close channels in NettyStreamingMessageSender, as they will be closed 
by initator on "closeSession()"
** handle fileTransferExecutor pool shutdown gracefully in case of 
ClosedByInterruptException
** only include inbound handler for initiator's control channel

* ChannelProxy:
** Use new "ChannelProxy" instance instead of shared copy in stream writer to 
prevent interruped thread closing backing channel

* StreamSession: close session if channel is closed due to EOF from 
"StreamingInboundHandler"

* OutboundConnection: fix OutboundConnection.id() to use proper remote/local 
address

* Preview repair:
** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" 
because initiator will close connection via "completePreview" on 
"prepareSynAck()"
** In case of preview, do not send "PrepareAckMessage" to follower, as follower 
has already closed connection on "prepareAck()"

* Dtest: include "Socket closed before session completion" into 
{{ignore_log_patterns}}, it's logged when stream session was closed upon EOF 
due to node down.


> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1, 
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because 
> the 

[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858
 ] 

ZhaoYang commented on CASSANDRA-15666:
--

| [patch|https://github.com/apache/cassandra/pull/497] | 
[dtest|https://github.com/apache/cassandra-dtest/pull/63] |

Previous changes:
* Synchronization on "StreamSession#maybeComplete()" to avoid race condition on 
streaming completion.
* Only the "follower" is allowed to send the CompleteMessage.
* Only the "initiator" is allowed to close the session and its channels after 
receiving the CompleteMessage.

New changes to fix dtest failures:
* NettyStreamingMessageSender: 
** don't close channels in NettyStreamingMessageSender, as they will be closed 
by initator on "closeSession()"
** handle fileTransferExecutor pool shutdown gracefully in case of 
ClosedByInterruptException
** only include inbound handler for initiator's control channel

* ChannelProxy:
** Use new "ChannelProxy" instance instead of shared copy in stream writer to 
prevent interruped thread closing backing channel

* StreamSession: close session if channel is closed due to EOF from 
"StreamingInboundHandler"

* OutboundConnection: fix OutboundConnection.id() to use proper remote/local 
address

* Preview repair:
** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()" 
because initiator will close connection via "completePreview" on 
"prepareSynAck()"
** In case of preview, do not send "PrepareAckMessage" to follower, as follower 
has already closed connection on "prepareAck()"

* Dtest: include "Socket closed before session completion" into 
{{ignore_log_patterns}}, it's logged when stream session was closed upon EOF 
due to node down.


> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1, 
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because 
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the 
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels 
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about, 
> which is overall a win anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15666:
-
Test and Documentation Plan: 
Added interceptor to verify stream messages and state transition.

 CI: [https://circleci.com/workflow-run/deca618c-1f9c-4e3b-b4f1-f0fb1a8aac7f]

  dtest failure "repair_test.py" is fixed in 
[https://github.com/apache/cassandra-dtest/pull/63]

  was:
Added interceptor to verify stream messages and state transition.

 CI: [https://circleci.com/workflow-run/deca618c-1f9c-4e3b-b4f1-f0fb1a8aac7f]


  dtest failure "repair_test.py" is fixed in 


> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1, 
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because 
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the 
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels 
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about, 
> which is overall a win anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15666:
-
Test and Documentation Plan: 
Added interceptor to verify stream messages and state transition.

 CI: [https://circleci.com/workflow-run/deca618c-1f9c-4e3b-b4f1-f0fb1a8aac7f]


  dtest failure "repair_test.py" is fixed in 

  was:
Added interceptor to verify stream messages and state transition.

 CI pending: 
[https://circleci.com/workflow-run/54fa1412-65ec-4580-9f62-a81eaa0e6ec2]
 


> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1, 
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because 
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the 
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels 
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about, 
> which is overall a win anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15666:
---
Labels: pull-request-available  (was: )

> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1, 
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because 
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the 
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels 
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about, 
> which is overall a win anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-12 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15666:
-
Source Control Link: [patch|https://github.com/apache/cassandra/pull/497], 
[dtest|https://github.com/apache/cassandra-dtest/pull/63]  (was: 
https://github.com/apache/cassandra/pull/497)

> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens up for race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1, 
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because 
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the 
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels 
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about, 
> which is overall a win anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org