[ 
https://issues.apache.org/jira/browse/CASSANDRA-11841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300957#comment-15300957
 ] 

Paulo Motta commented on CASSANDRA-11841:
-----------------------------------------

The basic idea is to replace {{streaming_socket_timeout_in_ms}} with a new
property {{streaming_keep_alive_period_in_ms}}, with a default period of 5
minutes.

The incoming socket timeout is set to {{2 * streaming_keep_alive_period_in_ms}},
so if either peer receives no data for 2 keep-alive rounds (10 minutes with
default settings), the stream session fails with a
{{SocketTimeoutException}}.
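
To make the arithmetic concrete, here is a minimal sketch of how the incoming
timeout could be derived from the keep-alive period. The class and method names
are hypothetical and not taken from the patch; only {{Socket.setSoTimeout}} is
a real JDK call.

```java
import java.net.Socket;
import java.net.SocketException;

// Illustrative sketch only: names here are assumptions, not the patch's code.
final class KeepAliveTimeouts {
    // Default keep-alive period described above: 5 minutes.
    static final int DEFAULT_KEEP_ALIVE_PERIOD_MS = 5 * 60 * 1000;

    // The incoming socket timeout is twice the keep-alive period.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    static int incomingSocketTimeoutMs(int keepAlivePeriodMs) {
        return Math.multiplyExact(2, keepAlivePeriodMs);
    }

    // With SO_TIMEOUT set, a blocking read on the socket's input stream throws
    // SocketTimeoutException once that many milliseconds pass without data.
    static void applyTo(Socket socket, int keepAlivePeriodMs) throws SocketException {
        socket.setSoTimeout(incomingSocketTimeoutMs(keepAlivePeriodMs));
    }
}
```

With the default period, {{incomingSocketTimeoutMs}} yields 600000 ms, i.e. the
10 minutes mentioned above.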

For the duration of the stream session, each stream peer keeps a scheduled
task with a period of {{streaming_keep_alive_period_in_ms}} that sends a new
{{KeepAlive}} message to the other peer. The task skips sending a new
keep-alive message if the previous one has not yet been sent, to avoid
accumulating keep-alive messages while the node is actively streaming a large
file.
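
The skip-if-still-pending behaviour can be sketched roughly as follows. This is
an assumption-laden illustration, not the patch itself: {{KeepAliveTask}} and
its {{sender}} callback (which enqueues one keep-alive and runs the supplied
hook once the message has actually been written) are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Illustrative sketch only: names here are assumptions, not the patch's code.
final class KeepAliveTask implements Runnable {
    // True while a previously enqueued keep-alive has not been written yet.
    private final AtomicBoolean previousPending = new AtomicBoolean(false);

    // Hypothetical hook: enqueues one keep-alive message and invokes the
    // given callback after that message has been sent.
    private final Consumer<Runnable> sender;

    KeepAliveTask(Consumer<Runnable> sender) {
        this.sender = sender;
    }

    @Override
    public void run() {
        // Skip this round if the previous keep-alive is still queued, e.g.
        // behind a large file that is currently being streamed.
        if (!previousPending.compareAndSet(false, true)) {
            return;
        }
        try {
            sender.accept(() -> previousPending.set(false));
        } catch (RuntimeException e) {
            // Nothing was enqueued, so allow the next round to try again.
            previousPending.set(false);
            throw e;
        }
    }
}
```

Such a task could be registered with
{{ScheduledExecutorService.scheduleAtFixedRate}} using the keep-alive period,
and the returned {{ScheduledFuture}} cancelled when the stream session ends.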

The feature is only enabled if the peer is on version >= 3.8, so the stream
protocol remains backward compatible. Otherwise it just falls back to
{{streaming_socket_timeout_in_ms}} (that's why we must keep it as a hidden
property until the next stream protocol version bump).
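
A rough sketch of that fallback decision, assuming the peer's release version
is available as a major/minor pair. The class, the method names and the
version representation are all hypothetical; the patch may detect the peer
version differently.

```java
// Illustrative sketch only: names here are assumptions, not the patch's code.
final class StreamTimeoutPolicy {
    // Keep-alive is only used with peers on version 3.8 or later.
    static boolean supportsKeepAlive(int major, int minor) {
        return major > 3 || (major == 3 && minor >= 8);
    }

    // Newer peers get 2 * keep-alive period; older peers fall back to the
    // legacy streaming_socket_timeout_in_ms value.
    static int socketTimeoutMs(int peerMajor, int peerMinor,
                               int keepAlivePeriodMs, int legacySocketTimeoutMs) {
        return supportsKeepAlive(peerMajor, peerMinor)
             ? Math.multiplyExact(2, keepAlivePeriodMs)
             : legacySocketTimeoutMs;
    }
}
```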

I added [dtests|https://github.com/pauloricardomg/cassandra-dtest/tree/11841]
to check that the stream session remains active if the transfer of a single
file takes longer than {{streaming_keep_alive_period_in_ms}} for bootstrap and
replace_address. I also added a mixed-version test to check that the feature
is not enabled when streaming with a peer on version < 3.8.

Patch and tests available below:

||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:11841-trunk]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:11841]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11841-trunk-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11841-trunk-dtest/lastCompletedBuild/testReport/]|

ps: this is built on top of CASSANDRA-11840.

> Add keep-alive to stream protocol
> ---------------------------------
>
>                 Key: CASSANDRA-11841
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11841
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
