[ https://issues.apache.org/jira/browse/CASSANDRA-11841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300957#comment-15300957 ]
Paulo Motta commented on CASSANDRA-11841:
-----------------------------------------

The basic idea is to replace {{streaming_socket_timeout_in_ms}} with a new property, {{streaming_keep_alive_period_in_ms}}, with a default period of 5 minutes. The incoming socket timeout is set to {{2 * streaming_keep_alive_period_in_ms}}, so if either peer receives no data for 2 keep-alive rounds (10 minutes with default settings), the stream session fails with {{SocketTimeoutException}}.

For the duration of the stream session, each stream peer keeps a scheduled task with a period of {{streaming_keep_alive_period_in_ms}} that sends a new {{KeepAlive}} message to the other peer. The task skips sending a new keep-alive message if the previous one has not yet been sent, so keep-alive messages do not accumulate while the node is actively streaming a large file.

The feature is only enabled if the peer is on version >= 3.8, so the stream protocol remains backward compatible. Otherwise it falls back to {{streaming_socket_timeout_in_ms}} (that's why we must keep it as a hidden property until the next stream protocol version bump).

I added [dtests|https://github.com/pauloricardomg/cassandra-dtest/tree/11841] to check that the stream session remains active if the transfer of a single file takes longer than {{streaming_keep_alive_period_in_ms}}, for both bootstrap and replace_address. I also added a mixed-version test to check that the feature is not enabled when streaming with a peer on version < 3.8.
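
To make the keep-alive scheme described above concrete, here is a minimal Java sketch. It is not the code from the patch: {{KeepAliveSketch}}, {{Sender}}, {{KeepAliveTask}} and the method names are hypothetical, and the real stream session plumbing is reduced to a callback that fires once a message has actually been written. It only illustrates the three rules: the timeout is twice the period, the task skips a round while the previous keep-alive is still unsent, and older peers fall back to the legacy timeout.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from the actual patch.
class KeepAliveSketch {

    /** Incoming socket timeout: 2 keep-alive periods, or the legacy timeout for older peers. */
    static long incomingTimeoutMs(boolean peerSupportsKeepAlive,
                                  long keepAlivePeriodMs,
                                  long legacySocketTimeoutMs) {
        return peerSupportsKeepAlive ? 2 * keepAlivePeriodMs : legacySocketTimeoutMs;
    }

    /** Stand-in for a session's outgoing queue; onSent runs once the message is written. */
    interface Sender {
        void send(String message, Runnable onSent);
    }

    /** Periodic task that never has more than one keep-alive outstanding. */
    static final class KeepAliveTask implements Runnable {
        private final Sender sender;
        private final AtomicBoolean pending = new AtomicBoolean(false);
        final AtomicInteger enqueued = new AtomicInteger();

        KeepAliveTask(Sender sender) {
            this.sender = sender;
        }

        @Override
        public void run() {
            // Skip this round if the previous keep-alive has not been sent yet,
            // e.g. because a large file is still occupying the connection.
            if (!pending.compareAndSet(false, true))
                return;
            enqueued.incrementAndGet();
            sender.send("KeepAlive", () -> pending.set(false));
        }
    }

    /** Runs the task once per period until the returned executor is shut down. */
    static ScheduledExecutorService schedule(KeepAliveTask task, long periodMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(task, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

In this sketch the caller would shut the executor down when the stream session ends, and would pass the result of {{incomingTimeoutMs}} to the socket's read timeout.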

Patch and tests available below:

||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:11841-trunk]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:11841]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11841-trunk-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11841-trunk-dtest/lastCompletedBuild/testReport/]|

ps: this is built on top of CASSANDRA-11840.

> Add keep-alive to stream protocol
> ---------------------------------
>
>                 Key: CASSANDRA-11841
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11841
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)