[ 
https://issues.apache.org/jira/browse/CASSANDRA-11841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421792#comment-15421792
 ] 

Paulo Motta commented on CASSANDRA-11841:
-----------------------------------------

+1 to NPE fix when checking version.

Regarding the race on {{StreamingTransferTest}}, the current fix is not 
sufficient because it only schedule keep-alive on the initiator (which calls 
{{onInitializationComplete}}), what break dtests. I investigated the race and 
found out that this happens because we send the init message *after*  starting 
the connection handler thread, so the keep-alive message may be sent 
concurrently with the stream init message, breaking initialization on the 
receiver side.

In order to avoid scheduling the keep-alive test in both 
{{onInitializationComplete()}} and {{prepare()}}, I restored the previous 
approach of scheduling it on {{StreamSession.init()}} and updated 
{{ConnectionHandler}} to send the init message before starting the message 
handler thread to avoid races during initialization. Updated patch and CI runs 
available below (still running):

||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-11841]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:11841]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11841-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11841-dtest/lastCompletedBuild/testReport/]|

Also rebased dtests and created 
[PR|https://github.com/riptano/cassandra-dtest/pull/1239].

During the investigation of the {{StreamingTransferTest}} race, I noticed that 
if an unchecked exception is thrown during {{IncomingStreamingConnection.run}} 
or {{MessaginService.SocketThread.run}}, the socket is not closed properly, 
what may leave hanging unclosed sockets which could be very tricky to identify, 
so I added an additional ninja commit to catch {{Throwable}} instead of 
{{IOException}} and cleanup the socket properly. Since this fix might be 
relevant for other versions, I cherry-pick this specific commit and submitted 
CI runs so it can be ninja-ed safely:

||2.2||3.0||3.9||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-closesocketonthrowable]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-closesocketonthrowable]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.9...pauloricardomg:3.9-closesocketonthrowable]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-closesocketonthrowable-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-closesocketonthrowable-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.9-closesocketonthrowable-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-closesocketonthrowable-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-closesocketonthrowable-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.9-closesocketonthrowable-dtest/lastCompletedBuild/testReport/]|

> Add keep-alive to stream protocol
> ---------------------------------
>
>                 Key: CASSANDRA-11841
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11841
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to