[ 
https://issues.apache.org/jira/browse/CASSANDRA-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351418#comment-15351418
 ] 

Paulo Motta commented on CASSANDRA-11854:
-----------------------------------------

bq. In SocketThread#close() we iterate over the connections and close each one, 
this will not work now since the connections will be empty, right? 

Actually this would only be empty if there are no ongoing stream sessions, 
otherwise active stream sessions would only be cleared after the 
{{StreamResultFuture}} was completed.

bq. Should we perhaps close() the socket when we remove it from group? They 
were only closed on server shutdown before so maybe we don't need to do that?

It's not safe to close the socket when a stream session is finished (failed or 
completed), because the outgoing stream handler may still need to send queued 
{{SessionComplete}} or {{SessionFailed}} messages after the 
{{StreamResultFuture}} returns. For this reason, there may be edge cases with 
the previous approach when there are still active streaming connections not 
closed by {{MessagingService.shutdown()}}.

In order to guarantee streaming sockets are only removed from 
{{SocketThread.connections}} after the {{(Incoming|Outgoing)MessageHandler}} 
are finished, the new approach attaches the {{IncomingStreamingConnection}} to 
the {{ConnectionHandler}}, so on {{ConnectionHandler.signalCloseDone()}} the 
{{IncomingStreamingConnection}} is closed, closing the socket and removing it 
from {{SocketThread.connections}}. I also added a unit test to verify that all 
sockets are cleared from {{SocketThread.connections}} after the stream session 
finishes.

New patch and tests below:

||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-11854-v3]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-11854-v3]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-11854-v3]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-11854-v3]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-11854-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11854-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11854-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11854-v3-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-11854-v3-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11854-v3-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11854-v3-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11854-v3-dtest/lastCompletedBuild/testReport/]|

commit info: minor conflicts all the way up to trunk :(

> Remove finished streaming connections from MessagingService
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-11854
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11854
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>         Attachments: oom.png
>
>
> When a new {{IncomingStreamingConnection}} is created, [we register it in the 
> connections 
> map|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1109]
>  of {{MessagingService}}, but we [only remove it if there is an 
> exception|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/IncomingStreamingConnection.java#L83]
>  while attaching the socket to the stream session.
> On nodes with SSL and large number of vnodes, after many repair sessions 
> these old connections can accumulate and cause OOM (heap dump attached).
> The connection should be removed from the connections map after if it's 
> finished in order to be garbage collected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to