[ 
https://issues.apache.org/jira/browse/CASSANDRA-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13608:
--------------------------------------
    Attachment: Cassandra 3.10 Join with lots GC collection leads to socket 
closure and join hang.txt

> Connection closed/reopened during join causes Cassandra stream to close
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-13608
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13608
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: Cassandra 3.10. Windows Server 2016, 32 GB RAM, 2 TB hard 
> disk, RAID 10 with 4 spindles, 8 cores
>            Reporter: Tania S Engel
>             Fix For: 3.10
>
>         Attachments: Cassandra 3.10 Join with lots GC collection leads to 
> socket closure and join hang.mht, Cassandra 3.10 Join with lots GC collection 
> leads to socket closure and join hang.pdf, Cassandra 3.10 Join with lots GC 
> collection leads to socket closure and join hang.txt
>
>
> We start a JOIN bootstrap. The primary seed node streams to the replica. The 
> replica needs GC cleanup and experiences frequent pauses, including a 
> 12-second old-gen collection following a memtable flush. Both replica and 
> primary log _MessagingService IOException: An existing connection was 
> forcibly closed by the remote host_. The replica's MessagingService-Outgoing 
> reestablishes the connection immediately, but the primary's 
> StreamKeepAliveExecutor throws a _java.lang.RuntimeException: Outgoing stream 
> handler has been closed_. From that point forward, the replica stays in JOIN 
> mode, sending keep-alives to the primary. The primary receives the 
> keep-alives but does not send its own, and it repeatedly fails to send a 
> hints file to the replica. It seems this limping condition would continue 
> indefinitely, but it ends when we stop Cassandra on the replica. If we 
> restart Cassandra on the replica, the JOIN picks up again but fails with 
> _java.io.IOException: Corrupt value length 355151036 encountered, as it 
> exceeds the maximum of 268435456, which is set via max_value_size_in_mb in 
> cassandra.yaml_. We have not increased this value because we do not have 
> values that large in our data, so we presume the value is indeed corrupt and 
> that moving past it would not be a good idea. Please see the attachment for 
> details.
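
The two numbers in the IOException above can be checked with a little arithmetic. The following is only a sketch of the comparison the error message implies, not Cassandra's actual code: the function names are hypothetical, and it assumes the 268435456-byte limit corresponds to max_value_size_in_mb set to 256, with one MB taken as 1024 * 1024 bytes.

```python
# Sketch only: mirrors the comparison described in the error message.
# Function names are hypothetical; this is not Cassandra's implementation.

BYTES_PER_MB = 1024 * 1024


def max_value_size_bytes(max_value_size_in_mb: int) -> int:
    """Convert a max_value_size_in_mb setting to a byte limit."""
    return max_value_size_in_mb * BYTES_PER_MB


def exceeds_limit(value_length: int, max_value_size_in_mb: int) -> bool:
    """Return True if a serialized value length is above the configured limit."""
    return value_length > max_value_size_bytes(max_value_size_in_mb)


# Assumption: a setting of 256 MB yields the limit quoted in the report.
ASSUMED_SETTING_MB = 256
REPORTED_LENGTH = 355151036  # value length quoted in the IOException

limit = max_value_size_bytes(ASSUMED_SETTING_MB)
print(limit)                                               # 268435456
print(exceeds_limit(REPORTED_LENGTH, ASSUMED_SETTING_MB))  # True
print(REPORTED_LENGTH - limit)                             # 86715580
```

Under these assumptions the reported length is about 86.7 million bytes over the limit, which is consistent with the reporter's reading that the length field itself is corrupt rather than describing a real value.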



