[ 
https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644327#comment-17644327
 ] 

Stefan Miklosovic commented on CASSANDRA-14930:
-----------------------------------------------

I went through the PR, and the problem I see is that it does not work for the 
nodes that stay in the ring. If I have 3 nodes and decommission one of them, 
the remaining nodes will see that node as unreachable. The current logic 
checks this (1), but "!unreachableEndpoints.contains(ep)" evaluates to false 
because that endpoint is put back among the unreachable endpoints here (2). So 
it does not call the "destroyConnectionPool" method and instead logs "Not 
destroying messaging connection to xyz due to endpoint starting to gossip 
again", which is obviously not true.

I am not completely sure how to work around this; maybe we could just drop 
the "unreachableEndpoints.contains(ep)" check?
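
To make the problem concrete, here is a minimal Java sketch of the check as 
described above. It is a simplified model, not the actual Gossiper code: the 
class name, the shouldDestroyPool method, the checkUnreachable flag and the 
address 10.0.0.3 are all made up for illustration, and unreachableEndpoints is 
modelled as a plain set of strings.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the check discussed in this comment.
// None of these names come from the Cassandra code base.
class UnreachableCheckSketch {

    // Returns true when the connection pool to `ep` would be destroyed.
    // checkUnreachable = true models the current logic; false models
    // the idea of leaving the unreachable check out.
    static boolean shouldDestroyPool(Set<String> unreachableEndpoints,
                                     String ep,
                                     boolean checkUnreachable) {
        return !checkUnreachable || !unreachableEndpoints.contains(ep);
    }

    public static void main(String[] args) {
        // On a node that stays in the ring, the decommissioned node has
        // been put back among the unreachable endpoints.
        Set<String> unreachableEndpoints = new HashSet<>();
        unreachableEndpoints.add("10.0.0.3");

        // With the check: the pool is not destroyed.
        System.out.println(shouldDestroyPool(unreachableEndpoints, "10.0.0.3", true));
        // Without the check: the pool is destroyed.
        System.out.println(shouldDestroyPool(unreachableEndpoints, "10.0.0.3", false));
    }
}
```

Under these assumptions the first call returns false and the second returns 
true, which matches the behaviour I see on the remaining nodes. Whether 
dropping the check is safe for endpoints that really do start gossiping again 
is not covered by this sketch.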

The 3.0 branch with my changes so far is here (3). I also fixed one possible 
NPE, as you'll see there.

(1) 
https://github.com/jasonstack/cassandra/blob/994b46b6882d7847f2da839968f52dbadb57fe1e/src/java/org/apache/cassandra/gms/Gossiper.java#L441
(2) 
https://github.com/jasonstack/cassandra/blob/994b46b6882d7847f2da839968f52dbadb57fe1e/src/java/org/apache/cassandra/gms/Gossiper.java#L1044
(3) https://github.com/instaclustr/cassandra/tree/CASSANDRA-14930

> decommission may cause timeout because messaging backlog is cleared 
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-14930
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14930
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Coordination, Legacy/Core
>            Reporter: Zhao Yang
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x
>
>
> On a 3-node cluster with RF=2, decommissioning a node may cause quorum writes 
> to time out because the messaging backlog to the decommissioned node is 
> cleared via 
> {{Gossiper#removeEndpoint() -> OutboundTcpConnection#closeSocket()}}.
>  (A timeout is less likely to happen with RF=3, because we can afford one 
> less response.)
> {code:java}
> What happened:
> 1. [WriteStage] before the leaving node is removed from tokenmetadata, the 
> write endpoints are generated (the leaving endpoint is included)
> 2. [GossipStage] the leaving node is removed from tokenmetadata; no future 
> write handler will include the leaving endpoint
> 3. [WriteStage] write handlers send messages to the messaging-service backlog
> 4. [GossipStage] the messaging-service backlog is cleared; messages are not 
> sent and the connection is closed
> 5. [WriteStage] the write times out
>  {code}
> |patch|
> |[3.0|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.0]|
> |[3.11|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.11]|
> We can avoid it by delaying destruction of the messaging connection so that 
> messages are sent and responded to. This patch also avoids reopening an 
> already closed connection on {{MessagingService#convict()}}.
>  The new messaging framework rewrite in {{Trunk}} avoids the issue by not 
> clearing the messaging backlog.
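
The effect of delaying the destruction, as described in the quoted issue, can 
be illustrated with a small Java sketch. This is a toy model under stated 
assumptions, not the patch itself: the class DelayedDestroySketch, the 
run method, the message names and the 50 ms delay are hypothetical, the 
backlog is a plain queue, and "destroying the connection" is modelled as 
clearing that queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical toy model; none of these names come from the Cassandra code base.
class DelayedDestroySketch {

    // Returns the messages that were actually sent before the backlog was cleared.
    static List<String> run(boolean delayDestroy) {
        Queue<String> backlog = new ConcurrentLinkedQueue<>();
        List<String> sent = new ArrayList<>();
        backlog.add("write-1");
        backlog.add("write-2");

        Runnable destroy = backlog::clear;   // models clearing the backlog on close
        Runnable drain = () -> {             // models the sender flushing the backlog
            String message;
            while ((message = backlog.poll()) != null)
                sent.add(message);
        };

        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            if (delayDestroy) {
                // The sender gets to run first; destruction happens after a delay.
                Future<?> drained = executor.submit(drain);
                Future<?> destroyed = executor.schedule(destroy, 50, TimeUnit.MILLISECONDS);
                drained.get();
                destroyed.get();
            } else {
                // Immediate destruction: the backlog is cleared before anything is sent.
                destroy.run();
                executor.submit(drain).get();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println("immediate destroy, sent: " + run(false));
        System.out.println("delayed destroy, sent:   " + run(true));
    }
}
```

In this model the immediate variant sends nothing, while the delayed variant 
sends both queued messages before the backlog is cleared. The sketch only 
shows the ordering; it does not model responses or the convict() path.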


