[ 
https://issues.apache.org/jira/browse/CASSANDRA-19187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803731#comment-17803731
 ] 

Brandon Williams edited comment on CASSANDRA-19187 at 1/6/24 2:34 AM:
----------------------------------------------------------------------

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19187-test2]|[repeated|https://app.circleci.com/pipelines/github/driftx/cassandra/1450/workflows/c84aa22d-5bad-4074-a0d7-5811412ca379/jobs/70537]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19187-4.1]|[repeated|https://app.circleci.com/pipelines/github/driftx/cassandra/1452/workflows/ca3174ea-f163-4aef-bc48-d3da4081b8a8/jobs/70748]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19187-5.0]|[repeated|https://app.circleci.com/pipelines/github/driftx/cassandra/1451/workflows/0629e608-44b2-44c6-aace-0c63e198bb7c/jobs/70749]|

Everything passed, +1 from me again.


was (Author: brandon.williams):
||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19187-test2]|[repeated|https://app.circleci.com/pipelines/github/driftx/cassandra/1450/workflows/c84aa22d-5bad-4074-a0d7-5811412ca379/jobs/70537]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19187-4.1]|[repeated|https://app.circleci.com/pipelines/github/driftx/cassandra/1452/workflows/ca3174ea-f163-4aef-bc48-d3da4081b8a8]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19187-5.0]|[repeated|https://app.circleci.com/pipelines/github/driftx/cassandra/1451/workflows/0629e608-44b2-44c6-aace-0c63e198bb7c]|

Everything passed, +1 from me again.

> nodetool assassinate may cause thread serialization for that node
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-19187
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19187
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Runtian Liu
>            Assignee: Runtian Liu
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> When assassinate an ip address that is not in the gossip map, a "corrupted" 
> entry will be inserted into the gossip map. 
> [(1)|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L810]
> For example, if we do "nodetool assassinate 10.1.1.1"
> we will get an entry like below by running "nodetool gossipinfo":
>  
> {code:java}
> /10.1.1.1
>   generation:1702006511
>   heartbeat:9999
>   STATUS:209516:LEFT,-8393921141401589197,1702265651923
>   STATUS_WITH_PORT:209515:LEFT,-8393921141401589197,1702265651923
>   TOKENS: not present {code}
>  
> This entry in endpointStateMap will cause issue for 
> [isUpgradingFromVersionLowerThan|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L2284]
>  function. Because the 
> [upgradeFromVersionSupplier|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L191]
>  supplier will always set the 
> [allHostsHaveKnownVersion|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L216]
>  flag to false so no memoized value will be returned. The "get" function will 
> always require a lock from this 
> [line|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/ExpiringMemoizingSupplier.java#L66].
> If application is using "fetchAll", the native-transport-requests thread will 
> hit this 
> [line|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/db/filter/ColumnFilter.java#L574].
>  This means all the native-transport-requests thread is serialized, also, the 
> lock is shared by GossipStage threads. It means if a node in a cluster with 
> the corrupted gossip map is restart, the node will run into this problem.
> To fix the issue,
>  # Why we want to add a dummy entry for nodetool assassinate if the endpoint 
> is not in the map anymore. Should we do nothing or throw exception if the 
> node is not in the gossip map anymore?
>  # Before checking if a version is null, we should make sure the node is not 
> a dead node. A decommissioned node, a left node should not be considered part 
> of the cluster anymore when calculating "upgradeInProgressPossible"
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to