[ https://issues.apache.org/jira/browse/CASSANDRA-19187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17797166#comment-17797166 ]
Brandon Williams commented on CASSANDRA-19187:
----------------------------------------------

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19187-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1417/workflows/df7032b2-c33b-4960-936e-fa817131de34], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1417/workflows/d42eb7de-0b45-492c-ab20-7143c4e33537]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19187-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1419/workflows/11ffc188-d516-4287-a78d-e213e38eb055], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1419/workflows/a75fa383-0e9c-429a-ba52-962e04bf6cfc]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19187-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1418/workflows/1041ab90-3cf4-46a4-9062-6cc1d3d20741], [j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1418/workflows/5d108993-69f8-4bb8-a616-93a1d904ff1a]|

> nodetool assassinate may cause thread serialization for that node
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-19187
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19187
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Runtian Liu
>            Assignee: Runtian Liu
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> When assassinating an IP address that is not in the gossip map, a "corrupted"
> entry is inserted into the gossip map.
> [(1)|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L810]
> For example, after running "nodetool assassinate 10.1.1.1",
> "nodetool gossipinfo" shows an entry like the one below:
>
> {code:java}
> /10.1.1.1
>   generation:1702006511
>   heartbeat:9999
>   STATUS:209516:LEFT,-8393921141401589197,1702265651923
>   STATUS_WITH_PORT:209515:LEFT,-8393921141401589197,1702265651923
>   TOKENS: not present
> {code}
>
> This entry in endpointStateMap causes a problem for the
> [isUpgradingFromVersionLowerThan|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L2284]
> function: the
> [upgradeFromVersionSupplier|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L191]
> supplier always sets the
> [allHostsHaveKnownVersion|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L216]
> flag to false, so a memoized value is never returned. The "get" function
> therefore always has to acquire the lock at this
> [line|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/ExpiringMemoizingSupplier.java#L66].
> If the application uses "fetchAll", the native-transport-requests threads
> hit this
> [line|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/db/filter/ColumnFilter.java#L574].
> As a result, all native-transport-requests threads are serialized, and the
> same lock is also shared by the GossipStage threads. This also means that if
> a node in a cluster with the corrupted gossip map is restarted, that node
> runs into this problem.
> To fix the issue:
> # Why do we add a dummy entry for nodetool assassinate if the endpoint
> is no longer in the map? Should we do nothing, or throw an exception, if the
> node is not in the gossip map anymore?
> # Before checking whether a version is null, we should make sure the node is
> not a dead node.
> A decommissioned node, i.e. one in the LEFT state, should no longer be
> considered part of the cluster when calculating "upgradeInProgressPossible".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
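The lock contention described in the report, and the effect of the second proposed fix, can be illustrated with a small self-contained Java sketch. It is a simplified model under stated assumptions, not Cassandra's Gossiper or ExpiringMemoizingSupplier code and not the patch on the branches above: every name in it (UpgradeVersionSketch, CachingSupplier, Computed, minVersion, compareVersions, lockedCalls) is hypothetical. CachingSupplier only caches a result that the computation marks as cacheable, which stands in for how allHostsHaveKnownVersion gates memoization; minVersion can optionally skip endpoints whose STATUS starts with LEFT. Records require Java 16 or later.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical, simplified model of the issue; NOT Cassandra's actual code.
public class UpgradeVersionSketch
{
    // Minimal view of a gossip entry: address, STATUS value, release version (null = unknown).
    record Endpoint(String address, String status, String releaseVersion) {}

    // A computed value plus whether it may be cached (stand-in for allHostsHaveKnownVersion).
    record Computed<T>(T value, boolean cacheable) {}

    // Caches a result only when the delegate marks it cacheable; otherwise every
    // get() enters the synchronized block, so concurrent callers are serialized.
    static final class CachingSupplier<T> implements Supplier<T>
    {
        private final Supplier<Computed<T>> delegate;
        private volatile Computed<T> cached; // set once a cacheable result is seen
        final AtomicInteger lockedCalls = new AtomicInteger();

        CachingSupplier(Supplier<Computed<T>> delegate) { this.delegate = delegate; }

        @Override
        public T get()
        {
            Computed<T> hit = cached;
            if (hit != null)
                return hit.value(); // lock-free fast path
            synchronized (this)
            {
                lockedCalls.incrementAndGet();
                if (cached != null)
                    return cached.value();
                Computed<T> result = delegate.get();
                if (result.cacheable())
                    cached = result;
                return result.value();
            }
        }
    }

    // Compares dotted numeric versions such as "4.0.11" (no pre-release suffixes).
    static int compareVersions(String a, String b)
    {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++)
        {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y)
                return Integer.compare(x, y);
        }
        return 0;
    }

    // Lowest known release version; cacheable only if every considered endpoint reported one.
    // With skipLeft = true, endpoints whose STATUS starts with "LEFT" are ignored.
    static Computed<String> minVersion(List<Endpoint> endpoints, boolean skipLeft)
    {
        String min = null;
        boolean allKnown = true;
        for (Endpoint e : endpoints)
        {
            if (skipLeft && e.status() != null && e.status().startsWith("LEFT"))
                continue;
            if (e.releaseVersion() == null)
            {
                allKnown = false;
                continue;
            }
            if (min == null || compareVersions(e.releaseVersion(), min) < 0)
                min = e.releaseVersion();
        }
        return new Computed<>(min, allKnown);
    }

    public static void main(String[] args)
    {
        List<Endpoint> gossip = List.of(
            new Endpoint("10.0.0.1", "NORMAL", "4.0.11"),
            new Endpoint("10.0.0.2", "NORMAL", "4.0.11"),
            // same shape as the entry left behind by assassinating an unknown address
            new Endpoint("10.1.1.1", "LEFT,-8393921141401589197,1702265651923", null));

        for (boolean skipLeft : new boolean[]{ false, true })
        {
            CachingSupplier<String> supplier = new CachingSupplier<>(() -> minVersion(gossip, skipLeft));
            for (int i = 0; i < 1000; i++)
                supplier.get();
            System.out.println("skipLeft=" + skipLeft + " lockedCalls=" + supplier.lockedCalls.get());
        }
    }
}
```

Running main prints "skipLeft=false lockedCalls=1000" followed by "skipLeft=true lockedCalls=1": with the LEFT entry counted, every call takes the lock, while skipping it lets the first result be cached and later calls stay on the lock-free path.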