[
https://issues.apache.org/jira/browse/CASSANDRA-19983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891003#comment-17891003
]
Brandon Williams edited comment on CASSANDRA-19983 at 10/18/24 6:31 PM:
------------------------------------------------------------------------
Agree with the minor nits Mick raised, but overall I'm fond of this solution
since it solves the problem with very minimal risk: a node sending its full
info is not new, just less common than before this patch. I did a minor
backport for 4.0, and the 5.0 patch applied cleanly; let's see how CI fares:
||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19983-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1742/workflows/f8cc002e-1d78-4c6b-b226-a2a06d475b63],
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1742/workflows/9f2f3b54-9f40-44b0-b819-637dd6b6838c]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19983-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1741/workflows/2d93e465-3b53-48fa-bbe8-eafae5b94d8f],
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1741/workflows/60a072b1-4be8-4a17-bede-8f67af0dec11]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19983-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1744/workflows/7c7086cb-86f4-4527-bf6d-1d674985a1f9],
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1744/workflows/2047971f-0a52-459b-b7cc-ce915fb8457e]|
> Cassandra gossip issue for large cluster
> ----------------------------------------
>
> Key: CASSANDRA-19983
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19983
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Runtian Liu
> Assignee: Runtian Liu
> Priority: Urgent
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> When adding a new node to a cluster, we see a lot of nodes reporting the
> error below:
> {code:java}
> java.lang.NullPointerException: null
>     at o.a.cassandra.gms.Gossiper.getHostId(Gossiper.java:1378)
>     at o.a.cassandra.gms.Gossiper.getHostId(Gossiper.java:1373)
>     at o.a.c.service.StorageService.handleStateBootstrap(StorageService.java:3088)
>     at o.a.c.service.StorageService.onChange(StorageService.java:2783)
>     at o.a.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1851)
>     at o.a.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1816)
>     at o.a.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1749)
>     at o.a.c.g.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:81)
>     at o.a.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:79)
>     at o.a.cassandra.net.InboundSink.accept(InboundSink.java:98)
>     at o.a.cassandra.net.InboundSink.accept(InboundSink.java:46)
>     at o.a.c.n.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>     at o.a.c.c.ExecutionFailure$1.run(ExecutionFailure.java:133)
>     at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at i.n.u.c.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:829){code}
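As a rough illustration of the failure mode (a simplified sketch, not Cassandra's actual `Gossiper.getHostId` implementation), a host-ID lookup throws an NPE when the `HOST_ID` application state was never populated in the endpoint's state map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Minimal sketch (illustrative only): if HOST_ID was never received via
// gossip, the map lookup returns null and parsing it throws an NPE, which
// is analogous to the stack trace above.
public class HostIdLookup {
    static UUID getHostId(Map<String, String> applicationStates) {
        // UUID.fromString(null) throws NullPointerException
        return UUID.fromString(applicationStates.get("HOST_ID"));
    }

    public static void main(String[] args) {
        Map<String, String> states = new HashMap<>();
        states.put("LOAD", "3.0"); // DC/RACK/HOST_ID missing, as in the report
        try {
            getHostId(states);
        } catch (NullPointerException e) {
            System.out.println("NPE: HOST_ID missing from gossip state");
        }
    }
}
```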
> After some investigation, we found that the existing nodes of the cluster
> had removed the new node as a fat client. The reason is that the new node
> is busy with gossip and its gossip queue has a lot of tasks piling up.
> The gossip state for the new node on an existing host is:
>
>
> {code:java}
> /1.1.1.1
> generation:1727479926
> heartbeat:25
> LOAD:20:31174.0
> SCHEMA:16:59adb24e-f3cd-3e02-97f0-5b395827453f
> DC:12:dc1
> RACK:14:0
> RELEASE_VERSION:5:4.1.3
> NET_VERSION:1:12
> HOST_ID:2:b9cc4587-68f5-4bb6-a933-fd0c77a064dc
> INTERNAL_ADDRESS_AND_PORT:8:1.1.1.1:7000
> NATIVE_ADDRESS_AND_PORT:3:1.1.1.1:9042
> SSTABLE_VERSIONS:6:big-nb
> TOKENS: not present {code}
> Later, this endpoint is removed from the gossip endpoint state map because
> it is treated as a fat client:
>
>
> {code:java}
> FatClient /1.1.1.1:7000 has been silent for 30000ms, removing from gossip
> {code}
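The eviction works roughly like this (a simplified sketch under illustrative names, not Cassandra's actual implementation): an endpoint that has no tokens and has been silent longer than a threshold is dropped from the local endpoint-state map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified sketch of fat-client eviction (names are illustrative):
// tokenless endpoints silent beyond a quarantine threshold are removed
// from the local gossip state.
public class FatClientEviction {
    static final long FAT_CLIENT_TIMEOUT_MS = 30_000;

    static void evictFatClients(Map<String, Long> lastUpdateMs,
                                Set<String> endpointsWithTokens,
                                long nowMs) {
        lastUpdateMs.entrySet().removeIf(e ->
            !endpointsWithTokens.contains(e.getKey())          // no tokens yet
            && nowMs - e.getValue() > FAT_CLIENT_TIMEOUT_MS);  // silent too long
    }

    public static void main(String[] args) {
        Map<String, Long> lastUpdate = new HashMap<>();
        lastUpdate.put("/1.1.1.1:7000", 0L);      // joining node, no tokens yet
        lastUpdate.put("/2.2.2.2:7000", 40_000L); // normal cluster member
        Set<String> withTokens = new HashSet<>();
        withTokens.add("/2.2.2.2:7000");

        evictFatClients(lastUpdate, withTokens, 45_000L);
        System.out.println(lastUpdate.keySet()); // joining node has been evicted
    }
}
```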
> But before it is removed from gossip, the existing node may have sent a
> gossip SYN message to the new node, asking for the new node's gossip info
> with a heartbeat version larger than 20 (in this example).
>
> The new node's gossip queue has too many tasks to process, so it cannot
> handle this request immediately. By the time it sends the gossip ACK back,
> the existing node has already removed the gossip info about the new node,
> so the gossip state will look like the following on some existing nodes:
> {code:java}
> /1.1.1.1
> generation:1727479926
> heartbeat:229
> LOAD:200:3.0
> SCHEMA:203:59adb24e-f3cd-3e02-97f0-5b395827453f {code}
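The loss of the older states follows from gossip's delta semantics: the reply only carries application states whose version is higher than the version the requester claims to have seen. A minimal sketch of that filtering (illustrative only, not Cassandra's actual code):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative sketch of gossip delta semantics: only application states
// newer than the requested version are sent back, so low-version states
// like HOST_ID (version 2) never reach a node that asked for "> 20".
public class GossipDelta {
    static Map<String, Integer> statesNewerThan(Map<String, Integer> states,
                                                int maxVersionKnown) {
        return states.entrySet().stream()
                .filter(e -> e.getValue() > maxVersionKnown)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                          (a, b) -> a, TreeMap::new));
    }

    public static void main(String[] args) {
        Map<String, Integer> states = new TreeMap<>();
        states.put("HOST_ID", 2); // versions as in the gossip state above
        states.put("DC", 12);
        states.put("LOAD", 200);
        states.put("SCHEMA", 203);

        // The existing node already saw heartbeat version 20, so only
        // LOAD and SCHEMA come back; DC and HOST_ID are never resent.
        System.out.println(statesNewerThan(states, 20)); // {LOAD=200, SCHEMA=203}
    }
}
```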
> All the information related to DC, rack, and host ID is gone.
> When the new node's gossip later settles, it sets its local state to BOOT
> and decides its tokens. The existing node will then receive the STATUS and
> TOKENS info, and the gossip state will become:
> {code:java}
> /1.1.1.1
> generation:1727479926
> heartbeat:329
> LOAD:300:3.0
> SCHEMA:303:59adb24e-f3cd-3e02-97f0-5b395827453f
> STATUS_WITH_PORT:308:BOOT,-142070360466566106
> TOKENS:309:<hidden>{code}
> When the existing node processes this bootstrap event, we see the NPE
> because the host ID is missing.
> This issue can also create consistency problems: in large clusters, many
> nodes will treat the joining node as a remote-DC node if its DC info is
> missing.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]