[
https://issues.apache.org/jira/browse/CASSANDRA-19983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887397#comment-17887397
]
Brandon Williams commented on CASSANDRA-19983:
----------------------------------------------
After CASSANDRA-15439 you can set the fat client timeout explicitly, which may
help with this.
> Cassandra gossip issue for large cluster
> ----------------------------------------
>
> Key: CASSANDRA-19983
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19983
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Runtian Liu
> Priority: Normal
>
> When adding a new node to a cluster, we see a lot of nodes reporting the
> error below:
> {code:java}
> java.lang.NullPointerException: null
>     at o.a.cassandra.gms.Gossiper.getHostId(Gossiper.java:1378)
>     at o.a.cassandra.gms.Gossiper.getHostId(Gossiper.java:1373)
>     at o.a.c.service.StorageService.handleStateBootstrap(StorageService.java:3088)
>     at o.a.c.service.StorageService.onChange(StorageService.java:2783)
>     at o.a.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1851)
>     at o.a.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1816)
>     at o.a.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1749)
>     at o.a.c.g.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:81)
>     at o.a.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:79)
>     at o.a.cassandra.net.InboundSink.accept(InboundSink.java:98)
>     at o.a.cassandra.net.InboundSink.accept(InboundSink.java:46)
>     at o.a.c.n.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>     at o.a.c.c.ExecutionFailure$1.run(ExecutionFailure.java:133)
>     at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at i.n.u.c.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:829){code}
> After some investigation, we found that the existing nodes of the cluster had
> removed the new node as a fat client. This happened because the new node was
> busy with gossip and had a large backlog of tasks piling up in its gossip
> queue. The gossip state for the new node on an existing host was:
>
>
> {code:java}
> /1.1.1.1
> generation:1727479926
> heartbeat:25
> LOAD:20:31174.0
> SCHEMA:16:59adb24e-f3cd-3e02-97f0-5b395827453f
> DC:12:dc1
> RACK:14:0
> RELEASE_VERSION:5:4.1.3
> NET_VERSION:1:12
> HOST_ID:2:b9cc4587-68f5-4bb6-a933-fd0c77a064dc
> INTERNAL_ADDRESS_AND_PORT:8:1.1.1.1:7000
> NATIVE_ADDRESS_AND_PORT:3:1.1.1.1:9042
> SSTABLE_VERSIONS:6:big-nb
> TOKENS: not present {code}
> Later this endpoint is removed from the gossip endpoint state map because it
> is treated as a fat client.
>
>
> {code:java}
> FatClient /1.1.1.1:7000 has been silent for 30000ms, removing from gossip
> {code}
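> An endpoint with no TOKENS in its gossip state is treated as a fat client,
> and once it has been silent past the timeout its entire endpoint state,
> including HOST_ID, DC and RACK, is dropped. A rough sketch of that eviction
> check (simplified, hypothetical names, not the actual Gossiper code):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> class FatClientEviction {
>     static final long FAT_CLIENT_TIMEOUT_MS = 30_000; // matches the log line above
>
>     final Map<String, Map<String, String>> endpointStates = new HashMap<>();
>     final Map<String, Long> lastUpdatedMs = new HashMap<>();
>
>     // Hypothetical check: a token-less endpoint silent past the timeout is evicted.
>     boolean maybeEvict(String endpoint, long nowMs) {
>         Map<String, String> states = endpointStates.get(endpoint);
>         if (states == null || states.containsKey("TOKENS"))
>             return false; // endpoints that own tokens are members, never evicted here
>         long silentFor = nowMs - lastUpdatedMs.getOrDefault(endpoint, nowMs);
>         if (silentFor <= FAT_CLIENT_TIMEOUT_MS)
>             return false;
>         // The whole state map (HOST_ID, DC, RACK, ...) is dropped, not just the heartbeat.
>         endpointStates.remove(endpoint);
>         System.out.println("FatClient " + endpoint + " has been silent for "
>                            + silentFor + "ms, removing from gossip");
>         return true;
>     }
> }
> {code}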
> But before the new node is removed from gossip, the existing node may have
> sent it a gossip SYN message asking for the new node's state with a heartbeat
> version larger than 20 in this example.
>
> The new node's gossip queue has too many tasks for it to process this request
> immediately. By the time it sends the gossip ACK back, the existing node has
> already removed the gossip info about the new node. The gossip state on some
> existing nodes will then look like this:
> {code:java}
> /1.1.1.1
> generation:1727479926
> heartbeat:229
> LOAD:200:3.0
> SCHEMA:203:59adb24e-f3cd-3e02-97f0-5b395827453f {code}
> All the information related to DC, rack, and host ID is gone.
> Later, when gossip on the new node settles, it sets its local state to BOOT
> and decides its tokens. The existing node then receives the STATUS and TOKENS
> info, and the gossip state becomes:
> {code:java}
> /1.1.1.1
> generation:1727479926
> heartbeat:329
> LOAD:300:3.0
> SCHEMA:303:59adb24e-f3cd-3e02-97f0-5b395827453f
> STATUS_WITH_PORT:308:BOOT,-142070360466566106
> TOKENS:309:<hidden>{code}
> When the existing node processes this bootstrap event, we see the NPE because
> the host ID is missing.
> This issue also creates consistency problems: in large clusters, many nodes
> will treat the joining node as a remote-DC node if its DC info is missing.
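> The loss of the low-version states follows from gossip's delta exchange: a
> peer that asks for everything newer than version N never receives entries
> whose version is at or below N. A minimal sketch of that filter (hypothetical
> names, not the actual Gossiper API):
> {code:java}
> import java.util.Map;
> import java.util.TreeMap;
>
> class VersionedStateExchange {
>     // states: application state name -> version at which it was last updated
>     // (e.g. HOST_ID was set at version 2, DC at 12, LOAD at 200).
>     // Only states strictly newer than the requester's max seen version ship.
>     static Map<String, Integer> deltaSince(Map<String, Integer> states, int maxVersionSeen) {
>         Map<String, Integer> delta = new TreeMap<>();
>         for (Map.Entry<String, Integer> e : states.entrySet())
>             if (e.getValue() > maxVersionSeen)
>                 delta.put(e.getKey(), e.getValue());
>         return delta;
>     }
> }
> {code}
> With the state above, a request for versions newer than 20 returns only LOAD
> and SCHEMA; HOST_ID (version 2), DC (12) and RACK (14) are never re-sent,
> which matches the truncated state the existing node ends up with.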
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]