[
https://issues.apache.org/jira/browse/CASSANDRA-19983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891003#comment-17891003
]
Brandon Williams edited comment on CASSANDRA-19983 at 10/18/24 6:31 PM:
------------------------------------------------------------------------
Agree with the minor nits Mick raised, but overall I'm fond of this solution
since it solves the problem with very minimal risk: a node sending its full
info is not new, just less common than before this patch. I did a minor
backport for 4.0, and the 5.0 patch applied cleanly; let's see how CI fares:
||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19983-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1742/workflows/f8cc002e-1d78-4c6b-b226-a2a06d475b63],
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1742/workflows/9f2f3b54-9f40-44b0-b819-637dd6b6838c]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19983-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1741/workflows/2d93e465-3b53-48fa-bbe8-eafae5b94d8f],
[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1741/workflows/60a072b1-4be8-4a17-bede-8f67af0dec11]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19983-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1744/workflows/7c7086cb-86f4-4527-bf6d-1d674985a1f9],
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1744/workflows/2047971f-0a52-459b-b7cc-ce915fb8457e]|
> Cassandra gossip issue for large cluster
> ----------------------------------------
>
> Key: CASSANDRA-19983
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19983
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Runtian Liu
> Assignee: Runtian Liu
> Priority: Urgent
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> When adding a new node to a cluster, we see a lot of nodes reporting the
> error below:
> {code:java}
> java.lang.NullPointerException: null
>     at o.a.cassandra.gms.Gossiper.getHostId(Gossiper.java:1378)
>     at o.a.cassandra.gms.Gossiper.getHostId(Gossiper.java:1373)
>     at o.a.c.service.StorageService.handleStateBootstrap(StorageService.java:3088)
>     at o.a.c.service.StorageService.onChange(StorageService.java:2783)
>     at o.a.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1851)
>     at o.a.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1816)
>     at o.a.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1749)
>     at o.a.c.g.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:81)
>     at o.a.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:79)
>     at o.a.cassandra.net.InboundSink.accept(InboundSink.java:98)
>     at o.a.cassandra.net.InboundSink.accept(InboundSink.java:46)
>     at o.a.c.n.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>     at o.a.c.c.ExecutionFailure$1.run(ExecutionFailure.java:133)
>     at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at i.n.u.c.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:829){code}
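As a rough illustration of the failure mode (a simplified sketch, not Cassandra's actual `Gossiper.getHostId` implementation), a host-ID lookup throws an NPE when the `HOST_ID` application state was never populated in the endpoint's state map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Minimal sketch (illustrative only): if HOST_ID was never received via
// gossip, the map lookup returns null and parsing it throws an NPE, which
// is analogous to the stack trace above.
public class HostIdLookup {
    static UUID getHostId(Map<String, String> applicationStates) {
        // UUID.fromString(null) throws NullPointerException
        return UUID.fromString(applicationStates.get("HOST_ID"));
    }

    public static void main(String[] args) {
        Map<String, String> states = new HashMap<>();
        states.put("LOAD", "3.0"); // DC/RACK/HOST_ID missing, as in the report
        try {
            getHostId(states);
        } catch (NullPointerException e) {
            System.out.println("NPE: HOST_ID missing from gossip state");
        }
    }
}
```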
> After some investigation, we found that the existing nodes of the cluster
> had removed the new node as a fat client. The reason is that the new node
> is busy with gossip and its gossip queue has a lot of tasks piling up.
> The gossip state for the new node on an existing host is:
>
>
> {code:java}
> /1.1.1.1
> generation:1727479926
> heartbeat:25
> LOAD:20:31174.0
> SCHEMA:16:59adb24e-f3cd-3e02-97f0-5b395827453f
> DC:12:dc1
> RACK:14:0
> RELEASE_VERSION:5:4.1.3
> NET_VERSION:1:12
> HOST_ID:2:b9cc4587-68f5-4bb6-a933-fd0c77a064dc
> INTERNAL_ADDRESS_AND_PORT:8:1.1.1.1:7000
> NATIVE_ADDRESS_AND_PORT:3:1.1.1.1:9042
> SSTABLE_VERSIONS:6:big-nb
> TOKENS: not present {code}
> Later, this endpoint is removed from the gossip endpoint state map because
> it is treated as a fat client:
>
>
> {code:java}
> FatClient /1.1.1.1:7000 has been silent for 30000ms, removing from gossip
> {code}
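The eviction works roughly like this (a simplified sketch under illustrative names, not Cassandra's actual implementation): an endpoint that has no tokens and has been silent longer than a threshold is dropped from the local endpoint-state map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified sketch of fat-client eviction (names are illustrative):
// tokenless endpoints silent beyond a quarantine threshold are removed
// from the local gossip state.
public class FatClientEviction {
    static final long FAT_CLIENT_TIMEOUT_MS = 30_000;

    static void evictFatClients(Map<String, Long> lastUpdateMs,
                                Set<String> endpointsWithTokens,
                                long nowMs) {
        lastUpdateMs.entrySet().removeIf(e ->
            !endpointsWithTokens.contains(e.getKey())          // no tokens yet
            && nowMs - e.getValue() > FAT_CLIENT_TIMEOUT_MS);  // silent too long
    }

    public static void main(String[] args) {
        Map<String, Long> lastUpdate = new HashMap<>();
        lastUpdate.put("/1.1.1.1:7000", 0L);      // joining node, no tokens yet
        lastUpdate.put("/2.2.2.2:7000", 40_000L); // normal cluster member
        Set<String> withTokens = new HashSet<>();
        withTokens.add("/2.2.2.2:7000");

        evictFatClients(lastUpdate, withTokens, 45_000L);
        System.out.println(lastUpdate.keySet()); // joining node has been evicted
    }
}
```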
> But before it is removed from gossip, the existing node may have sent a
> gossip SYN message to the new node, asking for the new node's gossip info
> with a heartbeat version larger than 20 (in this example).
>
> The new node's gossip queue has too many tasks to process, so it cannot
> handle this request immediately. By the time it sends the gossip ACK back,
> the existing node has already removed the gossip info about the new node,
> so the gossip state will look like the following on some existing nodes:
> {code:java}
> /1.1.1.1
> generation:1727479926
> heartbeat:229
> LOAD:200:3.0
> SCHEMA:203:59adb24e-f3cd-3e02-97f0-5b395827453f {code}
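The loss of the older states follows from gossip's delta semantics: the reply only carries application states whose version is higher than the version the requester claims to have seen. A minimal sketch of that filtering (illustrative only, not Cassandra's actual code):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative sketch of gossip delta semantics: only application states
// newer than the requested version are sent back, so low-version states
// like HOST_ID (version 2) never reach a node that asked for "> 20".
public class GossipDelta {
    static Map<String, Integer> statesNewerThan(Map<String, Integer> states,
                                                int maxVersionKnown) {
        return states.entrySet().stream()
                .filter(e -> e.getValue() > maxVersionKnown)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                          (a, b) -> a, TreeMap::new));
    }

    public static void main(String[] args) {
        Map<String, Integer> states = new TreeMap<>();
        states.put("HOST_ID", 2); // versions as in the gossip state above
        states.put("DC", 12);
        states.put("LOAD", 200);
        states.put("SCHEMA", 203);

        // The existing node already saw heartbeat version 20, so only
        // LOAD and SCHEMA come back; DC and HOST_ID are never resent.
        System.out.println(statesNewerThan(states, 20)); // {LOAD=200, SCHEMA=203}
    }
}
```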
> All the information related to DC, rack, and host ID is gone.
> When the new node's gossip later settles, it sets its local state to BOOT
> and decides its tokens. The existing node will then receive the STATUS and
> TOKENS info, and the gossip state will become:
> {code:java}
> /1.1.1.1
> generation:1727479926
> heartbeat:329
> LOAD:300:3.0
> SCHEMA:303:59adb24e-f3cd-3e02-97f0-5b395827453f
> STATUS_WITH_PORT:308:BOOT,-142070360466566106
> TOKENS:309:<hidden>{code}
> When the existing node processes this bootstrap event, we see the NPE
> because the host ID is missing.
> This issue can also create consistency problems: in large clusters, many
> nodes will treat the joining node as a remote-DC node if its DC info is
> missing.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]