[
https://issues.apache.org/jira/browse/CASSANDRA-19983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Runtian Liu updated CASSANDRA-19983:
------------------------------------
Description:
When adding a new node to a cluster, we see many nodes reporting the error
below:
{code:java}
java.lang.NullPointerException: null
    at o.a.cassandra.gms.Gossiper.getHostId(Gossiper.java:1378)
    at o.a.cassandra.gms.Gossiper.getHostId(Gossiper.java:1373)
    at o.a.c.service.StorageService.handleStateBootstrap(StorageService.java:3088)
    at o.a.c.service.StorageService.onChange(StorageService.java:2783)
    at o.a.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1851)
    at o.a.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1816)
    at o.a.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1749)
    at o.a.c.g.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:81)
    at o.a.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:79)
    at o.a.cassandra.net.InboundSink.accept(InboundSink.java:98)
    at o.a.cassandra.net.InboundSink.accept(InboundSink.java:46)
    at o.a.c.n.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
    at o.a.c.c.ExecutionFailure$1.run(ExecutionFailure.java:133)
    at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at i.n.u.c.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:829){code}
After some investigation, we found that the existing nodes of the cluster had
removed the new node as a fat client. The reason is that the new node was busy
with gossip: its gossip queue had a lot of tasks piling up. The gossip state
for the new node on an existing host was:
{code:java}
/1.1.1.1
generation:1727479926
heartbeat:25
LOAD:20:31174.0
SCHEMA:16:59adb24e-f3cd-3e02-97f0-5b395827453f
DC:12:dc1
RACK:14:0
RELEASE_VERSION:5:4.1.3
NET_VERSION:1:12
HOST_ID:2:b9cc4587-68f5-4bb6-a933-fd0c77a064dc
INTERNAL_ADDRESS_AND_PORT:8:1.1.1.1:7000
NATIVE_ADDRESS_AND_PORT:3:1.1.1.1:9042
SSTABLE_VERSIONS:6:big-nb
TOKENS: not present {code}
Later, this endpoint is removed from the gossip endpoint state map because it
is treated as a fat client:
{code:java}
FatClient /1.1.1.1:7000 has been silent for 30000ms, removing from gossip {code}
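The eviction that produces this log line can be sketched as follows. This is a simplified model, not the actual Gossiper internals (in Cassandra the check runs in Gossiper's periodic status check; the class and method names below are hypothetical): an endpoint that owns no tokens and has been silent past the fat-client timeout is removed from the endpoint-state map entirely, taking HOST_ID, DC, and RACK with it.

```java
import java.util.concurrent.TimeUnit;

// Simplified sketch of fat-client eviction (hypothetical names, not the real
// Gossiper code). An endpoint with no TOKENS state that has been silent longer
// than the fat-client timeout is dropped from the endpoint-state map, losing
// every application state it had advertised.
public class FatClientCheckSketch {
    // 30000ms, matching the timeout mentioned in the log line above.
    static final long FAT_CLIENT_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(30);

    static boolean shouldEvict(boolean hasTokens, long silentForMs) {
        // A token owner is a ring member and is never treated as a fat client;
        // the joining node still has "TOKENS: not present", so it qualifies.
        return !hasTokens && silentForMs > FAT_CLIENT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println(shouldEvict(false, 30001)); // joining node, silent 30s: evicted
        System.out.println(shouldEvict(true, 30001));  // normal member: kept
    }
}
```

This is why only a still-bootstrapping node is vulnerable: once TOKENS is present, the eviction path no longer applies.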
But before the endpoint is removed from gossip, the existing node may have sent
a gossip SYN message to the new node asking for its gossip info with a
heartbeat version larger than 20 (in this example). The new node's gossip
queue has too many pending tasks, so it cannot process this request
immediately. By the time its gossip ACK reaches the existing node, that node
has already removed the gossip info about the new node. As a result, the
gossip state on some existing nodes will look like this:
{code:java}
/1.1.1.1
generation:1727479926
heartbeat:229
LOAD:200:3.0
SCHEMA:203:59adb24e-f3cd-3e02-97f0-5b395827453f {code}
All the information related to DC, rack, and host ID is gone.
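The old states never reappear because gossip exchanges deltas by version. A minimal sketch of that filtering (hypothetical names, not the actual GossipDigestSyn/Ack code): the requester's digest advertises the highest version it has already seen, and the reply includes only states with a strictly larger version, so low-versioned states like HOST_ID:2 and DC:12 are filtered out once the requester has advertised max version 20.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch of gossip's version-based delta exchange (hypothetical names, not the
// real digest code). The responder only sends states newer than the version
// the requester's digest advertised.
public class GossipDeltaSketch {
    // Build the reply for a digest claiming "I already have everything up to maxVersion".
    static Map<String, Integer> statesNewerThan(Map<String, Integer> localStates, int maxVersion) {
        return localStates.entrySet().stream()
                .filter(e -> e.getValue() > maxVersion)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                          (a, b) -> a, TreeMap::new));
    }

    public static void main(String[] args) {
        // State versions on the new node, mirroring the example in this report:
        Map<String, Integer> newNodeStates = new TreeMap<>();
        newNodeStates.put("HOST_ID", 2);  // set once at startup, low version
        newNodeStates.put("DC", 12);      // set once at startup, low version
        newNodeStates.put("LOAD", 200);   // refreshed periodically, high version
        newNodeStates.put("SCHEMA", 203); // refreshed later, high version

        // The existing node saw versions up to 20 before evicting the endpoint,
        // so its digest still says maxVersion = 20 -- HOST_ID and DC are never resent.
        System.out.println(statesNewerThan(newNodeStates, 20));
    }
}
```

The eviction deleted the low-versioned states locally, but the digest protocol has no way to ask for them again, which matches the truncated endpoint state shown above.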
When gossip later settles on the new node, it sets its local state to BOOT and
decides its tokens. The existing node then receives the STATUS and TOKENS
info, and the gossip state becomes:
{code:java}
/1.1.1.1
generation:1727479926
heartbeat:329
LOAD:300:3.0
SCHEMA:303:59adb24e-f3cd-3e02-97f0-5b395827453f
STATUS_WITH_PORT:308:BOOT,-142070360466566106
TOKENS:309:<hidden>{code}
When the existing node processes this bootstrap event, we see the NPE because
the host ID is missing.
This issue also creates consistency problems: in a large cluster, many nodes
will treat the joining node as a remote-DC node if its DC info is missing.
> Cassandra gossip issue for large cluster
> ----------------------------------------
>
> Key: CASSANDRA-19983
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19983
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Runtian Liu
> Priority: Normal
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)