[jira] [Commented] (CASSANDRA-19361) fix node info NPE when ClusterMetadata is null

Sam Tunnicliffe (Jira) Mon, 05 Feb 2024 01:22:04 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814292#comment-17814292
 ]


Sam Tunnicliffe commented on CASSANDRA-19361:
---------------------------------------------

From the info in the description and the attached text file, it looks as 
though the 4th node is not communicating with the existing nodes. Can you 
attach the full log from the fourth node?

I can't reproduce this with ccm. How are you configuring/running the instances?

Are the executions of {{nodetool info}} in the description being run 
against node4? Are they being executed while the node is bootstrapping? 
{quote}server 1 cannot execute node info and cql shell, server 2 and 3 can do 
it. 
{quote}
Does this only start to happen _after_ node4 is started? Can you run 
{{nodetool info}} and cqlsh on node1 before adding node4?

> fix node info NPE when ClusterMetadata is null
> ----------------------------------------------
>
>                 Key: CASSANDRA-19361
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19361
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tool/nodetool, Transactional Cluster Metadata
>            Reporter: Ling Mao
>            Assignee: Ling Mao
>            Priority: Normal
>             Fix For: 5.0.x
>
>         Attachments: CASSANDRA-19361-stack-error.txt
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. How
>  
> I created an ensemble of 3 nodes (it works well), then added a fourth node 
> to join it. 
> When executing nodetool info, I get the following exception:
> {code:java}
> ➜  bin ./nodetool info
> java.lang.NullPointerException
>     at org.apache.cassandra.service.StorageService.operationMode(StorageService.java:3744)
>     at org.apache.cassandra.service.StorageService.isBootstrapFailed(StorageService.java:3810)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
> ➜  bin ./nodetool info
> WARN  [InternalResponseStage:152] 2024-02-02 11:45:15,731 RemoteProcessor.java:213 - Got error from /127.0.0.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/127.0.0.4:7000], checkLive=true}
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>     at org.apache.cassandra.service.StorageService.getLocalHostId(StorageService.java:1904)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
>     at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:260){code}
> server 1 cannot execute node info and cql shell, server 2 and 3 can do it. 
> I tried to query the system-prefixed tables; I attach the stack error log 
> for further debugging. I cannot find a way to recover. After deleting the 
> data (losing all data) and restarting, everything became OK.
> {code:java}
> ➜  bin ./nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load  Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.2  ?     16      51.2%             6d194555-f6eb-41d0-c000-000000000002  rack1
> DN  127.0.0.4  ?     16      48.8%             6d194555-f6eb-41d0-c000-000000000001  rack1{code}
> h3. When
>  
> It was introduced by the CEP-21 patch. In any case, a null check is needed 
> to prevent the NPE from propagating.
> {code:java}
> Implementation of Transactional Cluster Metadata as described in CEP-21
> Hash: ae084237
>  
> code diff:
>  
>     public String getLocalHostId()
>      {
> -        UUID id = getLocalHostUUID();
> -        return id != null ? id.toString() : null;
> +        return getLocalHostUUID().toString();
>      }
>  
>      public UUID getLocalHostUUID()
>      {
> -        UUID id = getTokenMetadata().getHostId(FBUtilities.getBroadcastAddressAndPort());
> -        if (id != null)
> -            return id;
> -        // this condition is to prevent accessing the tables when the node is not started yet, and in particular,
> -        // when it is not going to be started at all (e.g. when running some unit tests or client tools).
> -        else if ((DatabaseDescriptor.isDaemonInitialized() || DatabaseDescriptor.isToolInitialized()) && CommitLog.instance.isStarted())
> -            return SystemKeyspace.getLocalHostId();
> -
> -        return null;
> +        // Metadata collector requires using local host id, and flush of IndexInfo may race with
> +        // creation and initialization of cluster metadata service. Metadata collector does accept
> +        // null localhost ID values, it's just that TokenMetadata was created earlier.
> +        ClusterMetadata metadata = ClusterMetadata.currentNullable();
> +        if (metadata == null || metadata.directory.peerId(getBroadcastAddressAndPort()) == null)
> +            return null;
> +        return metadata.directory.peerId(getBroadcastAddressAndPort()).toUUID();
>      } {code}
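
For illustration only, the null guard that the quoted diff removes from getLocalHostId() can be sketched in isolation as below. This is a minimal, self-contained sketch and not the actual patch: the class name NullSafeHostId and the Supplier parameter are hypothetical stand-ins for StorageService.getLocalHostUUID(), which per the diff returns null when ClusterMetadata is not yet available.

```java
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical stand-alone class; it does not exist in Cassandra.
final class NullSafeHostId
{
    // localHostUuid is an assumed stand-in for getLocalHostUUID(),
    // which may return null before cluster metadata is initialized.
    static String getLocalHostId(Supplier<UUID> localHostUuid)
    {
        UUID id = localHostUuid.get();
        // Guard against null instead of calling toString() unconditionally.
        return id != null ? id.toString() : null;
    }

    public static void main(String[] args)
    {
        // Metadata not available yet: the lookup yields null, and so does the guard.
        System.out.println(getLocalHostId(() -> null));

        // Metadata available: the host ID is returned as a string.
        UUID known = UUID.fromString("6d194555-f6eb-41d0-c000-000000000002");
        System.out.println(getLocalHostId(() -> known));
    }
}
```

With a guard of this shape the caller receives null rather than an NPE, so it still has to handle a missing host ID.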



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
