[ https://issues.apache.org/jira/browse/CASSANDRA-19361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ling Mao updated CASSANDRA-19361:
---------------------------------
    Description: 
h3. How

I created a cluster of 3 nodes, which worked well, and then added a fourth node. Running nodetool info then fails with the following exceptions:
{code:java}
➜  bin ./nodetool info
java.lang.NullPointerException
    at org.apache.cassandra.service.StorageService.operationMode(StorageService.java:3744)
    at org.apache.cassandra.service.StorageService.isBootstrapFailed(StorageService.java:3810)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)

➜  bin ./nodetool info
WARN  [InternalResponseStage:152] 2024-02-02 11:45:15,731 RemoteProcessor.java:213 - Got error from /127.0.0.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/127.0.0.4:7000], checkLive=true}
error: null
-- StackTrace --
java.lang.NullPointerException
    at org.apache.cassandra.service.StorageService.getLocalHostId(StorageService.java:1904)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
    at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:260){code}
Server 1 cannot run nodetool info or the CQL shell; servers 2 and 3 can. I tried querying the system-prefixed tables and attach the stack error log for further debugging. I could not find a way to recover. After I deleted the data (losing all of it) and restarted, everything was OK.
{code:java}
➜  bin ./nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load  Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.2  ?     16      51.2%             6d194555-f6eb-41d0-c000-000000000002  rack1
DN  127.0.0.4  ?     16      48.8%             6d194555-f6eb-41d0-c000-000000000001  rack1{code}
h3. When

This was introduced by the CEP-21 patch. In any case, a null check is needed to keep the NPE from propagating.
{code:java}
Implementation of Transactional Cluster Metadata as described in CEP-21
Hash: ae084237

code diff:

     public String getLocalHostId()
     {
-        UUID id = getLocalHostUUID();
-        return id != null ? id.toString() : null;
+        return getLocalHostUUID().toString();
     }

     public UUID getLocalHostUUID()
     {
-        UUID id = getTokenMetadata().getHostId(FBUtilities.getBroadcastAddressAndPort());
-        if (id != null)
-            return id;
-        // this condition is to prevent accessing the tables when the node is not started yet, and in particular,
-        // when it is not going to be started at all (e.g. when running some unit tests or client tools).
-        else if ((DatabaseDescriptor.isDaemonInitialized() || DatabaseDescriptor.isToolInitialized()) && CommitLog.instance.isStarted())
-            return SystemKeyspace.getLocalHostId();
-
-        return null;
+        // Metadata collector requires using local host id, and flush of IndexInfo may race with
+        // creation and initialization of cluster metadata service. Metadata collector does accept
+        // null localhost ID values, it's just that TokenMetadata was created earlier.
+        ClusterMetadata metadata = ClusterMetadata.currentNullable();
+        if (metadata == null || metadata.directory.peerId(getBroadcastAddressAndPort()) == null)
+            return null;
+        return metadata.directory.peerId(getBroadcastAddressAndPort()).toUUID();
     }
{code}

> fix node info NPE when ClusterMetadata is null
> ----------------------------------------------
>
>                 Key: CASSANDRA-19361
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19361
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tool/nodetool, Transactional Cluster Metadata
>            Reporter: Ling Mao
>            Assignee: Ling Mao
>            Priority: Normal
>             Fix For: 5.0.x
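
For illustration, the kind of null guard that the diff removed from getLocalHostId() might look like the sketch below. This is not the actual patch and does not use Cassandra's classes: LocalHostIdSketch, its directory map, and register() are hypothetical stand-ins for StorageService and the cluster metadata directory. It only shows that a lookup which can legitimately return null while metadata is unavailable should be checked before toString() is called.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the host-ID lookup; not Cassandra's real classes. */
public class LocalHostIdSketch
{
    // Stand-in for the cluster metadata directory (address -> host ID).
    // It stays empty until the node has been registered.
    private final Map<String, UUID> directory = new ConcurrentHashMap<>();
    private final String localAddress;

    public LocalHostIdSketch(String localAddress)
    {
        this.localAddress = localAddress;
    }

    /** Simulates the node's host ID becoming known to the metadata directory. */
    public void register(UUID id)
    {
        directory.put(localAddress, id);
    }

    /** Returns null while no host ID is known, like getLocalHostUUID() in the diff. */
    public UUID getLocalHostUUID()
    {
        return directory.get(localAddress);
    }

    /** Null-safe conversion: checks the UUID before calling toString(). */
    public String getLocalHostId()
    {
        UUID id = getLocalHostUUID();
        return id != null ? id.toString() : null;
    }

    public static void main(String[] args)
    {
        LocalHostIdSketch node = new LocalHostIdSketch("127.0.0.4:7000");
        // Before registration the lookup yields null instead of throwing an NPE.
        System.out.println(node.getLocalHostId());
        node.register(UUID.fromString("6d194555-f6eb-41d0-c000-000000000001"));
        System.out.println(node.getLocalHostId());
    }
}
```

With a guard of this shape, a caller such as nodetool info would receive a null host ID rather than an exception, and would still need to decide how to present the missing value.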