[ https://issues.apache.org/jira/browse/CASSANDRA-19361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814292#comment-17814292 ]
Sam Tunnicliffe commented on CASSANDRA-19361:
---------------------------------------------

From the info in the description and the attached text file, it looks as though the 4th node is not communicating with the existing nodes. Can you attach the full log from the fourth node?

I can't reproduce this with ccm; how are you configuring and running the instances?

The executions of {{nodetool info}} in the description, are those being run against node4? Are they being executed while the node is bootstrapping?

{quote}server 1 cannot execute node info and cql shell, server 2 and 3 can do it.
{quote}
Does this only start to happen _after_ node4 is started? Can you run {{nodetool info}} and cqlsh on node1 before adding node4?

> fix node info NPE when ClusterMetadata is null
> ----------------------------------------------
>
>                 Key: CASSANDRA-19361
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19361
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tool/nodetool, Transactional Cluster Metadata
>            Reporter: Ling Mao
>            Assignee: Ling Mao
>            Priority: Normal
>             Fix For: 5.0.x
>
>         Attachments: CASSANDRA-19361-stack-error.txt
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. How
>
> I created an ensemble with 3 nodes (it works well), then added a fourth node to join the party.
> When executing nodetool info, I get the following exceptions:
> {code:java}
> ➜ bin ./nodetool info
> java.lang.NullPointerException
>     at org.apache.cassandra.service.StorageService.operationMode(StorageService.java:3744)
>     at org.apache.cassandra.service.StorageService.isBootstrapFailed(StorageService.java:3810)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
> ➜ bin ./nodetool info
> WARN [InternalResponseStage:152] 2024-02-02 11:45:15,731 RemoteProcessor.java:213 - Got error from /127.0.0.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/127.0.0.4:7000], checkLive=true}
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>     at org.apache.cassandra.service.StorageService.getLocalHostId(StorageService.java:1904)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
>     at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:260)
> {code}
> Server 1 cannot execute nodetool info or the CQL shell; servers 2 and 3 can.
> I tried to query the system-prefixed tables; the stack error log is attached for further debugging. I could not find a way to recover. After deleting the data (losing all of it) and restarting, everything became OK.
> {code:java}
> ➜ bin ./nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load  Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.2  ?     16      51.2%             6d194555-f6eb-41d0-c000-000000000002  rack1
> DN  127.0.0.4  ?     16      48.8%             6d194555-f6eb-41d0-c000-000000000001  rack1
> {code}
> h3. When
>
> It was introduced by the CEP-21 patch. In any case, a null check is needed to stop the NPE from propagating.
> {code:java}
> Implementation of Transactional Cluster Metadata as described in CEP-21
> Hash: ae084237
>
> code diff:
>
>      public String getLocalHostId()
>      {
> -        UUID id = getLocalHostUUID();
> -        return id != null ? id.toString() : null;
> +        return getLocalHostUUID().toString();
>      }
>
>      public UUID getLocalHostUUID()
>      {
> -        UUID id = getTokenMetadata().getHostId(FBUtilities.getBroadcastAddressAndPort());
> -        if (id != null)
> -            return id;
> -        // this condition is to prevent accessing the tables when the node is not started yet, and in particular,
> -        // when it is not going to be started at all (e.g. when running some unit tests or client tools).
> -        else if ((DatabaseDescriptor.isDaemonInitialized() || DatabaseDescriptor.isToolInitialized()) && CommitLog.instance.isStarted())
> -            return SystemKeyspace.getLocalHostId();
> -
> -        return null;
> +        // Metadata collector requires using local host id, and flush of IndexInfo may race with
> +        // creation and initialization of cluster metadata service. Metadata collector does accept
> +        // null localhost ID values, it's just that TokenMetadata was created earlier.
> +        ClusterMetadata metadata = ClusterMetadata.currentNullable();
> +        if (metadata == null || metadata.directory.peerId(getBroadcastAddressAndPort()) == null)
> +            return null;
> +        return metadata.directory.peerId(getBroadcastAddressAndPort()).toUUID();
>      }
> {code}
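
In the quoted diff, getLocalHostUUID() can return null while cluster metadata is unavailable, yet getLocalHostId() calls toString() on the result unconditionally. That appears to match the NullPointerException reported at StorageService.getLocalHostId. The following is a minimal, self-contained sketch of the null-safe pattern the pre-CEP-21 code used. It is not Cassandra code and not the committed fix: the class LocalHostIdSketch, the publishDirectory method, and the map-based directory are hypothetical stand-ins for ClusterMetadata and its directory.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for a node's host-ID lookup; these are not Cassandra's classes.
final class LocalHostIdSketch
{
    private final String broadcastAddress;

    // Stays null until cluster metadata has been initialized. This is an assumed
    // model of the window in which the reported NPE occurs.
    private volatile Map<String, UUID> directory;

    LocalHostIdSketch(String broadcastAddress)
    {
        this.broadcastAddress = broadcastAddress;
    }

    // Hypothetical hook that simulates cluster metadata becoming available.
    void publishDirectory(Map<String, UUID> newDirectory)
    {
        this.directory = newDirectory;
    }

    // Returns null when metadata is missing or this node is not registered yet.
    UUID getLocalHostUUID()
    {
        Map<String, UUID> current = directory; // read the volatile field once
        return current == null ? null : current.get(broadcastAddress);
    }

    // Null-safe variant: callers receive null instead of a NullPointerException.
    String getLocalHostId()
    {
        UUID id = getLocalHostUUID();
        return id != null ? id.toString() : null;
    }

    public static void main(String[] args)
    {
        LocalHostIdSketch node = new LocalHostIdSketch("127.0.0.4:7000");
        System.out.println(node.getLocalHostId()); // metadata not available yet

        UUID id = UUID.fromString("6d194555-f6eb-41d0-c000-000000000001");
        node.publishDirectory(Map.of("127.0.0.4:7000", id));
        System.out.println(node.getLocalHostId()); // metadata available
    }
}
```

Returning null keeps the behaviour the earlier getLocalHostId() had. Whether a caller such as nodetool info should instead report "not yet initialized" is a separate design choice that this sketch does not settle.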