[ https://issues.apache.org/jira/browse/CASSANDRA-16159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mohamed Zafraan reassigned CASSANDRA-16159:
-------------------------------------------

    Assignee: Mohamed Zafraan  (was: Shubham Arora)

> Reduce the Severity of Errors Reported in FailureDetector#isAlive()
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-16159
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16159
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Caleb Rackliffe
>            Assignee: Mohamed Zafraan
>            Priority: Normal
>             Fix For: 4.0-rc
>
>
> Noticed the following error in the failure detector during a host replacement:
> {noformat}
> java.lang.IllegalArgumentException: Unknown endpoint: 10.38.178.98:7000
>         at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:281)
>         at org.apache.cassandra.service.StorageService.handleStateBootreplacing(StorageService.java:2502)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:2182)
>         at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:3145)
>         at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1242)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1368)
>         at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
>         at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
>         at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:884)
>         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> {noformat}
> This particular error looks benign, given that even if it occurs, the node
> continues to handle the {{BOOT_REPLACE}} state.
> There are two things we might be able to do to improve
> {{FailureDetector#isAlive()}} though:
> 1.) We don’t short circuit in the case that the endpoint in question is in
> quarantine after being removed. It may be useful to check for this so we can
> avoid logging an ERROR when the endpoint is clearly doomed/dead. (Quarantine
> works great when the gossip message is _from_ a quarantined endpoint, but in
> this case, that would be the new/replacing and not the old/replaced one.)
> 2.) We can reduce the severity of the logging from ERROR to WARN and provide
> better context around how to determine whether or not there’s actually a
> problem. (ex. “If this occurs while trying to determine liveness for a node
> that is currently being replaced, it can be safely ignored.”)
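
The two proposals in the issue description could be sketched roughly as follows. This is a self-contained illustration and not Cassandra's code: the class `LivenessSketch`, its fields `knownEndpoints` and `quarantined`, and the in-memory `log` list are hypothetical stand-ins for the gossip state, the post-removal quarantine, and the real logger. It also assumes that an unknown endpoint is simply reported as not alive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a failure detector; none of these names come from Cassandra.
class LivenessSketch {
    enum Level { DEBUG, WARN, ERROR }

    // Assumed model of gossip state: endpoint -> currently alive?
    final Map<String, Boolean> knownEndpoints = new HashMap<>();
    // Assumed model of endpoints quarantined after removal.
    final Set<String> quarantined = new HashSet<>();
    // Log lines are captured in memory so the behaviour is easy to inspect.
    final List<String> log = new ArrayList<>();

    boolean isAlive(String endpoint) {
        Boolean alive = knownEndpoints.get(endpoint);
        if (alive != null) {
            return alive;
        }
        // Proposal 1: short circuit for a quarantined (recently removed) endpoint
        // instead of reporting a problem.
        if (quarantined.contains(endpoint)) {
            log.add(Level.DEBUG + " Endpoint " + endpoint + " is quarantined; treating it as dead");
            return false;
        }
        // Proposal 2: log at WARN rather than ERROR, with context on when it is benign.
        log.add(Level.WARN + " Unknown endpoint: " + endpoint
                + ". If this occurs while trying to determine liveness for a node"
                + " that is currently being replaced, it can be safely ignored.");
        return false;
    }
}
```

In this sketch, a lookup for a quarantined endpoint returns `false` with only a DEBUG entry, while a truly unknown endpoint returns `false` with a WARN entry that carries the explanatory text suggested in the issue. Neither path logs at ERROR.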