[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876137#comment-17876137 ]
Dmitry Konstantinov commented on CASSANDRA-19651:
-------------------------------------------------

It looks like if we start an instance without joining the ring (as we do for the 3rd node in the test):
{code:java}
withProperty(JOIN_RING, false, () -> newInstance.startup(cluster));{code}
then the instance.startup logic does not wait for gossip results; we just schedule a background gossip task in org.apache.cassandra.gms.Gossiper#start.

To check this, I added a 5-second sleep to org.apache.cassandra.gms.Gossiper.GossipTask#run:
{code:java}
private class GossipTask implements Runnable
{
    public void run()
    {
        try
        {
            //wait on messaging service to start listening
            MessagingService.instance().waitUntilListening();
            Thread.sleep(5000); // <===============================
            taskLock.lock();
{code}
With that change, the NPE reproduces reliably for org.apache.cassandra.distributed.test.ring.BootstrapTest#bootstrapUnspecifiedResumeTest:
{code:java}
java.lang.NullPointerException: Cannot invoke "org.apache.cassandra.gms.EndpointState.getApplicationState(org.apache.cassandra.gms.ApplicationState)" because "state" is null
    at org.apache.cassandra.distributed.action.GossipHelper$PullSchemaFrom.lambda$accept$6adea493$1(GossipHelper.java:245)
    at org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$async$10(IsolatedExecutor.java:156)
    at org.apache.cassandra.concurrent.FutureTask$2.call(FutureTask.java:124)
    at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
    at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)
{code}

> idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
> ---------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Observability
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 4.0.14, 5.0.1, 5.1, 4.1.7
>
> Attachments:
> ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html,
> ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html,
> ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html,
> ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html,
> result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz,
> result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz,
> result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz,
> select-junit-tests-rerun-4.1.zip
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> } {code}
> Actual result: responsesAndExpirations is the total number of replicas across all DCs, which does not depend on the ideal CL, so replicaPlan.keyspace().metric.idealCLWriteLatency is updated only when we get the last response/timeout from all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get enough responses from replicas according to the ideal CL.
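
To make the actual/expected difference in the issue description concrete, here is a minimal, self-contained sketch. It is not the Cassandra patch: the class IdealClLatencySketch, its fields, and its methods are hypothetical stand-ins, and timestamps are passed in explicitly instead of calling nanoTime(). It also simplifies the failure check by ignoring the requestedCLAchieved condition from the original code. The only point it illustrates is recording the latency at the moment the number of acks required by the ideal CL is reached, rather than when the last replica has responded or expired.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an ideal-CL write tracker; not Cassandra's actual class.
class IdealClLatencySketch
{
    private final int requiredAcks;          // acks needed to satisfy the ideal CL (assumed input)
    private final long startNanos;           // query start time, supplied by the caller
    private final AtomicInteger acks = new AtomicInteger();
    private final AtomicInteger remaining;   // replicas that have not yet responded or expired
    private final AtomicLong idealClLatencyNanos = new AtomicLong(-1); // -1 means "not recorded"
    private final AtomicInteger failedIdealCl = new AtomicInteger();

    IdealClLatencySketch(int requiredAcks, int totalReplicas, long startNanos)
    {
        this.requiredAcks = requiredAcks;
        this.startNanos = startNanos;
        this.remaining = new AtomicInteger(totalReplicas);
    }

    // Called once per successful replica response.
    void onResponse(long nowNanos)
    {
        // Record the latency exactly once: when the ack that satisfies the ideal CL arrives.
        if (acks.incrementAndGet() == requiredAcks)
            idealClLatencyNanos.set(nowNanos - startNanos);
        onReplicaDone();
    }

    // Called once per replica failure or timeout.
    void onFailureOrExpiry()
    {
        onReplicaDone();
    }

    private void onReplicaDone()
    {
        // Every ack is counted before its decrement, so once the counter reaches zero
        // the ack count is final and we can tell whether the ideal CL was ever met.
        if (remaining.decrementAndGet() == 0 && acks.get() < requiredAcks)
            failedIdealCl.incrementAndGet();
    }

    long idealClLatencyNanos() { return idealClLatencyNanos.get(); }

    int failedIdealClCount() { return failedIdealCl.get(); }
}
```

For example, with requiredAcks = 2, three replicas, and a start time of 0, responses at 10, 20, and 500 ns leave the recorded latency at 20 ns. Taking the time of the last response, which is the behaviour reported as the actual result, would give 500 ns.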