[ https://issues.apache.org/jira/browse/CASSANDRA-18707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765063#comment-17765063 ]
Berenguer Blasi edited comment on CASSANDRA-18707 at 9/14/23 8:58 AM:
----------------------------------------------------------------------

The problem is that on a slow env the node does indeed take more than 70s to start, so schema agreement can't be reached. In the attached log [^TESTS-TestSuites.xml.xz] we can see that the test starts and fails 70s later, which matches the [70s timeout|https://github.com/apache/cassandra/blob/cassandra-5.0/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java#L968] of the change monitor. If you grep the log for 'Schema updated' or 'd03783d7' you can see that the schemas propagate correctly but that propagation does not complete in time.

Also, the test goes silent for 47s, which is probably the root cause of the blown timeout:

{noformat}
INFO [node4_ScheduledTasks:1] node4 2023-09-13 02:08:29,935 StatusLogger.java:121 - system_auth.role_permissions 0,0
DEBUG [node1_ScheduledTasks:1] node1 2023-09-13 02:09:16,769 MigrationCoordinator.java:267 - Pulling unreceived schema versions...
{noformat}

All we can do is raise the timeout to 140s (see PR), if we agree the diagnosis is correct. Wdyt?

> Test failure: junit.framework.TestSuite.org.apache.cassandra.distributed.test.CASMultiDCTest-.jdk11
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18707
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/java
>            Reporter: Ekaterina Dimitrova
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>         Attachments: TESTS-TestSuites.xml.xz
>
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1650/testReport/junit.framework/TestSuite/org_apache_cassandra_distributed_test_CASMultiDCTest__jdk11/]
> h3.
> {code:java}
> Error Message
> Schema agreement not reached. Schema versions of the instances: [ef1c8e05-a06d-388d-a46d-53cc22a94762, 6c386108-1805-3985-b48e-8016012a0207, 6c386108-1805-3985-b48e-8016012a0207, ef1c8e05-a06d-388d-a46d-53cc22a94762]
> Stacktrace
> java.lang.IllegalStateException: Schema agreement not reached. Schema versions of the instances: [ef1c8e05-a06d-388d-a46d-53cc22a94762, 6c386108-1805-3985-b48e-8016012a0207, 6c386108-1805-3985-b48e-8016012a0207, ef1c8e05-a06d-388d-a46d-53cc22a94762]
>     at org.apache.cassandra.distributed.impl.AbstractCluster$ChangeMonitor.waitForCompletion(AbstractCluster.java:907)
>     at org.apache.cassandra.distributed.impl.AbstractCluster.lambda$schemaChange$8(AbstractCluster.java:836)
>     at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
>     at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>     at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org