[ 
https://issues.apache.org/jira/browse/CASSANDRA-18707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765063#comment-17765063
 ] 

Berenguer Blasi edited comment on CASSANDRA-18707 at 9/14/23 8:58 AM:
----------------------------------------------------------------------

The problem is that on a slow environment the node really does take more than
70s to start, so schema agreement cannot be reached in time. In the attached
log [^TESTS-TestSuites.xml.xz] the test starts and fails 70s later, which
matches the [70s 
timeout|https://github.com/apache/cassandra/blob/cassandra-5.0/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java#L968]
 of the change monitor.
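
To make the failure mode concrete, here is a minimal sketch of a deadline-bounded wait for schema agreement. It is not the AbstractCluster change monitor implementation; the class {{SchemaAgreementWait}}, the method {{awaitAgreement}} and the polling approach are all hypothetical and only illustrate why a node that is slower than the timeout makes the wait fail even though propagation is still progressing.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; NOT the in-jvm dtest ChangeMonitor.
final class SchemaAgreementWait
{
    /**
     * Polls the reported schema versions until every instance reports the same
     * version or the timeout elapses.
     *
     * @return true if agreement was reached, false on timeout or interruption
     */
    static boolean awaitAgreement(Supplier<List<UUID>> versions, Duration timeout, Duration pollInterval)
    {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true)
        {
            // Agreement means exactly one distinct version across all instances.
            Set<UUID> distinct = new HashSet<>(versions.get());
            if (distinct.size() == 1)
                return true;

            // Overflow-safe deadline check.
            if (System.nanoTime() - deadline >= 0)
                return false;

            try
            {
                Thread.sleep(pollInterval.toMillis());
            }
            catch (InterruptedException e)
            {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With the four versions from the error message (two instances on each of two versions) the wait returns false once the timeout elapses, however close the lagging nodes are to catching up.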

If you grep the log for 'Schema updated' or 'd03783d7', you can see that the 
schema propagation is correct but incomplete. The test also goes silent for 
47s, which is probably the root cause of the blown timeout: 

{noformat}
INFO  [node4_ScheduledTasks:1] node4 2023-09-13 02:08:29,935 
StatusLogger.java:121 - system_auth.role_permissions                 0,0
DEBUG [node1_ScheduledTasks:1] node1 2023-09-13 02:09:16,769 
MigrationCoordinator.java:267 - Pulling unreceived schema versions...
{noformat}
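
The 47s figure can be checked mechanically. The sketch below is a hypothetical helper ({{LogGapFinder}} is not part of Cassandra) that extracts the timestamps from log lines and reports the largest gap between consecutive entries. It assumes each log entry sits on a single line and carries a {{yyyy-MM-dd HH:mm:ss,SSS}} timestamp, as in the excerpt above once the e-mail line wrapping is undone.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper for illustration only; not part of Cassandra.
final class LogGapFinder
{
    // Matches timestamps such as "2023-09-13 02:08:29,935".
    private static final Pattern TIMESTAMP =
        Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}");
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the largest gap between consecutive timestamped lines. */
    static Duration largestGap(List<String> lines)
    {
        Duration largest = Duration.ZERO;
        LocalDateTime previous = null;
        for (String line : lines)
        {
            Matcher m = TIMESTAMP.matcher(line);
            if (!m.find())
                continue; // skip lines without a timestamp, e.g. stack trace frames

            LocalDateTime current = LocalDateTime.parse(m.group(), FORMAT);
            if (previous != null)
            {
                Duration gap = Duration.between(previous, current);
                if (gap.compareTo(largest) > 0)
                    largest = gap;
            }
            previous = current;
        }
        return largest;
    }

    public static void main(String[] args)
    {
        // The two entries quoted above, joined back onto single lines.
        List<String> lines = List.of(
            "INFO  [node4_ScheduledTasks:1] node4 2023-09-13 02:08:29,935 StatusLogger.java:121 - system_auth.role_permissions                 0,0",
            "DEBUG [node1_ScheduledTasks:1] node1 2023-09-13 02:09:16,769 MigrationCoordinator.java:267 - Pulling unreceived schema versions...");
        System.out.println(largestGap(lines).toMillis() + " ms"); // prints "46834 ms"
    }
}
```

For the two entries above the gap is 46.834s, which rounds to the 47s mentioned.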


If we agree the diagnosis is correct, the only remedy I can see is raising the 
timeout to 140s (see the PR). What do you think?


> Test failure: 
> junit.framework.TestSuite.org.apache.cassandra.distributed.test.CASMultiDCTest-.jdk11
>  
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18707
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/java
>            Reporter: Ekaterina Dimitrova
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>         Attachments: TESTS-TestSuites.xml.xz
>
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1650/testReport/junit.framework/TestSuite/org_apache_cassandra_distributed_test_CASMultiDCTest__jdk11/]
> {code:java}
> Error Message
> Schema agreement not reached. Schema versions of the instances: 
> [ef1c8e05-a06d-388d-a46d-53cc22a94762, 6c386108-1805-3985-b48e-8016012a0207, 
> 6c386108-1805-3985-b48e-8016012a0207, ef1c8e05-a06d-388d-a46d-53cc22a94762]
> Stacktrace
> java.lang.IllegalStateException: Schema agreement not reached. Schema 
> versions of the instances: [ef1c8e05-a06d-388d-a46d-53cc22a94762, 
> 6c386108-1805-3985-b48e-8016012a0207, 6c386108-1805-3985-b48e-8016012a0207, 
> ef1c8e05-a06d-388d-a46d-53cc22a94762]
>     at org.apache.cassandra.distributed.impl.AbstractCluster$ChangeMonitor.waitForCompletion(AbstractCluster.java:907)
>     at org.apache.cassandra.distributed.impl.AbstractCluster.lambda$schemaChange$8(AbstractCluster.java:836)
>     at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
>     at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>     at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  



