[jira] [Created] (CASSANDRA-19742) Cassandra Dtest Cluster is not fully up after upgrading and may fail on queries

ConfX (Jira) Mon, 01 Jul 2024 23:30:18 -0700

ConfX created CASSANDRA-19742:
---------------------------------

             Summary: Cassandra Dtest Cluster is not fully up after upgrading 
and may fail on queries
                 Key: CASSANDRA-19742
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19742
             Project: Cassandra
          Issue Type: Bug
          Components: Test/dtest/java
            Reporter: ConfX



This may not be a bug, but the Cassandra Dtest framework can definitely be 
improved.
h2. What happened

In the DTest framework, an upgraded node may not yet be fully up and 
operational when the test logic in {{runAfterNodeUpgrade()}} is executed. This 
may cause unexpected behavior and even test flakiness.
h2. How to reproduce

Take the following upgrade test as an example. Put it under 
cassandra/test/distributed/org/apache/cassandra/distributed/upgrade/ and build 
the dtest jars.
{code:java}
package org.apache.cassandra.distributed.upgrade;

import org.junit.Test;

import org.apache.cassandra.distributed.api.ConsistencyLevel;
import org.apache.cassandra.distributed.api.ICoordinator;

import static org.apache.cassandra.distributed.api.Feature.GOSSIP;
import static org.apache.cassandra.distributed.api.Feature.NATIVE_PROTOCOL;
import static org.apache.cassandra.distributed.shared.AssertUtils.assertRows;
import static org.apache.cassandra.distributed.shared.AssertUtils.row;

public class demoUpgradeTest extends UpgradeTestBase {
    @Test
    public void demoTest() throws Throwable {
        new TestCase()
        .nodes(2)
        .nodesToUpgrade(1)
        .withConfig(c -> c.with(GOSSIP, NATIVE_PROTOCOL).set("drop_compact_storage_enabled", true))
        .upgradesToCurrentFrom(v3X)
        .setup((cluster) -> {
            cluster.schemaChange("CREATE KEYSPACE k WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
            cluster.schemaChange("CREATE TABLE k.t ( k int, c int, total counter, PRIMARY KEY (k, c))");
        })
        .runAfterNodeUpgrade((cluster, node) -> {
            ConsistencyLevel cl = ConsistencyLevel.ONE;
            String select = "SELECT total FROM k.t WHERE k = 1 AND c = ?";
            // Use each node in turn as the coordinator for counter updates and reads
            for (int i = 1; i <= cluster.size(); i++)
            {
                ICoordinator coordinator = cluster.coordinator(i);
                coordinator.execute("UPDATE k.t SET total = total + 1 WHERE k = 1 AND c = ?", cl, i);
                assertRows(coordinator.execute(select, cl, i), row(1L));
                coordinator.execute("UPDATE k.t SET total = total - 4 WHERE k = 1 AND c = ?", cl, i);
                assertRows(coordinator.execute(select, cl, i), row(-3L));
            }
        }).run();
    }
} {code}
Run the test with:
{code:java}
$ ant test-jvm-dtest-some -Duse.jdk11=true -Dtest.name=org.apache.cassandra.distributed.upgrade.demoUpgradeTest {code}
You will see the following failure:
{code:java}
[junit-timeout] Testcase: demoTest(org.apache.cassandra.distributed.upgrade.demoUpgradeTest)-_jdk11:    FAILED
[junit-timeout] Error in test '4.0.13 -> [4.1-alpha1]' while upgrading to '4.1-alpha1'; successful upgrades []
[junit-timeout] junit.framework.AssertionFailedError: Error in test '4.0.13 -> [4.1-alpha1]' while upgrading to '4.1-alpha1'; successful upgrades []
[junit-timeout]     at org.apache.cassandra.distributed.upgrade.UpgradeTestBase$TestCase.run(UpgradeTestBase.java:442)
[junit-timeout]     at org.apache.cassandra.distributed.upgrade.demoUpgradeTest.demoTest(demoUpgradeTest.java:62)
[junit-timeout]     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit-timeout]     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit-timeout]     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit-timeout] Caused by: org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level ONE
[junit-timeout]     at org.apache.cassandra.exceptions.UnavailableException.create(UnavailableException.java:37)
[junit-timeout]     at org.apache.cassandra.exceptions.UnavailableException.create(UnavailableException.java:31)
[junit-timeout]     at org.apache.cassandra.service.StorageProxy.findSuitableReplica(StorageProxy.java:1617)
[junit-timeout]     at org.apache.cassandra.service.StorageProxy.mutateCounter(StorageProxy.java:1565)
[junit-timeout]     at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:809)
[junit-timeout]     at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:1054)
[junit-timeout]     at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:476)
[junit-timeout]     at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:454)
[junit-timeout]     at org.apache.cassandra.distributed.impl.Coordinator.executeInternal(Coordinator.java:103)
[junit-timeout]     at org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:65)
[junit-timeout]     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[junit-timeout]     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[junit-timeout]     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[junit-timeout]     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[junit-timeout]     at java.base/java.lang.Thread.run(Thread.java:829) {code}
The failure occurs because the UPDATE statement is executed right after the 
restart, before the upgraded node is fully operational. It can be worked around 
manually by adding a sleep at the beginning of runAfterNodeUpgrade, like below:
{code:java}
        ...
        .runAfterNodeUpgrade((cluster, node) -> {
            // Wait for the node to be fully operational
            cluster.get(node).nodetool("status");
            // Adding a small delay to ensure the node is fully integrated
            Thread.sleep(10000);
            ...
        }).run();
    }
} {code}
However, by design, the Dtest framework should wait for the node to be fully 
operational before executing runAfterNodeUpgrade(). It would be good to add 
some waiting logic for this purpose to prevent such unexpected behavior.
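
As a rough illustration of what such waiting logic could look like, here is a minimal sketch of a bounded polling helper that could replace the fixed Thread.sleep(10000). The class ReadinessWait and the method awaitTrue are hypothetical names and not part of the Dtest API. Which probe accurately reflects "fully operational" is also an assumption left to the caller. The sketch treats any RuntimeException thrown by the probe as "not ready yet", on the assumption that failures like the UnavailableException above surface that way.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration only; not part of the Cassandra Dtest API. */
final class ReadinessWait {
    private ReadinessWait() {}

    /**
     * Polls the probe until it returns true or the timeout elapses.
     * A RuntimeException thrown by the probe is treated as "not ready yet".
     *
     * @return true if the probe succeeded within the timeout, false on timeout or interrupt
     */
    static boolean awaitTrue(BooleanSupplier probe, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                if (probe.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Assumption: a failing probe means the node is still starting; retry.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without a successful probe
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
    }
}
```

Inside runAfterNodeUpgrade, the probe could be a cheap query through the coordinator that returns true on success, with the test failing if awaitTrue returns false. Unlike a fixed sleep, this returns as soon as the node responds and still bounds the wait.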



