[ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:
----------------------------------------
                        Authors: Marcus Eriksson, Sam Tunnicliffe  (was: Sam 
Tunnicliffe)
    Test and Documentation Plan: 
New and existing tests in attached CI results.
The few failures/errors in the attached results are known (CASSANDRA-19343) or 
look to be infra-related.

                         Status: Patch Available  (was: In Progress)

When an instance moves from a transient to a full replica for a given range, it 
must begin acting as a full replica for writes before it does so for reads. 
Otherwise, consistency can be violated as data streamed to the instance early 
in the operation can be removed by cleanup if it occurs before the instance 
assumes responsibility for full writes. Also, coordinators will route read 
requests to instances which may not have received all preceding writes, causing 
unnecessary read repair or potentially inconsistent results. 

The root cause of the flaky test failure which originally produced this issue 
was that in {{TransientRangeMovementTest::testRemoveNode}} cleanup happened to 
run on one instance before it had enacted the final step of the removal 
operation, leading it to remove more data than it should. 

> Test Failure: 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19344
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CI
>            Reporter: Ekaterina Dimitrova
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-000000000003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) at 
> org.apache.cassandra.distributed.api.NodeToolResult$Asserts.fail(NodeToolResult.java:214)
>  at 
> org.apache.cassandra.distributed.api.NodeToolResult$Asserts.success(NodeToolResult.java:97)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:173)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2634/workflows/24617d26-e297-4857-bc43-b6a04e64a6ea/jobs/54534/tests#failed-test-0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to