[ 
https://issues.apache.org/jira/browse/CASSANDRA-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764395#comment-17764395
 ] 

Brandon Williams commented on CASSANDRA-18792:
----------------------------------------------

There are at least two problems here.  The first is the 'Node is involved 
in cluster membership changes' error.  It comes from node2 even though node1 
has been moved, because there is a race between nodetool returning from node1 
and the streaming session being closed out on node2.  A [small 
sleep|https://github.com/driftx/cassandra-dtest/commit/dfd506ec81fb30a444da54e79e539159813815e1]
 is all we need to [avoid 
that|https://app.circleci.com/pipelines/github/driftx/cassandra/1266/workflows/584d0c94-1a42-4747-99bd-869e076184d5/jobs/51039],
 but that leaves us with the second problem: occasional timeouts that are not 
environmental, since they occur on a consistent basis both on circle and 
locally for me.
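
For illustration only, the race could also be absorbed by retrying cleanup instead of sleeping for a fixed time. This is a minimal sketch, not what the linked commit does: the helper name {{cleanup_when_settled}}, its parameters, and the retry policy are all hypothetical. It only relies on the error text shown in the failure output quoted below appearing in the exception message.

```python
import time

# Error text taken from the nodetool stderr quoted in this ticket.
MEMBERSHIP_ERROR = "Node is involved in cluster membership changes"


def cleanup_when_settled(run_cleanup, attempts=5, delay=1.0, sleep=time.sleep):
    """Call run_cleanup(), retrying while it fails with the membership error.

    Hypothetical helper: run_cleanup is any zero-argument callable, and
    sleep is injectable so the retry loop can be exercised without waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run_cleanup()
        except Exception as exc:
            # Re-raise anything that is not the transient membership error,
            # and give up once the final attempt has failed.
            if MEMBERSHIP_ERROR not in str(exc) or attempt == attempts:
                raise
            sleep(delay)
```

In the dtest this might be called as {{cleanup_when_settled(lambda: node.nodetool('cleanup'))}} inside {{cleanup_nodes}}, assuming the {{ToolError}} message carries nodetool's stderr as it does in the trace below. A retry only papers over the first problem; it does nothing for the timeouts.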

> Test failure: 
> transient_replication_ring_test.py::TestTransientReplicationRing::test_move_forwards_between_and_cleanup
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18792
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18792
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: Andres de la Peña
>            Assignee: Brandon Williams
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>
> The Python dtest 
> {{transient_replication_ring_test.py::TestTransientReplicationRing::test_move_forwards_between_and_cleanup}}
>  seems to be flaky, at least in {{trunk}}:
> * https://app.circleci.com/pipelines/github/instaclustr/cassandra/2993/workflows/80ac4db3-fc3d-4908-bc39-dfff6ab88871/jobs/105464/tests
> * https://app.circleci.com/pipelines/github/adelapena/cassandra/3128/workflows/b0cf2754-81fd-491e-bac4-cc7fe8b0ac1b/jobs/70390/tests
> {code}
> ccmlib.node.ToolError: Subprocess ['nodetool', '-h', 'localhost', '-p', '7200', 'cleanup'] exited with non-zero status; exit status: 2; 
> stderr: error: Node is involved in cluster membership changes. Not safe to run cleanup.
> -- StackTrace --
> java.lang.RuntimeException: Node is involved in cluster membership changes. Not safe to run cleanup.
>       at org.apache.cassandra.service.StorageService.forceKeyspaceCleanup(StorageService.java:4037)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>       at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
>       at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>       at java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:262)
>       at java.management/com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>       at java.management/com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>       at java.management/com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>       at java.management/com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>       at java.management/com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>       at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:814)
>       at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:802)
>       at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1472)
>       at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1310)
>       at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1405)
>       at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>       at java.rmi/sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:360)
>       at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:200)
>       at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:197)
>       at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
>       at java.rmi/sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>       at java.rmi/sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:587)
>       at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:828)
>       at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:705)
>       at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
>       at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:704)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>       at java.base/java.lang.Thread.run(Thread.java:833)
> self = <transient_replication_ring_test.TestTransientReplicationRing object at 0x7f6d38295710>
>     @flaky(max_runs=1)
>     @pytest.mark.no_vnodes
>     def test_move_forwards_between_and_cleanup(self):
>         """Test moving a node forwards past a neighbor token"""
>         move_token = '00025'
>         expected_after_move = [gen_expected(range(0, 26), range(31, 40, 2)),
>                                gen_expected(range(0, 21, 2), range(31, 40)),
>                                gen_expected(range(1, 11, 2), range(11, 21, 2), range(21, 31)),
>                                gen_expected(range(21, 26, 2), range(26, 40))]
>         expected_after_repair = [gen_expected(range(0, 26)),
>                                  gen_expected(range(0, 21), range(31, 40)),
>                                  gen_expected(range(21, 31),),
>                                  gen_expected(range(26, 40))]
> >       self.move_test(move_token, expected_after_move, expected_after_repair)
> transient_replication_ring_test.py:291: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> transient_replication_ring_test.py:268: in move_test
>     cleanup_nodes(nodes)
> transient_replication_ring_test.py:43: in cleanup_nodes
>     node.nodetool('cleanup')
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:1018: in nodetool
>     return handle_external_tool_process(p, ['nodetool', '-h', 'localhost', '-p', str(self.jmx_port)] + shlex.split(cmd))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> process = <subprocess.Popen object at 0x7f6d376055c0>
> cmd_args = ['nodetool', '-h', 'localhost', '-p', '7200', 'cleanup']
>     def handle_external_tool_process(process, cmd_args):
>         out, err = process.communicate()
>         if (out is not None) and isinstance(out, bytes):
>             out = out.decode()
>         if (err is not None) and isinstance(err, bytes):
>             err = err.decode()
>         rc = process.returncode
>     
>         if rc != 0:
> >           raise ToolError(cmd_args, rc, out, err)
> E           ccmlib.node.ToolError: Subprocess ['nodetool', '-h', 'localhost', '-p', '7200', 'cleanup'] exited with non-zero status; exit status: 2; 
> E           stderr: error: Node is involved in cluster membership changes. Not safe to run cleanup.
> E           -- StackTrace --
> E           java.lang.RuntimeException: Node is involved in cluster membership changes. Not safe to run cleanup.
> E             at org.apache.cassandra.service.StorageService.forceKeyspaceCleanup(StorageService.java:4037)
> E             at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> E             at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> E             at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> E             at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> E             at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
> E             at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
> E             at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> E             at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> E             at java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:262)
> E             at java.management/com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> E             at java.management/com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> E             at java.management/com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> E             at java.management/com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> E             at java.management/com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> E             at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:814)
> E             at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:802)
> E             at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1472)
> E             at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1310)
> E             at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1405)
> E             at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
> E             at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> E             at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> E             at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> E             at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> E             at java.rmi/sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:360)
> E             at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:200)
> E             at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:197)
> E             at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
> E             at java.rmi/sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> E             at java.rmi/sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:587)
> E             at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:828)
> E             at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:705)
> E             at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
> E             at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:704)
> E             at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> E             at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> E             at java.base/java.lang.Thread.run(Thread.java:833)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2318: ToolError
> {code}
> This hasn't been seen on Butler yet, only on ad-hoc PRs. It can also be 
> reproduced with the multiplexer:
> {code}
> .circleci/generate.sh -p \
>   -e REPEATED_DTESTS_COUNT=500 \
>   -e REPEATED_DTESTS=transient_replication_ring_test.py::TestTransientReplicationRing::test_move_forwards_between_and_cleanup
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

