[ 
https://issues.apache.org/jira/browse/CASSANDRA-13058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818371#comment-15818371
 ] 

Stefan Podkowinski commented on CASSANDRA-13058:
------------------------------------------------

The test is passing because hint delivery is not resumable in 3.0 
(https://issues.apache.org/jira/browse/CASSANDRA-6230). This was only fixed 
recently, in 3.10 (CASSANDRA-11960). 

Because of the missing reply messages addressed in the patch, node2 would 
never respond while handling non-local hints. That in turn causes all callbacks 
on node1 to time out, and the HintDispatcher retries hint delivery. At this 
point, the FailureDetector is correctly reporting the node as alive, so the 
dispatch process is not aborted; it simply tries to consume the next hints 
from a now empty iterator and then terminates with a successful return value. 
The log file will contain a "Finished hinted handoff of file [].hints to 
endpoint []" message, which is technically correct but misleading when all 
writes have just timed out.
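
To illustrate the sequence described above, here is a toy model in Python. It 
is not Cassandra's HintsDispatcher code; the names (dispatch, Outcome, 
is_alive, target_replies, page_size) are made up for this sketch. The only 
assumption it encodes is that hints are read from a single-pass iterator that 
is not rewound when a page times out.

```python
from enum import Enum
from itertools import islice


class Outcome(Enum):
    SUCCESS = "success"
    ABORT = "abort"


def dispatch(hints, is_alive, target_replies, page_size=10):
    """Toy dispatcher: returns (outcome, number of acknowledged hints)."""
    hints = iter(hints)  # single-pass: consumed hints cannot be re-read
    delivered = 0
    while True:
        if not is_alive():
            # The failure detector says the target is down: give up.
            return Outcome.ABORT, delivered
        page = list(islice(hints, page_size))
        if not page:
            # Nothing left to read, so the file is reported as finished,
            # regardless of how many hints were actually acknowledged.
            return Outcome.SUCCESS, delivered
        if target_replies:
            delivered += len(page)
        # Otherwise every callback timed out. The loop retries, but the
        # timed-out page has already been consumed from the iterator.


if __name__ == "__main__":
    # Target is alive but never replies: "success" with zero hints delivered.
    print(dispatch(["h1", "h2", "h3"], lambda: True, target_replies=False))
```

In this model, a target that is alive but never replies yields a SUCCESS 
outcome with zero acknowledged hints, which mirrors the misleading "Finished 
hinted handoff" log line.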

Even with the FailureDetector reporting the target node as unavailable and 
running into the ABORT case, we'd still get the Exception reported in 
CASSANDRA-11960. All things considered, chances are high that hints will be 
lost in 3.0 if any error occurs during delivery.



> dtest failure in hintedhandoff_test.TestHintedHandoff.hintedhandoff_decom_test
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13058
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13058
>             Project: Cassandra
>          Issue Type: Test
>          Components: Testing
>            Reporter: Sean McCarthy
>            Assignee: Stefan Podkowinski
>            Priority: Blocker
>              Labels: dtest, test-failure
>             Fix For: 3.10
>
>         Attachments: 13058-3.x.patch, node1.log, node1_debug.log, 
> node1_gc.log, node2.log, node2_debug.log, node2_gc.log, node3.log, 
> node3_debug.log, node3_gc.log, node4.log, node4_debug.log, node4_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.X_novnode_dtest/16/testReport/hintedhandoff_test/TestHintedHandoff/hintedhandoff_decom_test/
> {code}
> Error Message
> Subprocess ['nodetool', '-h', 'localhost', '-p', '7100', ['decommission']] 
> exited with non-zero status; exit status: 2; 
> stderr: error: Error while decommissioning node: Failed to transfer all hints 
> to 59f20b4f-0215-4e18-be1b-7e00f2901629
> {code}{code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/automaton/cassandra-dtest/hintedhandoff_test.py", line 167, in 
> hintedhandoff_decom_test
>     node1.decommission()
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1314, in 
> decommission
>     self.nodetool("decommission")
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 783, in 
> nodetool
>     return handle_external_tool_process(p, ['nodetool', '-h', 'localhost', 
> '-p', str(self.jmx_port), cmd.split()])
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1993, in 
> handle_external_tool_process
>     raise ToolError(cmd_args, rc, out, err)
> {code}{code}
> java.lang.RuntimeException: Error while decommissioning node: Failed to 
> transfer all hints to 59f20b4f-0215-4e18-be1b-7e00f2901629
>       at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:3924)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
>       at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>       at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>       at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>       at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>       at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>       at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>       at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1466)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:828)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
>       at sun.rmi.transport.Transport$1.run(Transport.java:200)
>       at sun.rmi.transport.Transport$1.run(Transport.java:197)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>       at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
>       at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
>       at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$241(TCPTransport.java:683)
>       at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$284/1694175644.run(Unknown
>  Source)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
