[jira] [Resolved] (CASSANDRA-3876) nodetool removetoken force causes an inconsistent state

Brandon Williams (Resolved) (JIRA) Wed, 08 Feb 2012 06:30:25 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Brandon Williams resolved CASSANDRA-3876.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.8

Committed, thanks.
                
> nodetool removetoken force causes an inconsistent state
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3876
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3876
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7, 1.1
>            Reporter: Sam Overton
>            Assignee: Sam Overton
>              Labels: force, nodetool, removetoken
>             Fix For: 1.0.8
>
>         Attachments: 3876.patch
>
>
> Steps to reproduce (tested on 1.0.7 and trunk):
> * Create a cluster of 3 nodes
> * Insert some data
> * Stop one of the nodes
> * Call removetoken on the token of the stopped node
> * Immediately afterwards, run removetoken force
>   - This causes the original removetoken to fail with an error after 30s
>     because the generation changed for the leaving node. It is still a
>     convenient way to simulate a removetoken that hangs at streaming,
>     because in both cases the cleanup logic at the end of
>     StorageService.removeToken is never executed.
>   - For a more realistic reproduction, get a removetoken to hang in
>     streaming, then run removetoken force
> Effects:
> * "removetoken status" now throws an exception because 
> StorageService.removingNode is not cleared, but the endpoint is no longer a 
> member of the ring:
> $ nodetool -h localhost removetoken status
> {noformat}
> Exception in thread "main" java.lang.AssertionError
>       at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
>       at org.apache.cassandra.service.StorageService.getRemovalStatus(StorageService.java:2369)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:616)
>       at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
>       at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
>       at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
>       at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
>       at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:205)
>       at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:683)
>       at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:672)
>       at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>       at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
>       at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
>       at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
>       at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:619)
>       at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:616)
>       at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
>       at sun.rmi.transport.Transport$1.run(Transport.java:177)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
>       at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:636)
> {noformat}
> * truncate no longer works in the cli because the removed endpoint is not 
> removed from Gossiper.unreachableEndpoints. 
> The cli errors immediately with:
> {noformat}
> [default@ks1] truncate cf1;
> null
> UnavailableException()
>       at org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20978)
>       at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>       at org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:942)
>       at org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:929)
>       at org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1417)
>       at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:270)
>       at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:219)
>       at org.apache.cassandra.cli.CliMain.main(CliMain.java:346)
> {noformat}
> The logs show:
> {noformat}
> INFO [Thrift:11] 2012-02-08 11:55:50,135 StorageProxy.java (line 1172) Cannot perform truncate, some hosts are down
> {noformat}
> * Other schema-related operations probably fail for the same reason,
>   although this was not tested
> Workaround:
> * Restart the affected node.
> Fix:
> It looks like StorageService.forceRemoveCompletion is missing some cleanup 
> logic which is present at the end of StorageService.removeToken. Adding this 
> cleanup logic to forceRemoveCompletion fixes the above issues (see attached).
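
As a rough illustration of the state problem described in the report, here is a minimal toy model. It is a sketch only, not Cassandra code and not the contents of 3876.patch: the class RemovalCleanupSketch, its fields, its methods, and the example addresses are all hypothetical stand-ins for the ring membership, Gossiper.unreachableEndpoints, and StorageService.removingNode. It assumes the missing cleanup amounts to clearing the removal marker and dropping the endpoint from the unreachable set.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: these fields are hypothetical stand-ins, not Cassandra's real classes.
class RemovalCleanupSketch {
    final Set<String> ringMembers = new HashSet<>();          // stands in for ring membership
    final Set<String> unreachableEndpoints = new HashSet<>(); // stands in for Gossiper.unreachableEndpoints
    String removingNode;                                      // stands in for StorageService.removingNode

    // Builds a three-node ring with made-up addresses.
    static RemovalCleanupSketch threeNodeCluster() {
        RemovalCleanupSketch cluster = new RemovalCleanupSketch();
        cluster.ringMembers.add("10.0.0.1");
        cluster.ringMembers.add("10.0.0.2");
        cluster.ringMembers.add("10.0.0.3");
        return cluster;
    }

    // A node has gone down and a removal is started for it.
    void beginRemoval(String endpoint) {
        unreachableEndpoints.add(endpoint);
        removingNode = endpoint;
    }

    // Forced completion as in the report: the endpoint leaves the ring, nothing else is cleaned up.
    void forceWithoutCleanup() {
        ringMembers.remove(removingNode);
    }

    // Forced completion that also performs the (assumed) cleanup.
    void forceWithCleanup() {
        ringMembers.remove(removingNode);
        unreachableEndpoints.remove(removingNode);
        removingNode = null;
    }

    // Fails when a removal is still recorded for an endpoint that has left the ring.
    String removalStatus() {
        if (removingNode == null) {
            return "no removal in progress";
        }
        if (!ringMembers.contains(removingNode)) {
            throw new IllegalStateException("removingNode is not in the ring: " + removingNode);
        }
        return "removing " + removingNode;
    }

    // Models truncate refusing to run while any endpoint is considered down.
    boolean canTruncate() {
        return unreachableEndpoints.isEmpty();
    }

    public static void main(String[] args) {
        RemovalCleanupSketch stale = threeNodeCluster();
        stale.beginRemoval("10.0.0.3");
        stale.forceWithoutCleanup();
        System.out.println("truncate allowed without cleanup: " + stale.canTruncate());

        RemovalCleanupSketch clean = threeNodeCluster();
        clean.beginRemoval("10.0.0.3");
        clean.forceWithCleanup();
        System.out.println("truncate allowed with cleanup: " + clean.canTruncate());
        System.out.println("status with cleanup: " + clean.removalStatus());
    }
}
```

In this model, calling removalStatus() after forceWithoutCleanup() throws, which mirrors the failing "removetoken status" call, and canTruncate() stays false until the stale endpoint is dropped. The actual change is the one in the attached patch.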

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
