[ https://issues.apache.org/jira/browse/CASSANDRA-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams resolved CASSANDRA-3876. ----------------------------------------- Resolution: Fixed Fix Version/s: 1.0.8 Committed, thanks. > nodetool removetoken force causes an inconsistent state > ------------------------------------------------------- > > Key: CASSANDRA-3876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-3876 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 1.0.7, 1.1 > Reporter: Sam Overton > Assignee: Sam Overton > Labels: force, nodetool, removetoken > Fix For: 1.0.8 > > Attachments: 3876.patch > > > Steps to reproduce (tested on 1.0.7 and trunk): > * Create a cluster of 3 nodes > * Insert some data > * stop one of the nodes > * Call removetoken on the token of the stopped node > * Immediately after, do removetoken force > - this will cause the original removetoken to fail with an error after 30s > since the generation changed for the leaving node, but this is a convenient > way of simulating the case where a removetoken hangs at streaming since the > cleanup logic at the end of StorageService.removeToken is never executed. > - if you want a more realistic reproduction then get a removetoken to hang > in streaming, then do removetoken force > Effects: > * "removetoken status" now throws an exception because > StorageService.removingNode is not cleared, but the endpoint is no longer a > member of the ring: > $ nodetool -h localhost removetoken status > {noformat} > Exception in thread "main" java.lang.AssertionError > at > org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304) > at > org.apache.cassandra.service.StorageService.getRemovalStatus(StorageService.java:2369) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226) > at > com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83) > at > com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:205) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:683) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:672) > at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) > at > javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90) > at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285) > at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383) > at > javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:619) > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) > at sun.rmi.transport.Transport$1.run(Transport.java:177) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:173) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > {noformat} > * truncate no longer works in the cli because the removed endpoint is not > removed from Gossiper.unreachableEndpoints. > The cli errors immediately with: > {noformat} > [default@ks1] truncate cf1; > null > UnavailableException() > at > org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20978) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) > at > org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:942) > at > org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:929) > at > org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1417) > at > org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:270) > at > org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:219) > at org.apache.cassandra.cli.CliMain.main(CliMain.java:346) > {noformat} > The logs show: > {noformat} > INFO [Thrift:11] 2012-02-08 11:55:50,135 StorageProxy.java (line 1172) Cannot > perform truncate, some hosts are down > {noformat} > * there are probably other schema related things that fail for the same > reason although this wasn't tested > Workaround: > * Restart the affected node. > Fix: > It looks like StorageService.forceRemoveCompletion is missing some cleanup > logic which is present at the end of StorageService.removeToken. Adding this > cleanup logic to forceRemoveCompletion fixes the above issues (see attached). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira