[ https://issues.apache.org/jira/browse/CASSANDRA-10687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000320#comment-15000320 ]
Eyal Sorek commented on CASSANDRA-10687:
----------------------------------------

BTW, during the joining process, while the node is still joining, nodetool info fails with an AssertionError:

nodetool info
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+CMSClassUnloadingEnabled -Xms8192M -Xmx8192M -Xmn2048M -Xss256k
Exception in thread "main" java.lang.AssertionError
	at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:502)
	at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2165)
	at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2154)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
	at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
	at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1464)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
	at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:657)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$2.run(Transport.java:202)
	at sun.rmi.transport.Transport$2.run(Transport.java:199)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:198)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:567)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:828)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.access$400(TCPTransport.java:619)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:684)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:681)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:681)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

> When adding new node to cluster getting Cassandra timeout during write query
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10687
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10687
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Configuration, Coordination, Streaming
>                      and Messaging
>    Environment: Cassandra 2.0.9 using vnodes, on Debian 7.9, on two data centers (AUS & TAM)
>       Reporter: Eyal Sorek
>
> When adding one new node to an 8-node cluster (and again after completing the 9th node in the AUS data center, and again when adding the 10th node in the TAM data center, with the same behaviour), we get many of the errors below.
> First: why this, while the node is joining?
> LOCAL_ONE (2 replica were required but only 1 acknowledged the write)
> Since when does LOCAL_ONE require 2 replicas?
> Second: why is there so much overhead across the whole cluster while a node is joining?
>
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (2 replica were required but only 1 acknowledged the write)
> Sample stack trace:
> …stax.driver.core.exceptions.WriteTimeoutException.copy (WriteTimeoutException.java:73)
> …m.datastax.driver.core.DriverThrowables.propagateCause (DriverThrowables.java:37)
> ….driver.core.DefaultResultSetFuture.getUninterruptibly (DefaultResultSetFuture.java:214)
> com.datastax.driver.core.AbstractSession.execute (AbstractSession.java:52)
> com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao$$anonfun$insertCompressed$1.apply(CassandraPagesReadWriteDao.scala:29)
> com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao$$anonfun$insertCompressed$1.apply(CassandraPagesReadWriteDao.scala:25)
> com.wixpress.framework.monitoring.metering.SyncMetering$class.tracking(Metering.scala:58)
> com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadOnlyDao.tracking(CassandraPagesReadOnlyDao.scala:19)
> com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao.insertCompressed(CassandraPagesReadWriteDao.scala:25)
> com.wixpress.html.data.distributor.core.DaoPageDistributor.com$wixpress$html$data$distributor$core$DaoPageDistributor$$distributePage(DaoPageDistributor.scala:36)
> com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply$mcV$sp(DaoPageDistributor.scala:26)
> com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply(DaoPageDistributor.scala:26)
> com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply(DaoPageDistributor.scala:26)
> com.wixpress.framework.monitoring.metering.SyncMetering$class.tracking(Metering.scala:58)
> com.wixpress.html.data.distributor.core.DaoPageDistributor.tracking(DaoPageDistributor.scala:17)
> com.wixpress.html.data.distributor.core.DaoPageDistributor.process(DaoPageDistributor.scala:25)
> com.wixpress.html.data.distributor.core.greyhound.DistributionRequestHandler.handleMessage(DistributionRequestHandler.scala:19)
> com.wixpress.greyhound.KafkaUserHandlers.handleMessage(UserHandlers.scala:11)
> com.wixpress.greyhound.EventsConsumer.com$wixpress$greyhound$EventsConsumer$$handleMessage(EventsConsumer.scala:51)
> com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply$mcV$sp(EventsConsumer.scala:43)
> com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply(EventsConsumer.scala:40)
> com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply(EventsConsumer.scala:40)
> scala.util.Try$.apply(Try.scala:192)
> com.wixpress.greyhound.EventsConsumer.com$wixpress$greyhound$EventsConsumer$$dispatch(EventsConsumer.scala:40)
> com.wixpress.greyhound.EventsConsumer$$anonfun$consumeEvents$1.apply(EventsConsumer.scala:26)
> com.wixpress.greyhound.EventsConsumer$$anonfun$consumeEvents$1.apply(EventsConsumer.scala:25)
> scala.collection.Iterator$class.foreach(Iterator.scala:742)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> com.wixpress.greyhound.EventsConsumer.consumeEvents(EventsConsumer.scala:25)
> com.wixpress.greyhound.EventsConsumer.run(EventsConsumer.scala:20)
> java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
> java.lang.Thread.run (Thread.java:745)
>
> caused by com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (2 replica were required but only 1 acknowledged the write)
> …stax.driver.core.exceptions.WriteTimeoutException.copy (WriteTimeoutException.java:100)
> com.datastax.driver.core.Responses$Error.asException (Responses.java:98)
> com.datastax.driver.core.DefaultResultSetFuture.onSet (DefaultResultSetFuture.java:149)
> com.datastax.driver.core.RequestHandler.setFinalResult (RequestHandler.java:183)
> com.datastax.driver.core.RequestHandler.access$2300 (RequestHandler.java:44)
> …ore.RequestHandler$SpeculativeExecution.setFinalResult (RequestHandler.java:748)
> ….driver.core.RequestHandler$SpeculativeExecution.onSet (RequestHandler.java:587)
> …atastax.driver.core.Connection$Dispatcher.channelRead0 (Connection.java:1013)
> …atastax.driver.core.Connection$Dispatcher.channelRead0 (Connection.java:936)
> ….netty.channel.SimpleChannelInboundHandler.channelRead (SimpleChannelInboundHandler.java:105)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
> io.netty.handler.timeout.IdleStateHandler.channelRead (IdleStateHandler.java:254)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
> …etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:103)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
> …etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:103)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
> io.netty.handler.codec.ByteToMessageDecoder.channelRead (ByteToMessageDecoder.java:242)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
> io.netty.channel.DefaultChannelPipeline.fireChannelRead (DefaultChannelPipeline.java:847)
> ….channel.nio.AbstractNioByteChannel$NioByteUnsafe.read (AbstractNioByteChannel.java:131)
> io.netty.channel.nio.NioEventLoop.processSelectedKey (NioEventLoop.java:511)
> ….channel.nio.NioEventLoop.processSelectedKeysOptimized (NioEventLoop.java:468)
> io.netty.channel.nio.NioEventLoop.processSelectedKeys (NioEventLoop.java:382)
> io.netty.channel.nio.NioEventLoop.run (NioEventLoop.java:354)
> ….netty.util.concurrent.SingleThreadEventExecutor$2.run (SingleThreadEventExecutor.java:111)
> java.lang.Thread.run (Thread.java:745)
>
> caused by com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (2 replica were required but only 1 acknowledged the write)
> com.datastax.driver.core.Responses$Error$1.decode (Responses.java:57)
> com.datastax.driver.core.Responses$Error$1.decode (Responses.java:37)
> com.datastax.driver.core.Message$ProtocolDecoder.decode (Message.java:213)
> com.datastax.driver.core.Message$ProtocolDecoder.decode (Message.java:204)
> …etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:89)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
> …etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:103)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
> io.netty.handler.codec.ByteToMessageDecoder.channelRead (ByteToMessageDecoder.java:242)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
> io.netty.channel.DefaultChannelPipeline.fireChannelRead (DefaultChannelPipeline.java:847)
> ….channel.nio.AbstractNioByteChannel$NioByteUnsafe.read (AbstractNioByteChannel.java:131)
> io.netty.channel.nio.NioEventLoop.processSelectedKey (NioEventLoop.java:511)
> ….channel.nio.NioEventLoop.processSelectedKeysOptimized (NioEventLoop.java:468)
> io.netty.channel.nio.NioEventLoop.processSelectedKeys (NioEventLoop.java:382)
> io.netty.channel.nio.NioEventLoop.run (NioEventLoop.java:354)
> ….netty.util.concurrent.SingleThreadEventExecutor$2.run (SingleThreadEventExecutor.java:111)
> java.lang.Thread.run (Thread.java:745)
>
> # nodetool status
> xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+CMSClassUnloadingEnabled -Xms8192M -Xmx8192M -Xmn2048M -Xss256k
> Note: Ownership information does not include topology; for complete information, specify a keyspace
>
> Datacenter: AUS
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns   Host ID                               Rack
> UN  172.16.213.62  85.52 GB  256     11.7%  27f2fd1d-5f3c-4691-a1f6-e28c1343e212  R1
> UN  172.16.213.63  83.11 GB  256     12.2%  4869f14b-e858-46c7-967c-60bd8260a149  R1
> UN  172.16.213.64  80.91 GB  256     11.7%  d4ad2495-cb24-4964-94d2-9e3f557054a4  R1
> UN  172.16.213.66  84.11 GB  256     10.3%  2a16c0dc-c36a-4196-89df-2de4f6b6cae5  R1
> UN  172.16.144.75  95.2 GB   256     11.4%  f87d6518-6c8e-49d9-a013-018bbedb8414  R1
>
> Datacenter: TAM
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load      Tokens  Owns   Host ID                               Rack
> UJ  10.14.0.155  4.38 GB   256     ?      c88bebae-737b-4ade-8f79-64f655036eee  R1
> UN  10.14.0.106  81.57 GB  256     10.0%  3b539927-b53a-4f50-9acd-d92fefbd84b9  R1
> UN  10.14.0.107  80.23 GB  256     10.4%  b70f674d-892f-42ff-a261-5356bee79e99  R1
> UN  10.14.0.108  83.64 GB  256     11.2%  6e24b17a-0b48-46b4-8edb-b0a9206314a3  R1
> UN  10.14.0.109  91.02 GB  256     11.2%  11f02dbd-257f-4623-81f4-b94db7365775  R1

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
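On the nodetool info failure during the join: the stack trace points at an assertion inside TokenMetadata.getTokens. The following is a hypothetical, simplified Python model (not Cassandra's actual code; class and method names are illustrative) of why such an assert can fire for a joining node: its tokens are tracked separately as bootstrap tokens, so a membership check against the normal-token map fails.

```python
# Hypothetical, simplified model of why `nodetool info` can hit an
# AssertionError while a node is still joining: in this sketch, a
# bootstrapping endpoint lives only in `bootstrap_tokens`, not in the
# normal-token map, so an unconditional membership assert trips.

class TokenMetadataModel:
    def __init__(self):
        self.normal_tokens = {}      # endpoint -> tokens, for NORMAL nodes
        self.bootstrap_tokens = {}   # endpoint -> tokens, for JOINING nodes

    def add_normal(self, endpoint, tokens):
        self.normal_tokens[endpoint] = tokens

    def add_bootstrapping(self, endpoint, tokens):
        self.bootstrap_tokens[endpoint] = tokens

    def get_tokens(self, endpoint):
        # Mirrors the assert-on-membership behaviour suggested by the
        # stack trace: a JOINING node is not yet a normal member.
        assert endpoint in self.normal_tokens
        return self.normal_tokens[endpoint]

tm = TokenMetadataModel()
tm.add_normal("10.14.0.106", [1, 2, 3])        # a UN node from the status output
tm.add_bootstrapping("10.14.0.155", [4, 5, 6]) # the UJ (joining) node

print(tm.get_tokens("10.14.0.106"))  # succeeds for a NORMAL node
try:
    tm.get_tokens("10.14.0.155")     # JOINING node -> AssertionError
except AssertionError:
    print("AssertionError for joining node")
```

Under this reading, the JMX attribute behind nodetool info would need to tolerate a non-member endpoint (or report the joining state) rather than asserting.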
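On the "2 replica were required" question above: Cassandra counts bootstrapping nodes as pending replicas for the ranges they are taking over, and the write coordinator waits for the consistency level's acks plus one ack per pending replica. A minimal sketch of that accounting (illustrative function names, not Cassandra's real API):

```python
# Minimal sketch of how a LOCAL_ONE write can require 2 acks while a node
# is joining: the coordinator blocks for blockFor(CL) plus the number of
# pending (bootstrapping) replicas for the written range. Names here are
# illustrative, not Cassandra's actual API.

def block_for(consistency_level, pending_local_endpoints):
    base = {"LOCAL_ONE": 1, "ONE": 1}[consistency_level]
    return base + pending_local_endpoints

# Steady state: LOCAL_ONE needs a single ack.
print(block_for("LOCAL_ONE", pending_local_endpoints=0))  # -> 1

# While the new node is a pending replica for the written range, the same
# LOCAL_ONE write must be acked by 2 replicas -- matching the
# "2 replica were required but only 1 acknowledged the write" message.
print(block_for("LOCAL_ONE", pending_local_endpoints=1))  # -> 2
```

So the timeout message is consistent with normal bootstrap behaviour: the extra required replica is the joining node, and the timeout itself suggests that node (or the streaming load it causes) is slow to acknowledge.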