Eyal Sorek created CASSANDRA-10687:
--------------------------------------

             Summary: When adding new node to cluster getting Cassandra timeout 
during write query
                 Key: CASSANDRA-10687
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10687
             Project: Cassandra
          Issue Type: Bug
          Components: Configuration, Coordination, Streaming and Messaging
         Environment: Cassandra 2.0.9 using vnodes, on Debian 7.9,  on two data 
centers (AUS & TAM)
            Reporter: Eyal Sorek


We get many of the errors below whenever a new node joins the cluster. It 
happened when adding one new node to the 8-node cluster, again after the 9th 
node had been added in the AUS data center, and again when adding the 10th node 
in the TAM data center, with the same behaviour each time.
First, why do we get this while the node is joining:
LOCAL_ONE (2 replica were required but only 1 acknowledged the write)
Since when does LOCAL_ONE require 2 replicas?
Second, why do we feel so much overhead across the whole cluster while a node 
is joining?

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
during write query at consistency LOCAL_ONE (2 replica were required but only 1 
acknowledged the write)
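
One possible reading of the "2 replica were required" figure, offered as an assumption and not something this report confirms: while a node is bootstrapping, the coordinator may add pending (joining) replicas for the written token range to the number of acknowledgements it waits for. The sketch below only illustrates that arithmetic; required_acks and LOCAL_ONE_BLOCK_FOR are hypothetical names, not Cassandra or driver APIs.

```python
# Hypothetical illustration only: not Cassandra source code.
# Assumption: acks required = acks demanded by the consistency level
#             + number of pending (joining) replicas for the token range.

LOCAL_ONE_BLOCK_FOR = 1  # LOCAL_ONE asks for one replica in the local data center


def required_acks(consistency_block_for: int, pending_replicas: int) -> int:
    """Return how many acknowledgements a write would wait for under the assumption above."""
    if consistency_block_for < 1 or pending_replicas < 0:
        raise ValueError("block_for must be >= 1 and pending_replicas >= 0")
    return consistency_block_for + pending_replicas


if __name__ == "__main__":
    # No node joining: a LOCAL_ONE write waits for a single acknowledgement.
    print("steady state:", required_acks(LOCAL_ONE_BLOCK_FOR, pending_replicas=0))
    # One node joining and pending for this range: the same write waits for two.
    print("one joining node:", required_acks(LOCAL_ONE_BLOCK_FOR, pending_replicas=1))
```

Under that assumption, a LOCAL_ONE write that touches a range the joining node is about to own would report 2 required replicas, which matches the number in the exception message.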

Sample stack trace
…stax.driver.core.exceptions.WriteTimeoutException.copy (WriteTimeoutException.java:73)
…m.datastax.driver.core.DriverThrowables.propagateCause (DriverThrowables.java:37)
….driver.core.DefaultResultSetFuture.getUninterruptibly (DefaultResultSetFuture.java:214)
com.datastax.driver.core.AbstractSession.execute (AbstractSession.java:52)
com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao$$anonfun$insertCompressed$1.apply(CassandraPagesReadWriteDao.scala:29)
com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao$$anonfun$insertCompressed$1.apply(CassandraPagesReadWriteDao.scala:25)
com.wixpress.framework.monitoring.metering.SyncMetering$class.tracking(Metering.scala:58)
com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadOnlyDao.tracking(CassandraPagesReadOnlyDao.scala:19)
com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao.insertCompressed(CassandraPagesReadWriteDao.scala:25)
com.wixpress.html.data.distributor.core.DaoPageDistributor.com$wixpress$html$data$distributor$core$DaoPageDistributor$$distributePage(DaoPageDistributor.scala:36)
com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply$mcV$sp(DaoPageDistributor.scala:26)
com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply(DaoPageDistributor.scala:26)
com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply(DaoPageDistributor.scala:26)
com.wixpress.framework.monitoring.metering.SyncMetering$class.tracking(Metering.scala:58)
com.wixpress.html.data.distributor.core.DaoPageDistributor.tracking(DaoPageDistributor.scala:17)
com.wixpress.html.data.distributor.core.DaoPageDistributor.process(DaoPageDistributor.scala:25)
com.wixpress.html.data.distributor.core.greyhound.DistributionRequestHandler.handleMessage(DistributionRequestHandler.scala:19)
com.wixpress.greyhound.KafkaUserHandlers.handleMessage(UserHandlers.scala:11)
com.wixpress.greyhound.EventsConsumer.com$wixpress$greyhound$EventsConsumer$$handleMessage(EventsConsumer.scala:51)
com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply$mcV$sp(EventsConsumer.scala:43)
com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply(EventsConsumer.scala:40)
com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply(EventsConsumer.scala:40)
scala.util.Try$.apply(Try.scala:192)
com.wixpress.greyhound.EventsConsumer.com$wixpress$greyhound$EventsConsumer$$dispatch(EventsConsumer.scala:40)
com.wixpress.greyhound.EventsConsumer$$anonfun$consumeEvents$1.apply(EventsConsumer.scala:26)
com.wixpress.greyhound.EventsConsumer$$anonfun$consumeEvents$1.apply(EventsConsumer.scala:25)
scala.collection.Iterator$class.foreach(Iterator.scala:742)
scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
com.wixpress.greyhound.EventsConsumer.consumeEvents(EventsConsumer.scala:25)
com.wixpress.greyhound.EventsConsumer.run(EventsConsumer.scala:20)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
java.lang.Thread.run (Thread.java:745)


caused by com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra 
timeout during write query at consistency LOCAL_ONE (2 replica were required 
but only 1 acknowledged the write)
…stax.driver.core.exceptions.WriteTimeoutException.copy (WriteTimeoutException.java:100)
com.datastax.driver.core.Responses$Error.asException (Responses.java:98)
com.datastax.driver.core.DefaultResultSetFuture.onSet (DefaultResultSetFuture.java:149)
com.datastax.driver.core.RequestHandler.setFinalResult (RequestHandler.java:183)
com.datastax.driver.core.RequestHandler.access$2300 (RequestHandler.java:44)
…ore.RequestHandler$SpeculativeExecution.setFinalResult (RequestHandler.java:748)
….driver.core.RequestHandler$SpeculativeExecution.onSet (RequestHandler.java:587)
…atastax.driver.core.Connection$Dispatcher.channelRead0 (Connection.java:1013)
…atastax.driver.core.Connection$Dispatcher.channelRead0 (Connection.java:936)
….netty.channel.SimpleChannelInboundHandler.channelRead (SimpleChannelInboundHandler.java:105)
…hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
io.netty.handler.timeout.IdleStateHandler.channelRead (IdleStateHandler.java:254)
…hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
…etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:103)
…hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
…etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:103)
…hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
io.netty.handler.codec.ByteToMessageDecoder.channelRead (ByteToMessageDecoder.java:242)
…hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
io.netty.channel.DefaultChannelPipeline.fireChannelRead (DefaultChannelPipeline.java:847)
….channel.nio.AbstractNioByteChannel$NioByteUnsafe.read (AbstractNioByteChannel.java:131)
io.netty.channel.nio.NioEventLoop.processSelectedKey (NioEventLoop.java:511)
….channel.nio.NioEventLoop.processSelectedKeysOptimized (NioEventLoop.java:468)
io.netty.channel.nio.NioEventLoop.processSelectedKeys (NioEventLoop.java:382)
io.netty.channel.nio.NioEventLoop.run (NioEventLoop.java:354)
….netty.util.concurrent.SingleThreadEventExecutor$2.run (SingleThreadEventExecutor.java:111)
java.lang.Thread.run (Thread.java:745)


caused by com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra 
timeout during write query at consistency LOCAL_ONE (2 replica were required 
but only 1 acknowledged the write)
com.datastax.driver.core.Responses$Error$1.decode (Responses.java:57)
com.datastax.driver.core.Responses$Error$1.decode (Responses.java:37)
com.datastax.driver.core.Message$ProtocolDecoder.decode (Message.java:213)
com.datastax.driver.core.Message$ProtocolDecoder.decode (Message.java:204)
…etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:89)
…hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
…etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:103)
…hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
io.netty.handler.codec.ByteToMessageDecoder.channelRead (ByteToMessageDecoder.java:242)
…hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
io.netty.channel.DefaultChannelPipeline.fireChannelRead (DefaultChannelPipeline.java:847)
….channel.nio.AbstractNioByteChannel$NioByteUnsafe.read (AbstractNioByteChannel.java:131)
io.netty.channel.nio.NioEventLoop.processSelectedKey (NioEventLoop.java:511)
….channel.nio.NioEventLoop.processSelectedKeysOptimized (NioEventLoop.java:468)
io.netty.channel.nio.NioEventLoop.processSelectedKeys (NioEventLoop.java:382)
io.netty.channel.nio.NioEventLoop.run (NioEventLoop.java:354)
….netty.util.concurrent.SingleThreadEventExecutor$2.run (SingleThreadEventExecutor.java:111)
java.lang.Thread.run (Thread.java:745)



# nodetool status
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 
-XX:+CMSClassUnloadingEnabled -Xms8192M -Xmx8192M -Xmn2048M -Xss256k
Note: Ownership information does not include topology; for complete 
information, specify a keyspace
Datacenter: AUS
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  172.16.213.62  85.52 GB   256     11.7%  27f2fd1d-5f3c-4691-a1f6-e28c1343e212  R1
UN  172.16.213.63  83.11 GB   256     12.2%  4869f14b-e858-46c7-967c-60bd8260a149  R1
UN  172.16.213.64  80.91 GB   256     11.7%  d4ad2495-cb24-4964-94d2-9e3f557054a4  R1
UN  172.16.213.66  84.11 GB   256     10.3%  2a16c0dc-c36a-4196-89df-2de4f6b6cae5  R1
UN  172.16.144.75  95.2 GB    256     11.4%  f87d6518-6c8e-49d9-a013-018bbedb8414  R1
Datacenter: TAM
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UJ  10.14.0.155    4.38 GB    256     ?      c88bebae-737b-4ade-8f79-64f655036eee  R1
UN  10.14.0.106    81.57 GB   256     10.0%  3b539927-b53a-4f50-9acd-d92fefbd84b9  R1
UN  10.14.0.107    80.23 GB   256     10.4%  b70f674d-892f-42ff-a261-5356bee79e99  R1
UN  10.14.0.108    83.64 GB   256     11.2%  6e24b17a-0b48-46b4-8edb-b0a9206314a3  R1
UN  10.14.0.109    91.02 GB   256     11.2%  11f02dbd-257f-4623-81f4-b94db7365775  R1
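
The status output marks the joining node with the two-letter code UJ (Up, Joining). As a rough sketch of how that could be picked out of captured output, the snippet below scans status text for rows in a given state. The helper name nodes_in_state and the SAMPLE text are illustrative assumptions; the sample rows are copied from this report with each wrapped row joined onto one line.

```python
import re

# Two rows copied from the report, each joined onto a single line.
SAMPLE = """\
Datacenter: TAM
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
UJ  10.14.0.155    4.38 GB    256     ?      c88bebae-737b-4ade-8f79-64f655036eee  R1
UN  10.14.0.106    81.57 GB   256     10.0%  3b539927-b53a-4f50-9acd-d92fefbd84b9  R1
"""

# First letter: U(p)/D(own); second letter: N(ormal)/L(eaving)/J(oining)/M(oving).
ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)\s+")


def nodes_in_state(status_text, state):
    """Return the addresses of all rows whose state letter equals `state`."""
    found = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match and match.group(2) == state:
            found.append(match.group(3))
    return found


if __name__ == "__main__":
    print("joining:", nodes_in_state(SAMPLE, "J"))
```

Correlating the timestamps of the write timeouts with the periods during which a node shows up as joining would help confirm that the errors only occur during bootstrap.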




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
