Cassandra WriteTimeoutException
Hi,

I get the following error intermittently while writing to Cassandra. I am using version 2.1.7. Not sure how to fix the actual issue without increasing the timeout in cassandra.yaml.

Regards,
Amlan

java.lang.RuntimeException: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
        at backtype.storm.daemon.executor$fn__3441$fn__3453$fn__3500.invoke(executor.clj:748)
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
        at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
        at com.cleartrip.analytics.realtime.air.personalization.bolt.CassandraBolt.execute(CassandraBolt.java:434)
        at backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
        at backtype.storm.daemon.executor$fn__3441$tuple_action_fn__3443.invoke(executor.clj:633)
        at backtype.storm.daemon.executor$mk_task_receiver$fn__3364.invoke(executor.clj:401)
        at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58)
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
        ... 6 more
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
        at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:99)
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140)
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249)
        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433)
        at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:697)
        at com.datastax.shaded.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at com.datastax.shaded.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
        at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at com.datastax.shaded.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at com.datastax.shaded.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
        at com.datastax.shaded.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
        at com.datastax.shaded.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at com.datastax.shaded.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at com.datastax.shaded.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
        at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at com.datastax.shaded.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at com.datastax.shaded.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at
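For anyone hitting the same error, here is a minimal client-side sketch of retrying an idempotent write with the 2.1 Java driver instead of raising the server-side timeout. It is illustrative only: the class name, retry count, and backoff values are made up, and it is not the CassandraBolt code from the trace above.

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public final class RetryingWriter {
    private static final int MAX_ATTEMPTS = 3;   // illustrative bound
    private static final long BACKOFF_MS = 200;  // illustrative backoff

    // Execute the statement, retrying on WriteTimeoutException.
    // Only do this for idempotent writes.
    public static void executeWithRetry(Session session, BoundStatement stmt)
            throws InterruptedException {
        WriteTimeoutException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                session.execute(stmt);
                return;
            } catch (WriteTimeoutException e) {
                // The coordinator gave up waiting for replica acks; back off and retry.
                last = e;
                Thread.sleep(BACKOFF_MS * attempt);
            }
        }
        throw last;  // still failing after MAX_ATTEMPTS
    }
}

Note that a timeout at consistency ONE with zero acknowledgements usually means the replica itself is struggling (GC pauses, compaction backlog, dropped mutations), so a retry like this only papers over transient spikes rather than fixing the underlying cause.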
Re: Will virtual nodes have worse performance?
On Wed, Jul 15, 2015 at 9:28 AM, Benyi Wang bewang.t...@gmail.com wrote:

I have a small cluster with 3 nodes and installed Cassandra 2.1.2 from the DataStax YUM repository. I knew 2.1.2 is not recommended for production. I'm wondering what is the root cause of the worse performance with vnodes:
- Is num_token too high for a 3-node cluster?

Wish I had a blog post to link to, but briefly: if you have RF=3 and fewer than roughly 9 nodes, you probably just lose from having vnodes. The overhead of dealing with so many ranges is not sufficiently counterbalanced by the advantages, for example significantly faster bootstrap. Vnode performance has been improving steadily, because the other advantages are significant and people are motivated to improve it.

If you're interested in observing how much the number of vnodes matters, just recreate your cluster with half, then half again, the tokens per node, testing each time.

=Rob
Re: OpsCenter datastax-agent 300% CPU
OpsCenter 5.2 has a couple of fixes that may address the symptoms you described:
http://docs.datastax.com/en/opscenter/5.2/opsc/release_notes/opscReleaseNotes520.html
- Fixed issues with agent OOM when storing metrics for large numbers of tables. (OPSC-5934)
- Improved handling of metrics overflow queue on agent. (OPSC-4618)

It also has a lot of other great new features:
http://docs.datastax.com/en/opscenter/5.2/opsc/online_help/services/opscPerformanceService.html

Let us know if this stops once you upgrade.

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
http://www.datastax.com/
https://www.linkedin.com/company/datastax
https://www.facebook.com/datastax
https://twitter.com/datastax
https://plus.google.com/+Datastax/about
http://feeds.feedburner.com/datastax
http://cassandrasummit-datastax.com/

DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world's most innovative enterprises. DataStax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Tue, Jul 14, 2015 at 4:40 PM, Mikhail Strebkov streb...@gmail.com wrote:

Looks like it dies with OOM: https://gist.github.com/kluyg/03785041e16333015c2c

On Tue, Jul 14, 2015 at 12:01 PM, Mikhail Strebkov streb...@gmail.com wrote:

OpsCenter 5.1.3 and datastax-agent-5.1.3-standalone.jar

On Tue, Jul 14, 2015 at 12:00 PM, Sebastian Estevez sebastian.este...@datastax.com wrote:

What version of the agents and what version of OpsCenter are you running? I recently saw something like this and upgrading to matching versions fixed the issue.

On Jul 14, 2015 2:58 PM, Mikhail Strebkov streb...@gmail.com wrote:

Hi everyone,

Recently I've noticed that most of the nodes have OpsCenter agents running at 300% CPU. Each node has 4 cores, so the agents are using 75% of the total available CPU. We're running 5 nodes with open-source Cassandra 2.1.8 in AWS using the Community AMI. OpsCenter version is 5.1.3. We're using Oracle Java version 1.8.0_45.

  PID   USER     PR NI VIRT  RES  SHR S %CPU %MEM TIME+    COMMAND
31501   cassandr 20 0  3599m 296m 14m S 339  2.0  48:20.39 /opt/jdk/jdk1.8.0_45/bin/java -Xmx128M -Djclouds.mpu.parts.magnitude=10 -Djclouds.mpu.parts.size=16777216 -Dopscenter.ssl.trustStore=/var/lib/datastax-agent/ssl/agentKeyStore -Dopscenter.ssl.keyStore=/var/lib/datastax-agent/ssl/agentKeyStore -Dopscenter.ssl.keyStorePassword=opscenter -Dagent-pidfile=/var/run/datastax-agent/datastax-agent.pid -Dlog4j.configuration=file:/etc/datastax-agent/log4j.properties -Djava.security.auth.login.config=/etc/datastax-agent/kerberos.config -jar datastax-agent-5.1.3-standalone.jar /var/lib/datastax-agent/conf/address.yaml

The logs from the agent look strange to me: https://gist.github.com/kluyg/21f78af7adff0a940ed3

The cluster itself seems to be fine, the load is small, and there is nothing bad in the Cassandra system.log. Does anyone know what to tune to bring it back to normal?

Thanks,
Mikhail
Re: OpsCenter datastax-agent 300% CPU
Thanks, I think it got resolved after an update.

Kind regards,
Mikhail
Re: Cassandra WriteTimeoutException
On 07/15/2015 02:28 AM, Amlan Roy wrote:

Hi, I get the following error intermittently while writing to Cassandra. I am using version 2.1.7. Not sure how to fix the actual issue without increasing the timeout in cassandra.yaml.

snip

Post your data model, query, and maybe some cluster config basics for better help. Increasing the timeout is never a great answer.

--
Kind regards,
Michael
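For context, the server-side setting this thread refers to is write_request_timeout_in_ms in cassandra.yaml. The snippet below simply shows the 2.1 default for reference; as noted above, raising it tends to hide the underlying problem rather than fix it.

# cassandra.yaml (Cassandra 2.1 default shown)
# How long the coordinator waits for replica acknowledgements on a write
# before reporting a WriteTimeoutException back to the client.
write_request_timeout_in_ms: 2000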
Will virtual nodes have worse performance?
I have a small cluster with 3 nodes and installed Cassandra 2.1.2 from the DataStax YUM repository. I knew 2.1.2 is not recommended for production.

The problem I observed is:
- When I use vnodes with num_token=256, the read latency is about 20ms at the 50th percentile.
- If I disable vnodes, the read latency is about 1ms at the 50th percentile.

I'm wondering what is the root cause of the worse performance with vnodes:
- Is version 2.1.2 the root cause?
- Is num_token too high for a 3-node cluster?

Thanks.
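For reference, the two setups being compared differ only in the token settings in cassandra.yaml. The values below are illustrative, and switching between them requires wiping and re-bootstrapping each node.

# cassandra.yaml, vnodes enabled: each node owns 256 small token ranges
num_tokens: 256

# cassandra.yaml, vnodes disabled: one token per node, either by setting
# num_tokens: 1 or by assigning an explicit token to each node
# num_tokens: 1
# initial_token: <one evenly spaced Murmur3 token per node>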