Cassandra WriteTimeoutException

2015-07-15 Thread Amlan Roy
Hi,

I get the following error intermittently while writing to Cassandra. I am using 
version 2.1.7. Not sure how to fix the actual issue without increasing the 
timeout in cassandra.yaml. 

Regards,
Amlan

java.lang.RuntimeException: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
    at backtype.storm.daemon.executor$fn__3441$fn__3453$fn__3500.invoke(executor.clj:748)
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
    at com.cleartrip.analytics.realtime.air.personalization.bolt.CassandraBolt.execute(CassandraBolt.java:434)
    at backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
    at backtype.storm.daemon.executor$fn__3441$tuple_action_fn__3443.invoke(executor.clj:633)
    at backtype.storm.daemon.executor$mk_task_receiver$fn__3364.invoke(executor.clj:401)
    at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58)
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
    ... 6 more
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:99)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:697)
    at com.datastax.shaded.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at com.datastax.shaded.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at com.datastax.shaded.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at com.datastax.shaded.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at com.datastax.shaded.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at com.datastax.shaded.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at com.datastax.shaded.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at com.datastax.shaded.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at com.datastax.shaded.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at com.datastax.shaded.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at ...

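For reference, the cassandra.yaml knob the message above presumably refers to is write_request_timeout_in_ms. A minimal sketch of that setting (2000 ms is the 2.1 default; raising it only masks slow or overloaded replicas, as the reply further down also points out):

    # cassandra.yaml -- coordinator-side timeout for a single write request
    # (2000 ms is the 2.1 default)
    write_request_timeout_in_ms: 2000
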
Re: Will virtual nodes have worse performance?

2015-07-15 Thread Robert Coli
On Wed, Jul 15, 2015 at 9:28 AM, Benyi Wang bewang.t...@gmail.com wrote:

 I have a small cluster with 3 nodes and installed Cassandra 2.1.2 from the
 DataStax YUM repository. I know 2.1.2 is not recommended for production.
 I'm wondering what the root cause of the worse performance with vnodes is:

- Is num_tokens too high for a 3-node cluster?

 Wish I had a blog post to link to, but... briefly:

If you have RF=3 and fewer than roughly 9 nodes, you probably just lose
from having vnodes.

This is because the overhead of dealing with so many ranges is not
sufficiently counterbalanced by the advantages, for example significantly
faster bootstrap.

Vnode performance has been improving steadily, because the other
advantages are significant and so people are motivated to improve it.

If you're interested in observing the impact of the number of vnodes,
just recreate your cluster with half the tokens per node, then half again,
testing each time.

=Rob
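
A minimal sketch of the experiment Rob describes, assuming the cluster is rebuilt from scratch for each run (num_tokens is only honoured when a node first bootstraps; the values below are illustrative):

    # cassandra.yaml on every node, set before the node bootstraps
    num_tokens: 256    # baseline run
    # rebuild the cluster and re-run the same read-latency benchmark with:
    # num_tokens: 128
    # and then again with:
    # num_tokens: 64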


Re: OpsCenter datastax-agent 300% CPU

2015-07-15 Thread Sebastian Estevez
OpsCenter 5.2 has a couple of fixes that may address the symptoms you 
described:
http://docs.datastax.com/en/opscenter/5.2/opsc/release_notes/opscReleaseNotes520.html


   - Fixed issues with agent OOM when storing metrics for large numbers of 
   tables. (OPSC-5934)
   - Improved handling of metrics overflow queue on agent. (OPSC-4618)


It's also got a lot of other great new features -- 
http://docs.datastax.com/en/opscenter/5.2/opsc/online_help/services/opscPerformanceService.html

Let us know if this stops once you upgrade.

All the best,


http://www.datastax.com/

Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

https://www.linkedin.com/company/datastax
https://www.facebook.com/datastax
https://twitter.com/datastax
https://plus.google.com/+Datastax/about
http://feeds.feedburner.com/datastax

http://cassandrasummit-datastax.com/

DataStax is the fastest, most scalable distributed database technology, 
delivering Apache Cassandra to the world's most innovative enterprises. 
DataStax is built to be agile, always-on, and predictably scalable to any 
size. With more than 500 customers in 45 countries, DataStax is the 
database technology and transactional backbone of choice for the world's 
most innovative companies such as Netflix, Adobe, Intuit, and eBay. 

On Tue, Jul 14, 2015 at 4:40 PM, Mikhail Strebkov streb...@gmail.com 
wrote:

 Looks like it dies with OOM: 
 https://gist.github.com/kluyg/03785041e16333015c2c

 On Tue, Jul 14, 2015 at 12:01 PM, Mikhail Strebkov streb...@gmail.com 
 wrote:

 OpsCenter 5.1.3 and datastax-agent-5.1.3-standalone.jar

 On Tue, Jul 14, 2015 at 12:00 PM, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 What version of the agents and what version of OpsCenter are you running?

 I recently saw something like this and upgrading to matching versions 
 fixed the issue.
 On Jul 14, 2015 2:58 PM, Mikhail Strebkov streb...@gmail.com wrote:

 Hi everyone,

 Recently I've noticed that most of the nodes have OpsCenter agents 
 running at 300% CPU. Each node has 4 cores, so agents are using 75% of 
 total available CPU.

 We're running 5 nodes with OpenSource Cassandra 2.1.8 in AWS using 
 Community AMI. OpsCenter version is 5.1.3. We're using Oracle Java version 
 1.8.0_45.

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 31501 cassandr  20   0 3599m 296m  14m S   339  2.0  48:20.39 
 /opt/jdk/jdk1.8.0_45/bin/java -Xmx128M 
 -Djclouds.mpu.parts.magnitude=10 
 -Djclouds.mpu.parts.size=16777216 
 -Dopscenter.ssl.trustStore=/var/lib/datastax-agent/ssl/agentKeyStore 
 -Dopscenter.ssl.keyStore=/var/lib/datastax-agent/ssl/agentKeyStore 
 -Dopscenter.ssl.keyStorePassword=opscenter 
 -Dagent-pidfile=/var/run/datastax-agent/datastax-agent.pid 
 -Dlog4j.configuration=file:/etc/datastax-agent/log4j.properties 
 -Djava.security.auth.login.config=/etc/datastax-agent/kerberos.config -jar 
 datastax-agent-5.1.3-standalone.jar 
 /var/lib/datastax-agent/conf/address.yaml

 The logs from the agent look strange to me: 
 https://gist.github.com/kluyg/21f78af7adff0a940ed3

 The cluster itself seems to be fine, the load is small, nothing bad in 
 Cassandra system.log.

 Does anyone know what to tune to bring it back to normal?

 Thanks,
 Mikhail
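
A quick way to check whether the agent is simply thrashing in garbage collection, which would be consistent with the ~300% CPU, the 128 MB heap on the command line above, and the OOM in the gist. This is only a sketch: the pid and JDK path are taken from the top output above, and jstat should be run as the same user as the agent process:

    # sample GC counters of the agent JVM every second, 10 times
    /opt/jdk/jdk1.8.0_45/bin/jstat -gcutil 31501 1000 10
    # if the O (old gen) column stays pinned near 100% and FGC keeps climbing,
    # the 128 MB agent heap is too small for the number of tables being monitored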





Re: OpsCenter datastax-agent 300% CPU

2015-07-15 Thread Mikhail Strebkov
Thanks, I think it got resolved after an update.

Kind regards,
Mikhail


Re: Cassandra WriteTimeoutException

2015-07-15 Thread Michael Shuler

On 07/15/2015 02:28 AM, Amlan Roy wrote:

Hi,

I get the following error intermittently while writing to Cassandra.
I am using version 2.1.7. Not sure how to fix the actual issue
without increasing the timeout in cassandra.yaml.


snip

Post your data model, query, and maybe some cluster config basics for 
better help. Increasing the timeout is never a great answer.


--
Kind regards,
Michael
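
For completeness, one application-side alternative to raising the server timeout is to retry idempotent writes in the bolt itself. This is only a minimal sketch, assuming a DataStax Java driver 2.1.x (which the shaded Netty frames in the trace suggest); the class and method names are made up for illustration, and the driver's built-in retry policies are another option.

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;

    // Hypothetical helper: retry a write a bounded number of times on WriteTimeoutException.
    // Only safe for idempotent writes (a timed-out write may still be applied later).
    public final class RetryingWrite {
        private RetryingWrite() {}

        public static ResultSet execute(Session session, Statement stmt, int maxAttempts)
                throws InterruptedException {
            WriteTimeoutException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return session.execute(stmt);
                } catch (WriteTimeoutException e) {
                    last = e;                     // not enough replicas acked in time
                    Thread.sleep(100L * attempt); // simple linear backoff before retrying
                }
            }
            throw last; // give up and let the Storm bolt fail the tuple
        }
    }

Something similar can be had without wrapping every call by configuring a retry policy on the Cluster builder (Cluster.builder().withRetryPolicy(...)).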


Will virtual nodes have worse performance?

2015-07-15 Thread Benyi Wang
I have a small cluster with 3 nodes and installed Cassandra 2.1.2 from the
DataStax YUM repository. I know 2.1.2 is not recommended for production.

The problem I observed is:

   - When I use vnodes with num_tokens=256, the read latency is about 20ms
   at the 50th percentile.
   - If I disable vnodes, the read latency is about 1ms at the 50th
   percentile.

I'm wondering what the root cause of the worse performance with vnodes is:

   - Is version 2.1.2 the root cause?
   - Is num_tokens too high for a 3-node cluster?

Thanks.
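
A couple of nodetool commands (both present in 2.1) that can help narrow down where the extra latency is spent when comparing the vnode and non-vnode setups; the keyspace and table names are placeholders:

    # per-table local read latency percentiles on a node
    nodetool cfhistograms <keyspace> <table>
    # coordinator-level (end-to-end) read/write latency percentiles
    nodetool proxyhistograms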