[ https://issues.apache.org/jira/browse/CASSANDRA-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297272#comment-14297272 ]

Jeff Liu commented on CASSANDRA-8696:
-------------------------------------

It got worse today after the "failed to create snapshot" errors. Only 4 of the 6
nodes show "UN" in nodetool status. I checked system.log and syslog but couldn't
find anything explaining why the node went down. The Cassandra Java process is
also gone.
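In `nodetool status` output, each node line starts with a two-letter code. The first letter is the status (U = Up, D = Down) and the second is the state (N = Normal, L = Leaving, J = Joining, M = Moving), so "UN" means Up/Normal. The class below is a hypothetical helper and not part of Cassandra or nodetool. It sketches how to tally node lines by that code in captured output. The sample lines used to exercise it are illustrative and are not this cluster's output.

```java
import java.util.List;

// Hypothetical helper, not part of Cassandra or nodetool: tallies node lines
// in captured `nodetool status` output by their two-letter Status/State code.
class NodetoolStatusSummary {

    // A node line starts with a status letter (U/D), a state letter (N/L/J/M)
    // and then whitespace; headers and datacenter banners do not match.
    static boolean isNodeLine(String line) {
        String s = line.trim();
        return s.length() > 2
                && "UD".indexOf(s.charAt(0)) >= 0
                && "NLJM".indexOf(s.charAt(1)) >= 0
                && Character.isWhitespace(s.charAt(2));
    }

    // Number of node lines in the output, whatever their code.
    static long countNodes(List<String> lines) {
        return lines.stream().filter(NodetoolStatusSummary::isNodeLine).count();
    }

    // Number of node lines whose two-letter code equals `code`, e.g. "UN" or "DN".
    static long countWithCode(List<String> lines, String code) {
        return lines.stream()
                .filter(NodetoolStatusSummary::isNodeLine)
                .filter(line -> line.trim().substring(0, 2).equals(code))
                .count();
    }
}
```

For a six-node cluster reporting the state described above, `countNodes` would return 6 and `countWithCode(lines, "UN")` would return 4.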

I tried restarting the Cassandra service on one node while watching system.log.
Shortly after the service started, Cassandra threw "failed to create snapshot"
errors for a while and then hung with no further log output. nodetool status now
fails with "error: No nodes present in the cluster. Has this node finished
starting up?", yet the Cassandra Java process is still running.

{noformat}
ps -ef | grep cass
110      18634     1 19 18:01 ?        00:01:39 java -ea 
-javaagent:/usr/share/cassandra/lib/jamm-0.2.8.jar -XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 -Xms1792M -Xmx1792M -Xmn400M 
-XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC 
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime 
-XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1422554504.log 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=48M 
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-javaagent:/usr/share/java/graphite-reporter-agent-1.0-SNAPSHOT.jar=graphiteServer=metrics-a.hq.nest.com;graphitePort=2003;graphitePollInt=60
 -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra 
-Dcassandra.storagedir= 
-Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp 
/etc/cassandra:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:
/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:
/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:
/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:
/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.8.jar:
/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:
/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:
/usr/share/cassandra/lib/logback-classic-1.1.2.jar:/usr/share/cassandra/lib/logback-core-1.1.2.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:
/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/metrics-graphite-2.2.0.jar:/usr/share/cassandra/lib/mx4j-tools.jar:
/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:
/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.2.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:
/usr/share/cassandra/lib/stringtemplate-4.0.2.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:
/usr/share/cassandra/apache-cassandra-2.1.2.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.2.jar:/usr/share/cassandra/apache-cassandra.jar:
/usr/share/cassandra/cassandra-driver-core-2.0.5.jar:/usr/share/cassandra/netty-3.9.0.Final.jar:/usr/share/cassandra/stress.jar: 
-XX:HeapDumpPath=/var/lib/cassandra/java_1422554504.hprof 
-XX:ErrorFile=/var/lib/cassandra/hs_err_1422554504.log 
org.apache.cassandra.service.CassandraDaemon
root     19317 19074  0 18:10 pts/1    00:00:00 grep --color=auto cass
#:/home/jliu# less /var/log/cassandra/system.log
{noformat}
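nodetool reaches the daemon over JMX. The stack trace in the nodetool output comes through the RMI/JMX layer, and the ps output above shows JMX on port 7199 with SSL and authentication disabled. The class below is a hedged sketch of querying the daemon directly. It builds the standard RMI JMX service URL and reads the `LiveNodes` attribute of the `org.apache.cassandra.db:type=StorageService` MBean. The class and method names are hypothetical, and the host and port are assumptions to adjust.

```java
import java.net.MalformedURLException;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical probe (not part of Cassandra): talks to the daemon's JMX port directly.
class CassandraJmxProbe {

    // Standard RMI-registry JMX URL for a JVM started with
    // -Dcom.sun.management.jmxremote.port=<port>.
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad JMX host/port: " + host + ":" + port, e);
        }
    }

    // Reads StorageService's LiveNodes attribute. Assumes JMX authentication is
    // disabled, as in the ps output above (jmxremote.authenticate=false).
    @SuppressWarnings("unchecked")
    static List<String> liveNodes(String host, int port) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService = new ObjectName("org.apache.cassandra.db:type=StorageService");
            return (List<String>) mbs.getAttribute(storageService, "LiveNodes");
        }
    }
}
```

Reading `LiveNodes` instead of `Ownership` should avoid the `describeOwnership` call that fails in the trace below. The result may show whether the restarted node has learned about any peers through gossip yet.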

{noformat}
#:/home/jliu# nodetool status
error: No nodes present in the cluster. Has this node finished starting up?
-- StackTrace --
java.lang.RuntimeException: No nodes present in the cluster. Has this node finished starting up?
        at org.apache.cassandra.dht.RandomPartitioner.describeOwnership(RandomPartitioner.java:143)
        at org.apache.cassandra.service.StorageService.getOwnership(StorageService.java:3702)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
        at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
        at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1464)
        at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
        at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:657)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
{noformat}
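The trace shows nodetool's `Ownership` read arriving at `RandomPartitioner.describeOwnership`. Judging by its message, that method throws when it receives no tokens, as would happen on a node that has not finished joining the ring. The class below is a simplified illustration and not Cassandra's code. It shows how per-token ownership can be computed on a RandomPartitioner-style ring with tokens in [0, 2^127]. Each token owns the arc from the previous token, wrapping around the ring, and an empty token list reproduces the error above.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of ownership on a RandomPartitioner-style ring;
// not Cassandra's implementation.
class RingOwnershipSketch {

    // RandomPartitioner tokens range over [0, 2^127].
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Maps each token to the fraction of the ring it owns: the arc
    // (previous token, token], wrapping from the last token back to the first.
    static Map<BigInteger, Double> describeOwnership(List<BigInteger> tokens) {
        if (tokens.isEmpty()) {
            // Same message nodetool printed above.
            throw new RuntimeException(
                    "No nodes present in the cluster. Has this node finished starting up?");
        }
        List<BigInteger> sorted = new ArrayList<>(tokens);
        sorted.sort(null); // natural BigInteger order
        Map<BigInteger, Double> ownership = new LinkedHashMap<>();
        if (sorted.size() == 1) {
            ownership.put(sorted.get(0), 1.0); // a single token owns the whole ring
            return ownership;
        }
        for (int i = 0; i < sorted.size(); i++) {
            BigInteger token = sorted.get(i);
            BigInteger previous = sorted.get((i + sorted.size() - 1) % sorted.size());
            BigInteger arc = token.subtract(previous).mod(RING_SIZE); // wraps for i == 0
            ownership.put(token, arc.doubleValue() / RING_SIZE.doubleValue());
        }
        return ownership;
    }
}
```

With two tokens at 0 and 2^126, each token owns half of the ring.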

> nodetool repair on cassandra 2.1.2 keyspaces return java.lang.RuntimeException: Could not create snapshot
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8696
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeff Liu
>
> When running nodetool repair -pr on a Cassandra node (2.1.2), Cassandra throws
> a Java exception: cannot create snapshot.
> The error log from system.log:
> {noformat}
> INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:28,815 StreamResultFuture.java:166 - [Stream #692c1450-a692-11e4-9973-070e938df227 ID#0] Prepare completed. Receiving 2 files(221187 bytes), sending 5 files(632105 bytes)
> INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 StreamResultFuture.java:180 - [Stream #692c1450-a692-11e4-9973-070e938df227] Session with /10.97.9.110 is complete
> INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 StreamResultFuture.java:212 - [Stream #692c1450-a692-11e4-9973-070e938df227] All sessions completed
> INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,047 StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] streaming task succeed, returning response to /10.98.194.68
> INFO  [RepairJobTask:1] 2015-01-28 02:07:29,065 StreamResultFuture.java:86 - [Stream #692c6270-a692-11e4-9973-070e938df227] Executing streaming plan for Repair
> INFO  [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,065 StreamSession.java:213 - [Stream #692c6270-a692-11e4-9973-070e938df227] Starting streaming to /10.66.187.201
> INFO  [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,070 StreamCoordinator.java:209 - [Stream #692c6270-a692-11e4-9973-070e938df227, ID#0] Beginning stream session with /10.66.187.201
> INFO  [STREAM-IN-/10.66.187.201] 2015-01-28 02:07:29,465 StreamResultFuture.java:166 - [Stream #692c6270-a692-11e4-9973-070e938df227 ID#0] Prepare completed. Receiving 5 files(627994 bytes), sending 5 files(632105 bytes)
> INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,971 StreamResultFuture.java:180 - [Stream #692c6270-a692-11e4-9973-070e938df227] Session with /10.66.187.201 is complete
> INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,972 StreamResultFuture.java:212 - [Stream #692c6270-a692-11e4-9973-070e938df227] All sessions completed
> INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,972 StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] streaming task succeed, returning response to /10.98.194.68
> ERROR [RepairJobTask:1] 2015-01-28 02:07:39,444 RepairJob.java:127 - Error occurred during snapshot phase
> java.lang.RuntimeException: Could not create snapshot at /10.97.9.110
>         at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_45]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> INFO  [AntiEntropySessions:6] 2015-01-28 02:07:39,445 RepairSession.java:260 - [repair #6f85e740-a692-11e4-9973-070e938df227] new session: will sync /10.98.194.68, /10.66.187.201, /10.226.218.135 on range (128171798046680518737469720690862638799,128635403083592540777731520865977436165] for events.[bigint0text, bigint0boolean, bigint0int, dataset_catalog, column_categories, bigint0double, bigint0bigint]
> ERROR [AntiEntropySessions:5] 2015-01-28 02:07:39,445 RepairSession.java:303 - [repair #685e3d00-a692-11e4-9973-070e938df227] session completed with the following error
> java.io.IOException: Failed during snapshot creation.
>         at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> ERROR [AntiEntropySessions:5] 2015-01-28 02:07:39,446 CassandraDaemon.java:153 - Exception in thread Thread[AntiEntropySessions:5,5,RMI Runtime]
> java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
>         at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_45]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.io.IOException: Failed during snapshot creation.
>         at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
>         ... 3 common frames omitted
> {noformat}
> The only recent change was raising the keyspace replication factor from 2 to 3,
> shortly before these errors appeared. At the same time, we started seeing
> timeout errors from the application.
> The timeout error looks like this:
> {noformat}
> core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
>     at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.Responses$Error.asException(Responses.java:100) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:110) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:668) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_55]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_55]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_55]
> Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
>     at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[io.netty.netty-3.9.0.Final.jar:na]
>     ... 21 common frames omitted
> {noformat}
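The quoted system.log shows an all-or-nothing snapshot phase. The snapshot request to one replica (/10.97.9.110) fails, and the `SnapshotTask` callback raises "Could not create snapshot at ...". `RepairSession.failedSnapshot` then turns that into "Failed during snapshot creation.", which ends the whole repair session. The sketch below mirrors that behaviour with `CompletableFuture`. It is an illustration under assumptions and not Cassandra 2.1's code, which uses MessagingService callbacks and Guava futures. `requestSnapshot` is a hypothetical stand-in for whatever actually sends the snapshot request to a replica.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

// Illustrative sketch of an all-or-nothing snapshot phase; not Cassandra code.
class SnapshotPhaseSketch {

    // Asks every replica for a snapshot. `requestSnapshot` is a hypothetical
    // stand-in for the messaging layer; its future fails if that replica cannot
    // create the snapshot (or does not answer in time).
    static void snapshotPhase(List<String> replicas,
                              Function<String, CompletableFuture<Void>> requestSnapshot)
            throws IOException {
        CompletableFuture<?>[] requests = replicas.stream()
                .map(replica -> requestSnapshot.apply(replica)
                        // Tag each failure with the replica that caused it.
                        .exceptionally(cause -> {
                            throw new RuntimeException("Could not create snapshot at " + replica, cause);
                        }))
                .toArray(CompletableFuture[]::new);
        try {
            // Waits for every replica; a single failure fails the whole phase.
            CompletableFuture.allOf(requests).join();
        } catch (CompletionException e) {
            throw new IOException("Failed during snapshot creation.", e.getCause());
        }
    }
}
```

In this model, one unreachable or overloaded replica is enough to fail the repair session for every range it replicates. That matches the log, where the failure at /10.97.9.110 ended repair session #685e3d00.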



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
