Re: Bootstrap fails on 3.10
I proposed a quite simple fix for https://issues.apache.org/jira/browse/CASSANDRA-12905 Sorry that I don't supply a patch. I am good at analysing code but totally unexperienced with the workflows here. 2016-11-25 19:57 GMT+01:00 Benjamin Roth : > Yes, I have MVs. > > Interesting is also that in the middle of bootstrapping (cannot tell when > exactly) it seemed like other nodes started to send hints to the > bootstrapping node. When that happened, it seems that every single HintVerb > fails also with a WTE. At least the logs are completely flooded with WTE. > When I paused hints on all other nodes, logs were quiet again. > > I completely restarted the bootstrap (deleted /var/lib/cassandra) - this > time with hints paused from the beginning. We will see if that changes > anything. > > I find it also quite weird that other nodes have hints for a bootstrapping > node. Is that intended behaviour? > And is it possible that streaming locks the whole CF? I looked like > absolutely no hint could be delivered successfully. > > 2016-11-25 19:43 GMT+01:00 Paulo Motta : > >> If you have an MV table It seems you're hitting https://issues.apache. >> org/jira/browse/CASSANDRA-12905. I will bump it's priority to critical >> since it can prevent or difficult bootstrap. >> >> Did you try resuming bootstrap with "nodetool bootstrap resume" after the >> failure? It may eventually succeed, since this is an MV lock contention >> problem. >> >> 2016-11-25 15:59 GMT-02:00 Benjamin Roth : >> >>> Hi! >>> >>> Today I wanted a new node to join the cluster. >>> When looking at netstats on all the old nodes, it seemed like the >>> streaming sessions did complete. >>> They all said that all files have been transferred. But looking at the >>> debug.log the stream sessions finished with an error. >>> Also after all streams have been done the node remains in state >>> "JOINING". >>> >>> See logs: >>> >>> debug.log, last words >>> >>> ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712 >>> StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >>> Streaming error occurred on session with peer 10.23.71.6 >>> org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed >>> out - received only 0 responses. >>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525) >>> ~[apache-cassandra-3.10.jar:3.10] >>> at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440) >>> ~[apache-cassandra-3.10.jar:3.10] >>> at org.apache.cassandra.db.Mutation.apply(Mutation.java:223) >>> ~[apache-cassandra-3.10.jar:3.10] >>> at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242) >>> ~[apache-cassandra-3.10.jar:3.10] >>> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio >>> nRunnable.run(StreamReceiveTask.java:205) ~[apache-cassandra-3.10.jar:3. >>> 10] >>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) >>> [na:1.8.0_102] >>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >>> [na:1.8.0_102] >>> at >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >>> [na:1.8.0_102] >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >>> [na:1.8.0_102] >>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102] >>> DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713 >>> ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >>> Sending Session Failed >>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 >>> StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >>> Finishing keep-alive task. >>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 >>> ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >>> Closing stream connection handler on /10.23.71.6 >>> INFO [StreamReceiveTask:94] 2016-11-25 17:50:51,719 >>> StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >>> Session with /10.23.71.6 is complete >>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719 >>> StreamCoordinator.java:146 - Finished connecting all sessions >>> WARN [StreamReceiveTask:94] 2016-11-25 17:50:51,723 >>> StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >>> Stream failed >>> ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error >>> while waiting on bootstrap to complete. Bootstrap will have to be restarted. >>> java.util.concurrent.ExecutionException: >>> org.apache.cassandra.streaming.StreamException: >>> Stream failed >>> at >>> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) >>> ~[guava-18.0.jar:na] >>> at >>> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) >>> ~[guava-18.0.jar:na] >>> at >>> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) >>> ~[guava-18.0.jar:na] >>> at >>> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1488)
Re: Bootstrap fails on 3.10
Yes, I have MVs. Interesting is also that in the middle of bootstrapping (cannot tell when exactly) it seemed like other nodes started to send hints to the bootstrapping node. When that happened, it seems that every single HintVerb fails also with a WTE. At least the logs are completely flooded with WTE. When I paused hints on all other nodes, logs were quiet again. I completely restarted the bootstrap (deleted /var/lib/cassandra) - this time with hints paused from the beginning. We will see if that changes anything. I find it also quite weird that other nodes have hints for a bootstrapping node. Is that intended behaviour? And is it possible that streaming locks the whole CF? I looked like absolutely no hint could be delivered successfully. 2016-11-25 19:43 GMT+01:00 Paulo Motta : > If you have an MV table It seems you're hitting https://issues.apache. > org/jira/browse/CASSANDRA-12905. I will bump it's priority to critical > since it can prevent or difficult bootstrap. > > Did you try resuming bootstrap with "nodetool bootstrap resume" after the > failure? It may eventually succeed, since this is an MV lock contention > problem. > > 2016-11-25 15:59 GMT-02:00 Benjamin Roth : > >> Hi! >> >> Today I wanted a new node to join the cluster. >> When looking at netstats on all the old nodes, it seemed like the >> streaming sessions did complete. >> They all said that all files have been transferred. But looking at the >> debug.log the stream sessions finished with an error. >> Also after all streams have been done the node remains in state "JOINING". >> >> See logs: >> >> debug.log, last words >> >> ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712 >> StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >> Streaming error occurred on session with peer 10.23.71.6 >> org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed >> out - received only 0 responses. >> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525) >> ~[apache-cassandra-3.10.jar:3.10] >> at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440) >> ~[apache-cassandra-3.10.jar:3.10] >> at org.apache.cassandra.db.Mutation.apply(Mutation.java:223) >> ~[apache-cassandra-3.10.jar:3.10] >> at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242) >> ~[apache-cassandra-3.10.jar:3.10] >> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio >> nRunnable.run(StreamReceiveTask.java:205) ~[apache-cassandra-3.10.jar:3. >> 10] >> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) >> [na:1.8.0_102] >> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >> [na:1.8.0_102] >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >> [na:1.8.0_102] >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >> [na:1.8.0_102] >> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102] >> DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713 >> ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >> Sending Session Failed >> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 >> StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >> Finishing keep-alive task. >> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 >> ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >> Closing stream connection handler on /10.23.71.6 >> INFO [StreamReceiveTask:94] 2016-11-25 17:50:51,719 >> StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >> Session with /10.23.71.6 is complete >> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719 >> StreamCoordinator.java:146 - Finished connecting all sessions >> WARN [StreamReceiveTask:94] 2016-11-25 17:50:51,723 >> StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] >> Stream failed >> ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error >> while waiting on bootstrap to complete. Bootstrap will have to be restarted. >> java.util.concurrent.ExecutionException: >> org.apache.cassandra.streaming.StreamException: >> Stream failed >> at com.google.common.util.concurrent.AbstractFuture$Sync. >> getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na] >> at >> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) >> ~[guava-18.0.jar:na] >> at >> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) >> ~[guava-18.0.jar:na] >> at >> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1488) >> [apache-cassandra-3.10.jar:3.10] >> at >> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:948) >> [apache-cassandra-3.10.jar:3.10] >> at >> org.apache.cassandra.service.StorageService.initServer(StorageService.java:667) >> [apache-cassandra-3.10.jar:3.10] >> at >> org.apache.cassandra.service.StorageService.initServer(StorageService
Re: Bootstrap fails on 3.10
If you have an MV table It seems you're hitting https://issues.apache.org/jira/browse/CASSANDRA-12905. I will bump it's priority to critical since it can prevent or difficult bootstrap. Did you try resuming bootstrap with "nodetool bootstrap resume" after the failure? It may eventually succeed, since this is an MV lock contention problem. 2016-11-25 15:59 GMT-02:00 Benjamin Roth : > Hi! > > Today I wanted a new node to join the cluster. > When looking at netstats on all the old nodes, it seemed like the > streaming sessions did complete. > They all said that all files have been transferred. But looking at the > debug.log the stream sessions finished with an error. > Also after all streams have been done the node remains in state "JOINING". > > See logs: > > debug.log, last words > > ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712 > StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] > Streaming error occurred on session with peer 10.23.71.6 > org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed > out - received only 0 responses. > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525) > ~[apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440) > ~[apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:223) > ~[apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242) > ~[apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.streaming.StreamReceiveTask$ > OnCompletionRunnable.run(StreamReceiveTask.java:205) > ~[apache-cassandra-3.10.jar:3.10] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_102] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_102] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_102] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102] > DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713 > ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] > Sending Session Failed > DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 > StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] > Finishing keep-alive task. > DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 > ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] > Closing stream connection handler on /10.23.71.6 > INFO [StreamReceiveTask:94] 2016-11-25 17:50:51,719 > StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] > Session with /10.23.71.6 is complete > DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719 > StreamCoordinator.java:146 - Finished connecting all sessions > WARN [StreamReceiveTask:94] 2016-11-25 17:50:51,723 > StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] > Stream failed > ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error > while waiting on bootstrap to complete. Bootstrap will have to be restarted. > java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: > Stream failed > at com.google.common.util.concurrent.AbstractFuture$ > Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na] > at com.google.common.util.concurrent.AbstractFuture$ > Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > ~[guava-18.0.jar:na] > at > org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1488) > [apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:948) > [apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:667) > [apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:598) > [apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394) > [apache-cassandra-3.10.jar:3.10] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) > [apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735) > [apache-cassandra-3.10.jar:3.10] > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) > ~[apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.streaming.StreamResultFuture. > handleSessionComplete(StreamResultFuture.java:191) > ~[apache-cassandra-3.10.jar:3.10] > at org.apache.cassandra.streaming.StreamSession. > closeSession(StreamSession.java:481) ~[apache-cassandra-3.10.jar:3.10] > at org.apache.