Re: Bootstrap fails on 3.10

2016-11-25 Thread Benjamin Roth
I proposed a quite simple fix for
https://issues.apache.org/jira/browse/CASSANDRA-12905

Sorry that I don't supply a patch. I am good at analysing code but totally
unexperienced with the workflows here.

2016-11-25 19:57 GMT+01:00 Benjamin Roth :

> Yes, I have MVs.
>
> Interesting is also that in the middle of bootstrapping (cannot tell when
> exactly) it seemed like other nodes started to send hints to the
> bootstrapping node. When that happened, it seems that every single HintVerb
> fails also with a WTE. At least the logs are completely flooded with WTE.
> When I paused hints on all other nodes, logs were quiet again.
>
> I completely restarted the bootstrap (deleted /var/lib/cassandra) - this
> time with hints paused from the beginning. We will see if that changes
> anything.
>
> I find it also quite weird that other nodes have hints for a bootstrapping
> node. Is that intended behaviour?
> And is it possible that streaming locks the whole CF? I looked like
> absolutely no hint could be delivered successfully.
>
> 2016-11-25 19:43 GMT+01:00 Paulo Motta :
>
>> If you have an MV table It seems you're hitting https://issues.apache.
>> org/jira/browse/CASSANDRA-12905. I will bump it's priority to critical
>> since it can prevent or difficult bootstrap.
>>
>> Did you try resuming bootstrap with "nodetool bootstrap resume" after the
>> failure? It may eventually succeed, since this is an MV lock contention
>> problem.
>>
>> 2016-11-25 15:59 GMT-02:00 Benjamin Roth :
>>
>>> Hi!
>>>
>>> Today I wanted a new node to join the cluster.
>>> When looking at netstats on all the old nodes, it seemed like the
>>> streaming sessions did complete.
>>> They all said that all files have been transferred. But looking at the
>>> debug.log the stream sessions finished with an error.
>>> Also after all streams have been done the node remains in state
>>> "JOINING".
>>>
>>> See logs:
>>>
>>> debug.log, last words
>>> 
>>> ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712
>>> StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Streaming error occurred on session with peer 10.23.71.6
>>> org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed
>>> out - received only 0 responses.
>>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525)
>>> ~[apache-cassandra-3.10.jar:3.10]
>>> at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440)
>>> ~[apache-cassandra-3.10.jar:3.10]
>>> at org.apache.cassandra.db.Mutation.apply(Mutation.java:223)
>>> ~[apache-cassandra-3.10.jar:3.10]
>>> at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242)
>>> ~[apache-cassandra-3.10.jar:3.10]
>>> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio
>>> nRunnable.run(StreamReceiveTask.java:205) ~[apache-cassandra-3.10.jar:3.
>>> 10]
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> [na:1.8.0_102]
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> [na:1.8.0_102]
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> [na:1.8.0_102]
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> [na:1.8.0_102]
>>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
>>> DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713
>>> ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Sending Session Failed
>>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>>> StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Finishing keep-alive task.
>>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>>> ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Closing stream connection handler on /10.23.71.6
>>> INFO  [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>>> StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Session with /10.23.71.6 is complete
>>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>>> StreamCoordinator.java:146 - Finished connecting all sessions
>>> WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,723
>>> StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Stream failed
>>> ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error
>>> while waiting on bootstrap to complete. Bootstrap will have to be restarted.
>>> java.util.concurrent.ExecutionException: 
>>> org.apache.cassandra.streaming.StreamException:
>>> Stream failed
>>> at 
>>> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>>> ~[guava-18.0.jar:na]
>>> at 
>>> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>>> ~[guava-18.0.jar:na]
>>> at 
>>> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>>> ~[guava-18.0.jar:na]
>>> at 
>>> 

Re: Bootstrap fails on 3.10

2016-11-25 Thread Benjamin Roth
Yes, I have MVs.

Interesting is also that in the middle of bootstrapping (cannot tell when
exactly) it seemed like other nodes started to send hints to the
bootstrapping node. When that happened, it seems that every single HintVerb
fails also with a WTE. At least the logs are completely flooded with WTE.
When I paused hints on all other nodes, logs were quiet again.

I completely restarted the bootstrap (deleted /var/lib/cassandra) - this
time with hints paused from the beginning. We will see if that changes
anything.

I find it also quite weird that other nodes have hints for a bootstrapping
node. Is that intended behaviour?
And is it possible that streaming locks the whole CF? I looked like
absolutely no hint could be delivered successfully.

2016-11-25 19:43 GMT+01:00 Paulo Motta :

> If you have an MV table It seems you're hitting https://issues.apache.
> org/jira/browse/CASSANDRA-12905. I will bump it's priority to critical
> since it can prevent or difficult bootstrap.
>
> Did you try resuming bootstrap with "nodetool bootstrap resume" after the
> failure? It may eventually succeed, since this is an MV lock contention
> problem.
>
> 2016-11-25 15:59 GMT-02:00 Benjamin Roth :
>
>> Hi!
>>
>> Today I wanted a new node to join the cluster.
>> When looking at netstats on all the old nodes, it seemed like the
>> streaming sessions did complete.
>> They all said that all files have been transferred. But looking at the
>> debug.log the stream sessions finished with an error.
>> Also after all streams have been done the node remains in state "JOINING".
>>
>> See logs:
>>
>> debug.log, last words
>> 
>> ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712
>> StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Streaming error occurred on session with peer 10.23.71.6
>> org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed
>> out - received only 0 responses.
>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.Mutation.apply(Mutation.java:223)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio
>> nRunnable.run(StreamReceiveTask.java:205) ~[apache-cassandra-3.10.jar:3.
>> 10]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> [na:1.8.0_102]
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [na:1.8.0_102]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
>> DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713
>> ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Sending Session Failed
>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>> StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Finishing keep-alive task.
>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>> ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Closing stream connection handler on /10.23.71.6
>> INFO  [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>> StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Session with /10.23.71.6 is complete
>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>> StreamCoordinator.java:146 - Finished connecting all sessions
>> WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,723
>> StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Stream failed
>> ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error
>> while waiting on bootstrap to complete. Bootstrap will have to be restarted.
>> java.util.concurrent.ExecutionException: 
>> org.apache.cassandra.streaming.StreamException:
>> Stream failed
>> at com.google.common.util.concurrent.AbstractFuture$Sync.
>> getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
>> at 
>> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>> ~[guava-18.0.jar:na]
>> at 
>> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>> ~[guava-18.0.jar:na]
>> at 
>> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1488)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:948)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:667)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> 

Re: Bootstrap fails on 3.10

2016-11-25 Thread Paulo Motta
If you have an MV table It seems you're hitting
https://issues.apache.org/jira/browse/CASSANDRA-12905. I will bump it's
priority to critical since it can prevent or difficult bootstrap.

Did you try resuming bootstrap with "nodetool bootstrap resume" after the
failure? It may eventually succeed, since this is an MV lock contention
problem.

2016-11-25 15:59 GMT-02:00 Benjamin Roth :

> Hi!
>
> Today I wanted a new node to join the cluster.
> When looking at netstats on all the old nodes, it seemed like the
> streaming sessions did complete.
> They all said that all files have been transferred. But looking at the
> debug.log the stream sessions finished with an error.
> Also after all streams have been done the node remains in state "JOINING".
>
> See logs:
>
> debug.log, last words
> 
> ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712
> StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
> Streaming error occurred on session with peer 10.23.71.6
> org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed
> out - received only 0 responses.
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525)
> ~[apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440)
> ~[apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.db.Mutation.apply(Mutation.java:223)
> ~[apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242)
> ~[apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.streaming.StreamReceiveTask$
> OnCompletionRunnable.run(StreamReceiveTask.java:205)
> ~[apache-cassandra-3.10.jar:3.10]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_102]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_102]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_102]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
> DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713
> ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
> Sending Session Failed
> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
> StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
> Finishing keep-alive task.
> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
> ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
> Closing stream connection handler on /10.23.71.6
> INFO  [StreamReceiveTask:94] 2016-11-25 17:50:51,719
> StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
> Session with /10.23.71.6 is complete
> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719
> StreamCoordinator.java:146 - Finished connecting all sessions
> WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,723
> StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
> Stream failed
> ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error
> while waiting on bootstrap to complete. Bootstrap will have to be restarted.
> java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException:
> Stream failed
> at com.google.common.util.concurrent.AbstractFuture$
> Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
> at com.google.common.util.concurrent.AbstractFuture$
> Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
> at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> ~[guava-18.0.jar:na]
> at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1488)
> [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:948)
> [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:667)
> [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:598)
> [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394)
> [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
> [apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735)
> [apache-cassandra-3.10.jar:3.10]
> Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
> at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215)
> ~[apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.streaming.StreamResultFuture.
> handleSessionComplete(StreamResultFuture.java:191)
> ~[apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.streaming.StreamSession.
> closeSession(StreamSession.java:481) 

Bootstrap fails on 3.10

2016-11-25 Thread Benjamin Roth
Hi!

Today I wanted a new node to join the cluster.
When looking at netstats on all the old nodes, it seemed like the streaming
sessions did complete.
They all said that all files have been transferred. But looking at the
debug.log the stream sessions finished with an error.
Also after all streams have been done the node remains in state "JOINING".

See logs:

debug.log, last words

ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712 StreamSession.java:593
- [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Streaming error occurred
on session with peer 10.23.71.6
org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out
- received only 0 responses.
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525)
~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440)
~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.Mutation.apply(Mutation.java:223)
~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:205)
~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_102]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713
ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
Sending Session Failed
DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 StreamSession.java:472
- [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Finishing keep-alive task.
DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
Closing stream connection handler on /10.23.71.6
INFO  [StreamReceiveTask:94] 2016-11-25 17:50:51,719
StreamResultFuture.java:187 - [Stream
#b998aec0-b2fd-11e6-a63d-75828fa8d45c] Session with /10.23.71.6 is complete
DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719
StreamCoordinator.java:146 - Finished connecting all sessions
WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,723
StreamResultFuture.java:214 - [Stream
#b998aec0-b2fd-11e6-a63d-75828fa8d45c] Stream failed
ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error while
waiting on bootstrap to complete. Bootstrap will have to be restarted.
java.util.concurrent.ExecutionException:
org.apache.cassandra.streaming.StreamException: Stream failed
at
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
~[guava-18.0.jar:na]
at
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1488)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:948)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:667)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:598)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735)
[apache-cassandra-3.10.jar:3.10]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571)
~[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:251)
~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_102]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)