[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523222#comment-16523222 ]
Kurt Greaves commented on CASSANDRA-14525:
------------------------------------------

A few things:

If we fail streaming and {{isSurveyMode}} is true, we still get the NPE if auth is enabled when trying to connect to C* on that node. Not much we can do about this because auth isn't initialised until we join the ring, but I'm not sure why we should handle this situation differently, and also it's currently kind of broken. At the moment, if you resume bootstrap after a streaming failure _while in write survey mode_, you will leave write survey mode on completion of bootstrapping (ouch).

I think we should handle write survey bootstrapping the same as normal bootstrap, where if we get an error during streaming we don't start transports. Then, on resume, handle survey mode so that we _don't_ join the ring on completion of bootstrapping, but we do still start transports.

On top of that, seeing as we're in this code anyway, I think it would be reasonable if we could look at handling the auth case a bit better when write survey is enabled as well. Ideally, if auth is required I see no point in starting the transports, seeing as you'll always get an NPE, so maybe we can add a check for that in {{CassandraDaemon#start()}}? {{DatabaseDescriptor.getAuthenticator().requireAuthentication()}} should be enough here, I think.

Some things regarding the error message:

We've got repeated information in our error:
{code:java}
WARN [main] 2018-06-25 09:13:24,136 StorageService.java:935 - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
ERROR [main] 2018-06-25 09:13:32,190 CassandraDaemon.java:445 - Node is not yet bootstraped hence not enabling native transport. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`
{code}
I think our new message should be either INFO or WARN (INFO is in line with other messages in {{start()}}), and I think it would make more sense if the original message in {{StorageService}} was ERROR. We could change the message in CassandraDaemon to:
{code:java}
Not starting client transports as bootstrap has not completed.{code}
or something similar, to be more in line with the other info messages.

Finally, with your patch, if we resume bootstrap we don't start thrift. As per Vince's patch, {{daemon.start()}} is desirable here over {{startNativeTransport}} so that we always start thrift and CQL.


> streaming failure during bootstrap makes new node into inconsistent state
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for a newly joining node (most commonly due to a
> streaming failure), Cassandra remains in the {{joining}} state, which is
> fine, but it also enables native transport, which makes the overall state
> inconsistent.
> This further causes a {{NullPointerException}} if auth is enabled on the
> new node. Steps to reproduce:
> For example, if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
> at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
> at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
> at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
> at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) [apache-cassandra-3.0.16.jar:3.0.16]
> Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
> at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
> at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
> at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
> at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
> at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
> at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then the variable [StorageService.java::dataAvailable|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
> will be {{false}}. Since {{dataAvailable}} is {{false}},
> [StorageService.java::finishJoiningRing|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
> is not called, and as a result
> [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999]
> will not be invoked.
> API [StorageService.java::joinTokenRing|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L763]
> returns without any problem. After this,
> [CassandraDaemon.java::start|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L584]
> is invoked, which starts native transport at
> [CassandraDaemon.java::startNativeTransport|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L478].
> At this point the daemon’s bootstrap is still not finished, yet transport is
> enabled. A client can therefore connect to the node and will encounter
> {{java.lang.NullPointerException}} as follows:
> {quote}ERROR [SharedPool-Worker-2] Message.java:647 - Unexpected exception during request; channel = [id: 0x412a26b3, L:/a.b.c.d:9042 - R:/p.q.r.s:20121]
> java.lang.NullPointerException: null
> at org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:160) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:82) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:198) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:535) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:429) [apache-cassandra-3.0.16.jar:3.0.16]
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
> at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
> at io.netty.channel.DefaultChannelHandlerInvoker$7.run(DefaultChannelHandlerInvoker.java:159) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.16.jar:3.0.16]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {quote}
> At this point, {{nodetool status}} shows the new node in the {{UJ}} state;
> however, clients can still connect to this node over {{CQL}} and will
> receive {{java.lang.NullPointerException}}.
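
The transport-start rules proposed in the comment (no transports after a streaming failure, transports after a completed write survey bootstrap, but no transports when auth is required and the ring has not been joined) can be sketched roughly as follows. This is only an illustration, not the patch: it does not use any Cassandra classes, and `TransportStartSketch`, `Decision` and `decide` are hypothetical names. The three booleans stand in for state that the real code would presumably read from {{StorageService}} and from {{DatabaseDescriptor.getAuthenticator().requireAuthentication()}}.

```java
// Hypothetical sketch of the proposed decision logic, modelled with plain
// booleans. None of these names are Cassandra APIs.
class TransportStartSketch {

    enum Decision {
        START_TRANSPORTS,
        SKIP_BOOTSTRAP_INCOMPLETE,
        SKIP_AUTH_NOT_INITIALISED
    }

    /**
     * @param bootstrapCompleted false if streaming failed or is still in progress
     * @param writeSurveyMode    true if the node finished bootstrap but stays out of the ring
     * @param authRequired       assumed to mirror requireAuthentication() on the authenticator
     */
    static Decision decide(boolean bootstrapCompleted, boolean writeSurveyMode, boolean authRequired) {
        // Same rule for normal and write survey bootstrap:
        // a failed or unfinished bootstrap never starts client transports.
        if (!bootstrapCompleted) {
            return Decision.SKIP_BOOTSTRAP_INCOMPLETE;
        }
        // In write survey mode the ring is not joined, so auth setup has not run;
        // starting transports would only hand clients an NPE.
        if (writeSurveyMode && authRequired) {
            return Decision.SKIP_AUTH_NOT_INITIALISED;
        }
        return Decision.START_TRANSPORTS;
    }

    public static void main(String[] args) {
        boolean[] flags = { false, true };
        // Print the decision for every combination of the three inputs.
        for (boolean bootstrapCompleted : flags) {
            for (boolean writeSurveyMode : flags) {
                for (boolean authRequired : flags) {
                    System.out.printf("bootstrapCompleted=%b writeSurveyMode=%b authRequired=%b -> %s%n",
                            bootstrapCompleted, writeSurveyMode, authRequired,
                            decide(bootstrapCompleted, writeSurveyMode, authRequired));
                }
            }
        }
    }
}
```

Checking bootstrap completion first keeps the behaviour identical for normal and write survey bootstrap, which is the point made in the comment; the auth check only matters once bootstrap has finished without joining the ring.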