[ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523222#comment-16523222
 ] 

Kurt Greaves commented on CASSANDRA-14525:
------------------------------------------

A few things:
 If we fail streaming and {{isSurveyMode}} is true we still get the NPE if auth 
is enabled when trying to connect to C* on that node. Not much we can do about 
this because auth isn't initialised until we join the ring, but I'm not sure 
why we should handle this situation differently, and also it's currently kind 
of broken. At the moment if you resume bootstrap after a streaming failure 
_while in write survey mode_, you will leave write survey mode on completion of 
bootstrapping (ouch).


 I think we should handle write survey bootstrapping the same as normal 
bootstrap, where if we get an error during streaming we don't start transports. 
Then, on resume, handle survey mode so that we _don't_ join the ring on 
completion of bootstrapping, but we do still start transports.


 On top of that, seeing as we're in this code anyway, I think it would be 
reasonable if we could look at handling the auth case a bit better when write 
survey is enabled as well. Ideally, if auth is required I see no point in 
starting the transports seeing as you'll always get an NPE, so maybe we can add 
a check for that in {{CassandraDaemon#start()?}}

{{DatabaseDescriptor.getAuthenticator().requireAuthentication()}} should be 
enough here I think.

 

Some things regarding the error message:
 We've got repeated information in our error:
{code:java}
WARN  [main] 2018-06-25 09:13:24,136 StorageService.java:935 - Some data 
streaming failed. Use nodetool to check bootstrap state and resume. For more, 
see `nodetool help bootstrap`. IN_PROGRESS
ERROR [main] 2018-06-25 09:13:32,190 CassandraDaemon.java:445 - Node is not yet 
bootstraped hence not enabling native transport. Use nodetool to check 
bootstrap state and resume. For more, see `nodetool help    bootstrap`
{code}
I think our new message should either be INFO or WARN (INFO is in line with 
other messages in {{start()}}, and I think it would make more sense if the 
original message in \{{StorageService}} was ERROR. We could change the message 
in CassandraDaemon to:
{code:java}
Not starting client transports as bootstrap has not completed.{code}
or something similar, to be more in line with the other info messages.

Finally, with your patch if we resume bootstrap we don't start thrift. As per 
Vince's patch, daemon.start() is desirable here over startNativeTransport so 
that we always start thrift and CQL.

> streaming failure during bootstrap makes new node into inconsistent state
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for newly joining node (most common reason is due to 
> streaming failure) then Cassandra state remains in {{joining}} state which is 
> fine but Cassandra also enables Native transport which makes overall state 
> inconsistent. This further creates NullPointer exception if auth is enabled 
> on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>  ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) 
> ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
>  will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not 
> call [StorageService.java::finishJoiningRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
>  and as a result 
> [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999]
>  will not be invoked.
> API [StorageService.java::joinTokenRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L763]
>  returns without any problem. After this 
> [CassandraDaemon.java::start|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L584]
>  is invoked which starts native transport at 
>  [CassandraDaemon.java::startNativeTransport 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L478]
> At this point daemon’s bootstrap is still not finished and transport is 
> enabled. So client will connect to the node and will encounter 
> {{java.lang.NullPointerException}} as following:
> {quote}ERROR [SharedPool-Worker-2] Message.java:647 - Unexpected exception 
> during request; channel = [id: 0x412a26b3, L:/a.b.c.d:9042 - R:/p.q.r.s:20121]
>  java.lang.NullPointerException: null
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:160)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:82)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:198)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:535)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:429)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at 
> io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83)
>  [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at 
> io.netty.channel.DefaultChannelHandlerInvoker$7.run(DefaultChannelHandlerInvoker.java:159)
>  [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_121]
>  at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {quote}
> At this point if we run {{nodetool status}} then it will show this new node 
> in {{UJ}} state, however clients can connect to this node over {{CQL}} and 
> will receive {{java.lang.NullPointerException}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to