[ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529525#comment-16529525
 ] 

Kurt Greaves commented on CASSANDRA-14525:
------------------------------------------

{quote}Also I've discovered another bug exists in current open source code in 
which if isSurveyMode is true and streaming fails (i.e. isBootstrapMode is 
true) then also one can call nodetool join without nodetool bootstrap resume 
and have that node join the ring.
{quote}
Great catch. I found a couple more small issue w.r.t {{nodetool join}} as well 
while I was testing this.
 # If in write_survey and you join the ring after bootstrap, transports won't 
be enabled. can we call {{CassandraDaemon#start()}} here?
 # nodetool join fails silently if write_survey is true and we haven't 
completed bootstrapping, but server log prints the following
{code:java}
WARN [RMI TCP Connection(5)-127.0.0.1] 2018-06-29 12:39:49,735 
StorageService.java:1008 - Some data streaming failed. Use nodetool to check 
bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
{code}
nodetool join should say something along the lines of "{{Can't join the ring 
because in write_survey mode and bootstrap hasn't completed}}"

Also another minor nit w.r.t logging; you can get the following log message 
after successfully bootstrapping if you were in write survey mode:
{code:java}
INFO [main] 2018-06-29 12:12:39,071 CassandraDaemon.java:479 - Not starting 
client transports as bootstrap has not completed
{code}
Probably better to split CassandraDaemon.start() if block so that we print 
"{{Not starting client transports as write_survey mode is enabled.}}"

And finally, there's still 2 occurences of "bootstraped" in the exception 
messages in {{startNativeTransport}} and {{startRPCServer}}.

> streaming failure during bootstrap makes new node into inconsistent state
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for newly joining node (most common reason is due to 
> streaming failure) then Cassandra state remains in {{joining}} state which is 
> fine but Cassandra also enables Native transport which makes overall state 
> inconsistent. This further creates NullPointer exception if auth is enabled 
> on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>  ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) 
> ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
>  will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not 
> call [StorageService.java::finishJoiningRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
>  and as a result 
> [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999]
>  will not be invoked.
> API [StorageService.java::joinTokenRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L763]
>  returns without any problem. After this 
> [CassandraDaemon.java::start|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L584]
>  is invoked which starts native transport at 
>  [CassandraDaemon.java::startNativeTransport 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L478]
> At this point daemon’s bootstrap is still not finished and transport is 
> enabled. So client will connect to the node and will encounter 
> {{java.lang.NullPointerException}} as following:
> {quote}ERROR [SharedPool-Worker-2] Message.java:647 - Unexpected exception 
> during request; channel = [id: 0x412a26b3, L:/a.b.c.d:9042 - R:/p.q.r.s:20121]
>  java.lang.NullPointerException: null
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:160)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:82)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:198)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:535)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:429)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at 
> io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83)
>  [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at 
> io.netty.channel.DefaultChannelHandlerInvoker$7.run(DefaultChannelHandlerInvoker.java:159)
>  [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_121]
>  at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {quote}
> At this point if we run {{nodetool status}} then it will show this new node 
> in {{UJ}} state, however clients can connect to this node over {{CQL}} and 
> will receive {{java.lang.NullPointerException}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to