[ https://issues.apache.org/jira/browse/SPARK-16711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392145#comment-15392145 ]
Thomas Graves commented on SPARK-16711:
---------------------------------------

Note this happens when security is on. The ExecutorShuffleInfo is what we store in the DB, and it doesn't look like it has the secrets at all, so we have to rely on the NM initializeApplication calls. But we don't know if/when those will happen, so we may need to start storing the secrets as well.

> YarnShuffleService doesn't re-init properly on YARN rolling upgrade
> -------------------------------------------------------------------
>
>                 Key: SPARK-16711
>                 URL: https://issues.apache.org/jira/browse/SPARK-16711
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, YARN
>    Affects Versions: 1.5.2
>            Reporter: Thomas Graves
>
> When a YARN rolling upgrade happens, the Spark YarnShuffleService doesn't
> re-initialize the tokens soon enough. Running applications then fail with
> NullPointerExceptions rather than IOExceptions, so clients do not retry, and
> the application fails outright when it should have just retried and succeeded.
> 2016-07-22 23:22:05,460 [shuffle-server-1] ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 6235606084052282795
> java.lang.NullPointerException: Password cannot be null if SASL is enabled
>     at org.spark-project.guava.base.Preconditions.checkNotNull(Preconditions.java:208)
>     at org.apache.spark.network.sasl.SparkSaslServer.encodePassword(SparkSaslServer.java:196)
>     at org.apache.spark.network.sasl.SparkSaslServer$DigestCallbackHandler.handle(SparkSaslServer.java:166)
>     at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>     at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>     at org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:119)
>     at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:101)
>     at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
>     at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>     at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>     at java.lang.Thread.run(Thread.java:745)
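
To make the two ideas in this thread concrete (persisting the secrets alongside the executor info, and failing with an IOException instead of a NullPointerException when a secret is missing), here is a minimal sketch. It is not the actual YarnShuffleService code: the class name `SecretRegistrySketch`, its methods, and the plain `Map` standing in for the DB are all hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; none of these names exist in Spark.
final class SecretRegistrySketch {
    // Stand-in for a persistent store (assumed to survive a service restart).
    private final Map<String, String> persisted;
    // In-memory view used when answering authentication requests.
    private final Map<String, String> inMemory = new ConcurrentHashMap<>();

    SecretRegistrySketch(Map<String, String> persisted) {
        this.persisted = persisted;
        // Reload previously stored secrets so a restarted service does not
        // depend on initializeApplication being called again.
        inMemory.putAll(persisted);
    }

    // Called when an application is initialized on this node.
    void registerApp(String appId, String secret) {
        persisted.put(appId, secret);
        inMemory.put(appId, secret);
    }

    // Throws IOException (which the issue says clients retry on) rather
    // than letting a null secret surface later as a NullPointerException.
    String secretFor(String appId) throws IOException {
        String secret = inMemory.get(appId);
        if (secret == null) {
            throw new IOException("No secret registered yet for application " + appId);
        }
        return secret;
    }
}
```

With this shape, constructing a second registry over the same store models a restart: secrets registered before the restart are still available, and a lookup for an unknown application fails with an IOException.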