[ https://issues.apache.org/jira/browse/LIVY-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279720#comment-16279720 ]

Jialiang Tan commented on LIVY-426:
-----------------------------------

Also, I found that when Livy hits this problem with one user's session, it tends to kill 
other users' sessions as well, again without any error message.
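
One way to confirm the mass-kill behaviour is to poll Livy's REST API (GET /sessions) and compare the set of session ids between polls. Below is a minimal sketch, assuming a Livy server at localhost:8998 with no authentication in front of the REST API; it uses a crude regex instead of a real JSON parser:

{code:scala}
// Minimal sketch: poll GET /sessions and report session ids that disappear
// between two polls. The host/port are assumptions; adjust for your deployment.
import scala.io.Source

object SessionWatcher {
  val sessionsUrl = "http://localhost:8998/sessions"

  // Crude extraction of the "id" fields from the JSON payload; a real tool
  // would use a JSON library instead of a regex.
  def currentIds(): Set[String] =
    "\"id\"\\s*:\\s*(\\d+)".r
      .findAllMatchIn(Source.fromURL(sessionsUrl).mkString)
      .map(_.group(1))
      .toSet

  def main(args: Array[String]): Unit = {
    var previous = currentIds()
    while (true) {
      Thread.sleep(30000) // poll every 30 seconds
      val now = currentIds()
      val gone = previous -- now
      if (gone.nonEmpty)
        println(s"Sessions gone since last poll: ${gone.mkString(", ")}")
      previous = now
    }
  }
}
{code}

If ids belonging to several different users disappear within the same poll interval, that matches the behaviour described above.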

> Livy stops session quietly without error message
> ------------------------------------------------
>
>                 Key: LIVY-426
>                 URL: https://issues.apache.org/jira/browse/LIVY-426
>             Project: Livy
>          Issue Type: Bug
>            Reporter: Jialiang Tan
>
> Livy stops interactive sessions quietly without any error messages. The YARN 
> log only shows RECEIVED SIGNAL TERM, which I think indicates that Livy wants to 
> terminate the driver in YARN. The Livy log only gives "Client RPC channel 
> closed unexpectedly". No other information is found.
> Spark Version: 2.0.1
> Scala Version: 2.11.8
> Livy Version: 0.4.0-incubating
> Zeppelin Version: 0.7.2
> Hadoop Version: 2.7.3
> YARN log:
> {code:java}
> 17/12/06 00:39:23 INFO storage.BlockManagerMaster: Removed 3 successfully in removeExecutor
> 17/12/06 00:39:23 INFO spark.ExecutorAllocationManager: Existing executor 3 has been removed (new total is 0)
> 17/12/06 00:46:07 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
> 17/12/06 00:46:07 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 17/12/06 00:46:07 INFO server.ServerConnector: Stopped ServerConnector@690ca6ae{HTTP/1.1}{0.0.0.0:0}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5959d99{/stages/stage/kill,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5daa4dcf{/api,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1f30b9f2{/,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1a36836{/static,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@56c1c4df{/executors/threadDump/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ea39f64{/executors/threadDump,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5be8acbc{/executors/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@17a813b2{/executors,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@48780245{/environment/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66bc6e53{/environment,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7b06e14a{/storage/rdd/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@58f0ffff{/storage/rdd,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cfc52cb{/storage/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@731dd75e{/storage,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1716986b{/stages/pool/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@54380417{/stages/pool,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5b75d33{/stages/stage/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@24b81ae5{/stages/stage,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@68356b10{/stages/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4422af95{/stages,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@39431dbc{/jobs/job/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@162b5d5e{/jobs/job,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a84d7e6{/jobs/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@76d179b8{/jobs,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO ui.SparkUI: Stopped Spark web UI at http://10.28.24.141:33403
> {code}
> And Livy log:
> {code:java}
> 17/12/06 00:46:07 WARN rsc.RSCClient: Client RPC channel closed unexpectedly.
> 17/12/06 00:46:07 WARN rsc.RSCClient: Error stopping RPC.
> io.netty.util.concurrent.BlockingOperationException: DefaultChannelPromise@74e40000(uncancellable)
>         at io.netty.util.concurrent.DefaultPromise.checkDeadLock(DefaultPromise.java:390)
>         at io.netty.channel.DefaultChannelPromise.checkDeadLock(DefaultChannelPromise.java:157)
>         at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:251)
>         at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:129)
>         at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:28)
>         at io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:218)
>         at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:117)
>         at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:28)
>         at org.apache.livy.rsc.rpc.Rpc.close(Rpc.java:307)
>         at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:232)
>         at org.apache.livy.rsc.RSCClient$2$1.onSuccess(RSCClient.java:129)
>         at org.apache.livy.rsc.RSCClient$2$1.onSuccess(RSCClient.java:123)
>         at org.apache.livy.rsc.Utils$2.operationComplete(Utils.java:108)
>         at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>         at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>         at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
>         at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
>         at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:956)
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:608)
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:586)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:158)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> There is not enough debug information for this behavior, and it happens only 
> occasionally. Sometimes the session is stopped immediately upon startup; other 
> times the session runs properly for hours before hitting this. I have changed 
> some configurations in Livy, and I am not sure whether that is related:
> livy.conf:
> {code}
> livy.spark.master = yarn
> livy.spark.deploy-mode = cluster
> livy.server.session.timeout = 24h
> livy.impersonation.enabled = true
> livy.file.local-dir-whitelist = /usr/lib/sparkExternalLibs
> livy.repl.enable-hive-context = true
> livy.server.yarn.app-lookup-timeout = 600s
> {code}
> livy-client.conf:
> {code}
> livy.rsc.server.connect.timeout = 360s
> livy.rsc.client.connect.timeout = 120s
> {code}
> (I initially suspected it might be due to some RPC timeout issue and hence 
> increased the timeouts, but nothing changed.)
> Has anyone encountered similar issues, and is there any workaround?
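> One way to collect more debug information in the meantime is to poll the session 
> state through the REST API (GET /sessions/{id}/state) and dump the driver log 
> (GET /sessions/{id}/log) as soon as the session leaves a healthy state, so the 
> last lines before the SIGTERM are captured. A minimal sketch, assuming a Livy 
> server at localhost:8998 with no authentication and a hypothetical session id 0:
> {code:scala}
> // Minimal sketch: watch one session's state and, once it leaves
> // starting/idle/busy, dump the tail of the driver log via the REST API.
> // Host, port and session id are assumptions; adjust for your deployment.
> import scala.io.Source
>
> object SessionLogCapture {
>   val livy = "http://localhost:8998"
>   val sessionId = 0
>
>   def fetch(path: String): String = Source.fromURL(livy + path).mkString
>
>   def main(args: Array[String]): Unit = {
>     val healthy = Set("starting", "idle", "busy")
>     var state = "starting"
>     while (healthy.contains(state)) {
>       Thread.sleep(10000) // poll every 10 seconds
>       // crude regex extraction of the "state" field; a real tool would parse JSON
>       state = "\"state\"\\s*:\\s*\"(\\w+)\"".r
>         .findFirstMatchIn(fetch(s"/sessions/$sessionId/state"))
>         .map(_.group(1))
>         .getOrElse("unknown")
>     }
>     println(s"Session $sessionId is now '$state'; dumping driver log:")
>     println(fetch(s"/sessions/$sessionId/log?from=0&size=200"))
>   }
> }
> {code}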



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
