Andreas Weise created ZEPPELIN-5598:
---------------------------------------
Summary: DeadLock in RemoteInterpreterServer Close / Cancel
Key: ZEPPELIN-5598
URL: https://issues.apache.org/jira/browse/ZEPPELIN-5598
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-interpreter
Affects Versions: 0.10.0
Reporter: Andreas Weise
Using interpreter binding Isolated per User + Scoped per Note, we are facing a deadlock
during interpreter shutdown.
Unfortunately, we can't provide a full thread dump because the interpreter was running in a
container without jstack. Luckily, we found the thread overview of the driver
process in the corresponding Spark UI. There are 100+ ShutdownThreads, which are
all blocked as follows:
{code:java}
15800 ShutdownThread BLOCKED
Blocked by Thread 24
Lock(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$ShutdownThread.run(RemoteInterpreterServer.java:664)
{code}
{code:java}
Thread 24 BLOCKED
Blocked by Thread 14066
Lock(org.apache.zeppelin.spark.PySparkInterpreter@554374428})
Lock(java.util.concurrent.ThreadPoolExecutor$Worker@188315435}),
Monitor(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
org.apache.zeppelin.interpreter.LazyOpenInterpreter.close(LazyOpenInterpreter.java:91)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.close(RemoteInterpreterServer.java:487)
=> holding
Monitor(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1757)
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1736)
org.apache.zeppelin.shaded.org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
org.apache.zeppelin.shaded.org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
org.apache.zeppelin.shaded.org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
{code}
{code:java}
Thread 14066 Thread-752 BLOCKED
Blocked by Thread 24
Lock(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
Monitor(org.apache.zeppelin.spark.PySparkInterpreter@554374428}),
Monitor(org.apache.zeppelin.interpreter.LazyOpenInterpreter@1986146112})
org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:292)
org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:333)
org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:90)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
=> holding
Monitor(org.apache.zeppelin.interpreter.LazyOpenInterpreter@1986146112})
org.apache.zeppelin.interpreter.LazyOpenInterpreter.cancel(LazyOpenInterpreter.java:118)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.lambda$cancel$1(RemoteInterpreterServer.java:933)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$$Lambda$5415/1522577174.run(Unknown Source)
java.lang.Thread.run(Thread.java:748)
{code}
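Reading the dumps together: Thread 24 (RemoteInterpreterServer.close) holds the InterpreterGroup monitor and is blocked on the PySparkInterpreter lock held by Thread 14066, while Thread 14066 (RemoteInterpreterServer.cancel -> LazyOpenInterpreter.open -> getInterpreterInTheSameSessionByClassName) holds that lock and is blocked on the InterpreterGroup monitor. The ShutdownThreads just queue behind Thread 24. Below is a minimal, self-contained sketch of that lock-order inversion; it is not Zeppelin code. The class LockOrderDeadlockSketch and the plain Object monitors standing in for InterpreterGroup and PySparkInterpreter are hypothetical. The sketch also uses ThreadMXBean.findMonitorDeadlockedThreads() to report the cycle from inside the JVM, which may help where jstack is not available in the container.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockSketch {

    /**
     * Starts two daemon threads that take the same two monitors in opposite
     * order, then asks the JVM's deadlock detector which of them are stuck.
     * Returns the sorted names of this call's deadlocked threads (empty if none).
     */
    public static List<String> reproduce() {
        // Hypothetical stand-ins for the two monitors seen in the dump.
        final Object groupMonitor = new Object();       // ~ InterpreterGroup
        final Object interpreterMonitor = new Object(); // ~ PySparkInterpreter
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // ~ Thread 24: close() holds the group, then needs the interpreter.
        Thread closer = new Thread(() -> {
            synchronized (groupMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (interpreterMonitor) {
                    System.out.println("close finished");
                }
            }
        }, "closer");

        // ~ Thread 14066: cancel() -> open() holds the interpreter, then needs the group.
        Thread canceller = new Thread(() -> {
            synchronized (interpreterMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (groupMonitor) {
                    System.out.println("open finished");
                }
            }
        }, "canceller");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        closer.setDaemon(true);
        canceller.setDaemon(true);
        closer.start();
        canceller.start();

        // In-process alternative to jstack: poll the JVM's monitor deadlock detector.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Set<Long> ours = new HashSet<>(Arrays.asList(closer.getId(), canceller.getId()));
        List<String> names = new ArrayList<>();
        for (int attempt = 0; attempt < 50 && names.isEmpty(); attempt++) {
            sleepQuietly(100);
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock
            if (ids == null) {
                continue;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                // Only report threads started by this call.
                if (info != null && ours.contains(info.getThreadId())) {
                    System.out.println(info.getThreadName() + " waits on " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                    names.add(info.getThreadName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + reproduce());
    }
}
```

The latch makes the interleaving deterministic: each thread takes its first monitor and waits until the other has done the same, so both then block forever on the second monitor, mirroring the close/cancel pair above. The common remedy for this pattern is to acquire the two monitors in one consistent order (or not to hold the group monitor while calling into the interpreter); whether that fits Zeppelin's close and cancel paths is not something this sketch establishes.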
--
This message was sent by Atlassian Jira
(v8.20.1#820001)