Andreas Weise created ZEPPELIN-5598:
---------------------------------------

             Summary: DeadLock in RemoteInterpreterServer Close / Cancel
                 Key: ZEPPELIN-5598
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5598
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
    Affects Versions: 0.10.0
            Reporter: Andreas Weise


Using the interpreter binding Isolated per User + Scoped per Note, we face a 
deadlock situation during interpreter shutdown.

Unfortunately, we can't provide a full thread dump because the interpreter was 
running in a container without jstack. Luckily, we found the thread overview of 
the driver process in the corresponding Spark UI. There are more than 100 
ShutdownThreads, all blocked as follows:
{code:java}
15800   ShutdownThread  BLOCKED 
Blocked by Thread 24 

Lock(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})

org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$ShutdownThread.run(RemoteInterpreterServer.java:664)
{code}
{code:java}
Thread 24       BLOCKED 
Blocked by Thread 14066 

Lock(org.apache.zeppelin.spark.PySparkInterpreter@554374428})
Lock(java.util.concurrent.ThreadPoolExecutor$Worker@188315435}), 
Monitor(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})

org.apache.zeppelin.interpreter.LazyOpenInterpreter.close(LazyOpenInterpreter.java:91)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.close(RemoteInterpreterServer.java:487)
 => holding 
Monitor(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1757)
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1736)
org.apache.zeppelin.shaded.org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
org.apache.zeppelin.shaded.org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
org.apache.zeppelin.shaded.org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
{code}
{code:java}
Thread  14066   Thread-752      BLOCKED 
Blocked by Thread 24 

Lock(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
Monitor(org.apache.zeppelin.spark.PySparkInterpreter@554374428}), 
Monitor(org.apache.zeppelin.interpreter.LazyOpenInterpreter@1986146112})

org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:292)
org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:333)
org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:90)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
 => holding 
Monitor(org.apache.zeppelin.interpreter.LazyOpenInterpreter@1986146112})
org.apache.zeppelin.interpreter.LazyOpenInterpreter.cancel(LazyOpenInterpreter.java:118)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.lambda$cancel$1(RemoteInterpreterServer.java:933)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$$Lambda$5415/1522577174.run(Unknown Source)
java.lang.Thread.run(Thread.java:748)
{code}
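Read together, the dumps suggest a lock-ordering inversion: Thread 24 (close) holds the InterpreterGroup monitor and waits in LazyOpenInterpreter.close, while Thread 14066 (cancel) holds the LazyOpenInterpreter and PySparkInterpreter monitors and waits for the InterpreterGroup. The following is only a minimal sketch of that pattern, not Zeppelin code. The class LockOrderSketch, the two plain lock objects and the thread names are hypothetical stand-ins. It uses the standard ThreadMXBean API to confirm the cycle from inside the JVM, which may also help where jstack is not available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderSketch {

    /**
     * Starts two daemon threads that take two monitors in opposite order and
     * returns how many deadlocked threads the JVM reports (0 if none is seen
     * within about five seconds).
     */
    public static int reproduce() throws InterruptedException {
        // Hypothetical stand-ins for the monitors named in the dumps.
        final Object groupLock = new Object();       // plays the InterpreterGroup
        final Object interpreterLock = new Object(); // plays the LazyOpenInterpreter
        // Ensures each thread holds its first monitor before asking for the second.
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread closer = new Thread(() -> {
            synchronized (groupLock) {               // like RemoteInterpreterServer.close
                awaitOther(bothHoldFirst);
                synchronized (interpreterLock) {     // like LazyOpenInterpreter.close
                    // never reached
                }
            }
        }, "close-like");

        Thread canceller = new Thread(() -> {
            synchronized (interpreterLock) {         // like LazyOpenInterpreter.open via cancel
                awaitOther(bothHoldFirst);
                synchronized (groupLock) {           // like getInterpreterInTheSameSessionByClassName
                    // never reached
                }
            }
        }, "cancel-like");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        closer.setDaemon(true);
        canceller.setDaemon(true);
        closer.start();
        canceller.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

In this sketch the cycle disappears if both paths acquire the monitors in the same order, or if the cancel-like path does not need the group monitor while holding the interpreter monitor. Whether either option fits RemoteInterpreterServer is an open question for the actual fix.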
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
