tyler fan created ZEPPELIN-4550:
-----------------------------------

             Summary: some problem of memory release of main process
                 Key: ZEPPELIN-4550
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4550
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.8.1
         Environment: CentOS 7.2, JDK 1.8-131
            Reporter: tyler fan


We set the Spark interpreter to per-note mode.

A batch that includes Spark and Python runs every hour, and the interpreter is released after each job finishes.

After about one month of running, the batch failed.

We reset the interpreter in the configuration UI, but Zeppelin then hung when we ran the note.

The thread dump looked like this:
{code:java}
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fbb5c082000 nid=0x223d in Object.wait() [0x00007fbb6018f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000eab06b68> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x00000000eab06b68> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00007fbb5c00c000 nid=0x2237 in Object.wait() [0x00007fbb65315000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000eb85c1a0> (a org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x00000000eb85c1a0> (a org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:336)

"VM Thread" os_prio=0 tid=0x00007fbb5c07a000 nid=0x223c runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fbb5c021000 nid=0x2238 runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fbb5c023000 nid=0x2239 runnable

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fbb5c025000 nid=0x223a runnable

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fbb5c026800 nid=0x223b runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007fbb5c0f0000 nid=0x2244 waiting on condition

JNI global references: 58

Found one Java-level deadlock:
=============================
"pool-1-thread-4":
  waiting to lock monitor 0x00007fbafc0204f8 (object 0x00000000eca79ba0, a org.apache.zeppelin.interpreter.InterpreterGroup),
  which is held by "pool-1-thread-3"
"pool-1-thread-3":
  waiting to lock monitor 0x00007fbafc01d848 (object 0x00000000edcc55c0, a org.apache.zeppelin.interpreter.LazyOpenInterpreter),
  which is held by "pool-2-thread-2"
"pool-2-thread-2":
  waiting to lock monitor 0x00007fbafc0204f8 (object 0x00000000eca79ba0, a org.apache.zeppelin.interpreter.InterpreterGroup),
  which is held by "pool-1-thread-3"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-4":
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.close(RemoteInterpreterServer.java:456)
    - waiting to lock <0x00000000eca79ba0> (a org.apache.zeppelin.interpreter.InterpreterGroup)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1839)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1824)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
"pool-1-thread-3":
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:63)
    - waiting to lock <0x00000000edcc55c0> (a org.apache.zeppelin.interpreter.LazyOpenInterpreter)
    at org.apache.zeppelin.python.PythonCondaInterpreter.getPythonInterpreter(PythonCondaInterpreter.java:172)
    at org.apache.zeppelin.python.PythonCondaInterpreter.getScheduler(PythonCondaInterpreter.java:381)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getScheduler(LazyOpenInterpreter.java:131)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getStatus(RemoteInterpreterServer.java:1037)
    - locked <0x00000000eca79ba0> (a org.apache.zeppelin.interpreter.InterpreterGroup)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getStatus.getResult(RemoteInterpreterService.java:1980)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getStatus.getResult(RemoteInterpreterService.java:1965)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
"pool-2-thread-2":
    at org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:342)
    - waiting to lock <0x00000000eca79ba0> (a org.apache.zeppelin.interpreter.InterpreterGroup)
    at org.apache.zeppelin.python.PythonInterpreter.getIPythonInterpreter(PythonInterpreter.java:266)
    at org.apache.zeppelin.python.PythonInterpreter.open(PythonInterpreter.java:227)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    - locked <0x00000000edc96070> (a org.apache.zeppelin.python.PythonInterpreter)
    - locked <0x00000000edcc55c0> (a org.apache.zeppelin.interpreter.LazyOpenInterpreter)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:714)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.
{code}
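
For readers who want to reproduce the pattern in isolation: the dump shows "pool-1-thread-3" holding the InterpreterGroup monitor while waiting for the LazyOpenInterpreter monitor, and "pool-2-thread-2" holding them the other way round. The sketch below is not Zeppelin code; it only illustrates that kind of lock-order inversion with two plain objects and then queries the JVM for deadlocked threads. The class name `LockOrderDemo`, the method `provokeAndDetect`, and the thread names are hypothetical and chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; none of these names come from Zeppelin.
final class LockOrderDemo {

    // Starts two daemon threads that take two monitors in opposite order,
    // then asks the JVM how many threads it reports as monitor-deadlocked.
    static int provokeAndDetect() {
        final Object groupLock = new Object();        // stands in for the InterpreterGroup monitor
        final Object interpreterLock = new Object();  // stands in for the LazyOpenInterpreter monitor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        startDaemon("status-like", groupLock, interpreterLock, bothHoldFirstLock);
        startDaemon("open-like", interpreterLock, groupLock, bothHoldFirstLock);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findMonitorDeadlockedThreads(); // null when none is found
            if (ids != null) {
                return ids.length;
            }
            sleepQuietly(50);
        }
        return 0;
    }

    private static void startDaemon(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock too
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // not reached: the other thread already holds this monitor
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit although both threads stay blocked
        t.start();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + provokeAndDetect());
    }
}
```

Running `main` should report the two demo threads as deadlocked. A common remedy for this class of problem is to acquire the monitors in one consistent order on every code path; whether that applies cleanly to the Zeppelin classes named in the dump is not something this sketch establishes.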
 

The JVM status of the main Zeppelin process (thread dump and GC statistics) looked like this:
{code:java}
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f034408b800 nid=0xb1bc in Object.wait() [0x00007f032f5f4000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x00000000c001f098> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f0344087000 nid=0xb1bb in Object.wait() [0x00007f032f6f5000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x00000000c0020b80> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00007f0344011000 nid=0xb1b5 in Object.wait() [0x00007f034d8d2000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.join(QueuedThreadPool.java:466)
    - locked <0x00000000c0028200> (a java.lang.Object)
    at org.eclipse.jetty.server.Server.join(Server.java:555)
    at org.apache.zeppelin.server.ZeppelinServer.main(ZeppelinServer.java:280)

"VM Thread" os_prio=0 tid=0x00007f034407f000 nid=0xb1ba runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f0344026000 nid=0xb1b6 runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f0344028000 nid=0xb1b7 runnable

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f034402a000 nid=0xb1b8 runnable

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f034402b800 nid=0xb1b9 runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007f034410d800 nid=0xb1c2 waiting on condition

JNI global references: 842

  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
 49.91   0.00  52.62  88.81  97.23  94.91   5570   50.811     2    0.076   50.887
{code}
 

We checked the memory metrics and found that used memory rises a little every time the interpreter runs.

 

Hourly plot:

!SCS20200110105340.png!

7-day plot:

!IMG_20200110_092327_463.jpg!

30-day plot:

!IMG_20200110_092346_915.jpg!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
