tyler fan created ZEPPELIN-4550:
-----------------------------------
Summary: some problem of memory release of main process
Key: ZEPPELIN-4550
URL: https://issues.apache.org/jira/browse/ZEPPELIN-4550
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.8.1
Environment: CentOS 7.2
JDK 1.8-131
Reporter: tyler fan
We set the Spark interpreter to per-note mode.
A batch that includes Spark and Python paragraphs runs every hour, and the interpreter is released after each job finishes.
After about one month of running, the batch failed.
We reset the interpreter in the configuration UI, but Zeppelin hung when we ran the note.
The thread dump of the interpreter process ({{RemoteInterpreterServer}}) showed a Java-level deadlock:
{code:java}
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fbb5c082000 nid=0x223d in Object.wait() [0x00007fbb6018f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000eab06b68> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x00000000eab06b68> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00007fbb5c00c000 nid=0x2237 in Object.wait() [0x00007fbb65315000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000eb85c1a0> (a org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x00000000eb85c1a0> (a org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:336)

"VM Thread" os_prio=0 tid=0x00007fbb5c07a000 nid=0x223c runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fbb5c021000 nid=0x2238 runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fbb5c023000 nid=0x2239 runnable

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fbb5c025000 nid=0x223a runnable

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fbb5c026800 nid=0x223b runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007fbb5c0f0000 nid=0x2244 waiting on condition

JNI global references: 58


Found one Java-level deadlock:
=============================
"pool-1-thread-4":
  waiting to lock monitor 0x00007fbafc0204f8 (object 0x00000000eca79ba0, a org.apache.zeppelin.interpreter.InterpreterGroup),
  which is held by "pool-1-thread-3"
"pool-1-thread-3":
  waiting to lock monitor 0x00007fbafc01d848 (object 0x00000000edcc55c0, a org.apache.zeppelin.interpreter.LazyOpenInterpreter),
  which is held by "pool-2-thread-2"
"pool-2-thread-2":
  waiting to lock monitor 0x00007fbafc0204f8 (object 0x00000000eca79ba0, a org.apache.zeppelin.interpreter.InterpreterGroup),
  which is held by "pool-1-thread-3"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-4":
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.close(RemoteInterpreterServer.java:456)
    - waiting to lock <0x00000000eca79ba0> (a org.apache.zeppelin.interpreter.InterpreterGroup)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1839)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1824)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
"pool-1-thread-3":
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:63)
    - waiting to lock <0x00000000edcc55c0> (a org.apache.zeppelin.interpreter.LazyOpenInterpreter)
    at org.apache.zeppelin.python.PythonCondaInterpreter.getPythonInterpreter(PythonCondaInterpreter.java:172)
    at org.apache.zeppelin.python.PythonCondaInterpreter.getScheduler(PythonCondaInterpreter.java:381)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getScheduler(LazyOpenInterpreter.java:131)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getStatus(RemoteInterpreterServer.java:1037)
    - locked <0x00000000eca79ba0> (a org.apache.zeppelin.interpreter.InterpreterGroup)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getStatus.getResult(RemoteInterpreterService.java:1980)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getStatus.getResult(RemoteInterpreterService.java:1965)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
"pool-2-thread-2":
    at org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:342)
    - waiting to lock <0x00000000eca79ba0> (a org.apache.zeppelin.interpreter.InterpreterGroup)
    at org.apache.zeppelin.python.PythonInterpreter.getIPythonInterpreter(PythonInterpreter.java:266)
    at org.apache.zeppelin.python.PythonInterpreter.open(PythonInterpreter.java:227)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    - locked <0x00000000edc96070> (a org.apache.zeppelin.python.PythonInterpreter)
    - locked <0x00000000edcc55c0> (a org.apache.zeppelin.interpreter.LazyOpenInterpreter)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:714)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.
{code}
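In the deadlock above, "pool-1-thread-3" ({{getStatus}}) holds the {{InterpreterGroup}} monitor and waits for the {{LazyOpenInterpreter}} monitor, while "pool-2-thread-2" ({{LazyOpenInterpreter.open}}) holds the {{LazyOpenInterpreter}} monitor and waits for the {{InterpreterGroup}} monitor; "pool-1-thread-4" ({{close}}) is then stuck behind the same {{InterpreterGroup}} monitor. The following is a minimal, hypothetical sketch (not Zeppelin code) that reproduces this lock-order inversion with two plain {{Object}} monitors and detects it with the standard {{ThreadMXBean.findDeadlockedThreads()}} API, which reports the same kind of cycle that {{jstack}} prints as "Found one Java-level deadlock". The class name, method names, and thread names are invented for illustration.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockDemo {

    /**
     * Starts two threads that take the same two monitors in opposite order and
     * returns one line per thread that the JVM reports as deadlocked.
     */
    public static List<String> detectDeadlock() {
        // Hypothetical stand-ins for the two monitors in the dump.
        final Object groupLock = new Object(); // plays the InterpreterGroup monitor
        final Object openLock = new Object();  // plays the LazyOpenInterpreter monitor
        // Makes both threads hold their first lock before either requests the second.
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // Mirrors "pool-1-thread-3" (getStatus): group lock first, then open lock.
        Thread status = new Thread(() -> lockInOrder(groupLock, openLock, bothHoldFirst), "status-thread");
        // Mirrors "pool-2-thread-2" (LazyOpenInterpreter.open): open lock first, then group lock.
        Thread open = new Thread(() -> lockInOrder(openLock, groupLock, bothHoldFirst), "open-thread");
        status.setDaemon(true); // the pair never finishes; daemon threads let the JVM exit
        open.setDaemon(true);
        status.start();
        open.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> found = new ArrayList<>();
        // Poll for up to about 5 s; findDeadlockedThreads() returns null when no cycle exists.
        for (int attempt = 0; attempt < 50 && found.size() < 2; attempt++) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            found.clear();
            long[] ids = mx.findDeadlockedThreads();
            if (ids == null) {
                continue;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                // Keep only the two threads started by this call.
                if (info != null && (info.getThreadId() == status.getId() || info.getThreadId() == open.getId())) {
                    found.add(info.getThreadName() + " waits for " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                }
            }
        }
        return found;
    }

    // Takes `first`, waits until the other thread holds its first lock, then requests `second`.
    private static void lockInOrder(Object first, Object second, CountDownLatch bothHoldFirst) {
        synchronized (first) {
            bothHoldFirst.countDown();
            try {
                bothHoldFirst.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds `second` and is waiting for `first`.
            }
        }
    }

    public static void main(String[] args) {
        // Prints one line per thread in the cycle; the order may vary.
        detectDeadlock().forEach(System.out::println);
    }
}
{code}

If every code path acquired the two monitors in the same order (for example, always the group monitor before the interpreter monitor), this kind of cycle could not form.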
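The JVM status of the main Zeppelin server process ({{ZeppelinServer}}) looked like this (thread dump followed by GC utilization statistics):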
{code:java}
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f034408b800 nid=0xb1bc in Object.wait() [0x00007f032f5f4000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x00000000c001f098> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f0344087000 nid=0xb1bb in Object.wait() [0x00007f032f6f5000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x00000000c0020b80> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00007f0344011000 nid=0xb1b5 in Object.wait() [0x00007f034d8d2000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.join(QueuedThreadPool.java:466)
    - locked <0x00000000c0028200> (a java.lang.Object)
    at org.eclipse.jetty.server.Server.join(Server.java:555)
    at org.apache.zeppelin.server.ZeppelinServer.main(ZeppelinServer.java:280)

"VM Thread" os_prio=0 tid=0x00007f034407f000 nid=0xb1ba runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f0344026000 nid=0xb1b6 runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f0344028000 nid=0xb1b7 runnable

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f034402a000 nid=0xb1b8 runnable

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f034402b800 nid=0xb1b9 runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007f034410d800 nid=0xb1c2 waiting on condition

JNI global references: 842

  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
 49.91   0.00  52.62  88.81  97.23  94.91   5570   50.811     2    0.076   50.887
{code}
We checked the memory metrics and found that the used memory rises a little every time the interpreter runs.
Hourly plot:
!SCS20200110105340.png!
7-day plot:
!IMG_20200110_092327_463.jpg!
30-day plot:
!IMG_20200110_092346_915.jpg!
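To check whether the growth in these plots is in the JVM heap, one option is to sample the heap after each run with the standard {{java.lang.management}} API and read the per-collector counts that {{jstat}} summarizes as YGC/YGCT and FGC/FGCT. The sketch below assumes it runs inside the JVM being measured, for example from a hypothetical hook invoked after each hourly batch; {{HeapGrowthTracker}} and its methods are invented for illustration. Sampling right after {{System.gc()}} (only a hint to the JVM) brings the numbers closer to the live-set size, so a steady rise across runs is more likely to reflect retained objects than allocation churn. If the plots measure process or host memory rather than heap, the heap numbers may not move the same way.

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

public class HeapGrowthTracker {
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    private final List<Long> samples = new ArrayList<>();

    /** Records the used heap in bytes; meant to be called once after each finished run. */
    public long recordAfterRun() {
        long used = memory.getHeapMemoryUsage().getUsed();
        samples.add(used);
        return used;
    }

    /** True if the heap grew after each of the last n recorded runs. */
    public boolean growingOverLast(int n) {
        return strictlyIncreasing(samples, n);
    }

    /** True if the last n values are strictly increasing (requires n >= 2 and enough values). */
    public static boolean strictlyIncreasing(List<Long> values, int n) {
        if (n < 2 || values.size() < n) {
            return false;
        }
        for (int i = values.size() - n + 1; i < values.size(); i++) {
            if (values.get(i) <= values.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Per-collector counts and accumulated times. With ParallelGC these beans are usually
     * "PS Scavenge" (young, like jstat's YGC/YGCT) and "PS MarkSweep" (old, like FGC/FGCT).
     */
    public static String gcSummary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", time=").append(gc.getCollectionTime()).append(" ms")
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        HeapGrowthTracker tracker = new HeapGrowthTracker();
        for (int run = 1; run <= 3; run++) {
            // ... the hourly batch would run here (hypothetical) ...
            System.gc(); // only a hint to the JVM; moves samples closer to the live set
            System.out.println("used heap after run " + run + ": " + tracker.recordAfterRun() + " bytes");
        }
        System.out.println("grew after each of the last 3 runs: " + tracker.growingOverLast(3));
        System.out.print(gcSummary());
    }
}
{code}

Requiring growth over several consecutive runs, rather than comparing two samples, keeps a single noisy reading from being mistaken for a leak.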