[ https://issues.apache.org/jira/browse/SPARK-12511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073344#comment-15073344 ]
Shixiong Zhu edited comment on SPARK-12511 at 12/29/15 1:33 AM:
----------------------------------------------------------------

I have not figured out the root cause yet. Here is what I have found so far: the "Finalizer" thread is blocked by py4j, so the finalizer queue keeps growing.

{code}
"Finalizer" #3 daemon prio=8 os_prio=31 tid=0x00007feaa380e000 nid=0x3503 runnable [0x0000000117ca4000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	- locked <0x00000007813be228> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:161)
	at java.io.BufferedReader.readLine(BufferedReader.java:324)
	- locked <0x00000007813be228> (a java.io.InputStreamReader)
	at java.io.BufferedReader.readLine(BufferedReader.java:389)
	at py4j.CallbackConnection.sendCommand(CallbackConnection.java:82)
	at py4j.CallbackClient.sendCommand(CallbackClient.java:236)
	at py4j.reflection.PythonProxyHandler.finalize(PythonProxyHandler.java:81)
	at java.lang.System$2.invokeFinalize(System.java:1270)
	at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:98)
	at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:210)
{code}
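The JVM runs all finalizers on a single "Finalizer" thread, so the one PythonProxyHandler.finalize() stuck in readLine() above is enough to stop finalization of everything else, including the ZipFileInputStream instances from the report below. A minimal standalone sketch of that mechanism, assuming nothing from Spark or py4j (all class names here are invented for the demo):

{code}
import java.util.concurrent.CountDownLatch;

public class FinalizerBlockDemo {
    // A latch that is never counted down: stands in for the py4j callback
    // socket read that never returns.
    static final CountDownLatch NEVER = new CountDownLatch(1);

    // Plays the role of PythonProxyHandler: its finalize() blocks forever,
    // parking the JVM's single "Finalizer" thread.
    static class Blocking {
        @Override protected void finalize() throws Throwable {
            NEVER.await();
        }
    }

    // Plays the role of ZipFileInputStream etc.: ordinary finalizable
    // objects that can no longer be finalized once that thread is stuck.
    static class Victim {
        @Override protected void finalize() { /* never runs */ }
    }

    public static void main(String[] args) throws Exception {
        new Blocking();              // immediately unreachable
        System.gc();                 // enqueue it for finalization
        Thread.sleep(1000);          // let the Finalizer thread pick it up and block
        for (int i = 0; i < 1_000_000; i++) {
            new Victim();            // each instance registers a java.lang.ref.Finalizer
        }
        System.gc();
        Thread.sleep(1000);
        // "jmap -histo <pid>" now shows ~1M pending java.lang.ref.Finalizer
        // entries -- the same growth pattern as in the histogram below.
    }
}
{code}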
{code} "Finalizer" #3 daemon prio=8 os_prio=31 tid=0x00007feaa380e000 nid=0x3503 runnable [0x0000000117ca4000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:170) at java.net.SocketInputStream.read(SocketInputStream.java:141) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) - locked <0x00000007813be228> (a java.io.InputStreamReader) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.readLine(BufferedReader.java:324) - locked <0x00000007813be228> (a java.io.InputStreamReader) at java.io.BufferedReader.readLine(BufferedReader.java:389) at py4j.CallbackConnection.sendCommand(CallbackConnection.java:82) at py4j.CallbackClient.sendCommand(CallbackClient.java:236) at py4j.reflection.PythonProxyHandler.finalize(PythonProxyHandler.java:81) at java.lang.System$2.invokeFinalize(System.java:1270) at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:98) at java.lang.ref.Finalizer.access$100(Finalizer.java:34) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:210) {code} > streaming driver with checkpointing unable to finalize leading to OOM > --------------------------------------------------------------------- > > Key: SPARK-12511 > URL: https://issues.apache.org/jira/browse/SPARK-12511 > Project: Spark > Issue Type: Bug > Components: PySpark, Streaming > Affects Versions: 1.5.2 > Environment: pyspark 1.5.2 > yarn 2.6.0 > python 2.6 > centos 6.5 > openjdk 1.8.0 > Reporter: Antony Mayi > Assignee: Shixiong Zhu > Priority: Critical > Attachments: bug.py, finalizer-classes.png, finalizer-pending.png, > finalizer-spark_assembly.png > > > Spark streaming application when configured with checkpointing is filling > driver's heap with multiple ZipFileInputStream instances as results of > spark-assembly.jar (potentially some others like for example snappy-java.jar) > getting repetitively referenced (loaded?). Java Finalizer can't finalize > these ZipFileInputStream instances and it eventually takes all heap leading > the driver to OOM crash. > h2. Steps to reproduce: > * Submit attached [^bug.py] to spark > * Leave it running and monitor the driver java process heap > ** with heap dump you will primarily see growing instances of byte array data > (here cumulated zip payload of the jar refs): > {noformat} > num #instances #bytes class name > ---------------------------------------------- > 1: 32653 32735296 [B > 2: 48000 5135816 [C > 3: 41 1344144 [Lscala.concurrent.forkjoin.ForkJoinTask; > 4: 11362 1261816 java.lang.Class > 5: 47054 1129296 java.lang.String > 6: 25460 1018400 java.lang.ref.Finalizer > 7: 9802 789400 [Ljava.lang.Object; > {noformat} > ** with visualvm you can see: > *** increasing number of objects pending for finalization > !finalizer-pending.png! > *** increasing number of ZipFileInputStreams instances related to the > spark-assembly.jar referenced by Finalizer > !finalizer-spark_assembly.png! > * Depending on the heap size and running time this will lead to driver OOM > crash > h2. Comments > * The [^bug.py] is lightweight proof of the problem. In production I am > experiencing this as quite rapid effect - in few hours it eats gigs of heap > and kills the app. 
> * If the same [^bug.py] is run without checkpointing there is no issue > whatsoever. > * Not sure if it is just pyspark related. > * In [^bug.py] I am using the socketTextStream input but seems to be > independent of the input type (in production having same problem with Kafka > direct stream, have seen it even with textFileStream). > * It is happening even if the input stream doesn't produce any data. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org