[ 
https://issues.apache.org/jira/browse/SPARK-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15317.
-------------------------------
          Resolution: Fixed
            Assignee: Shixiong Zhu
       Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> JobProgressListener takes a huge amount of memory with iterative DataFrame 
> program in local, standalone
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15317
>                 URL: https://issues.apache.org/jira/browse/SPARK-15317
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>         Environment: Spark 2.0, local mode + standalone mode on MacBook Pro 
> OSX 10.9
>            Reporter: Joseph K. Bradley
>            Assignee: Shixiong Zhu
>             Fix For: 2.0.0
>
>         Attachments: cc_traces.txt, compare-1.6-10Kpartitions.png, 
> compare-2.0-10Kpartitions.png, compare-2.0-16partitions.png, 
> dump-standalone-2.0-1of4.png, dump-standalone-2.0-2of4.png, 
> dump-standalone-2.0-3of4.png, dump-standalone-2.0-4of4.png
>
>
> h2. TL;DR
> Running a small test locally, I found JobProgressListener consuming a huge 
> amount of memory.  Many tasks are being run, but the memory usage is still 
> surprising.  Summary, with details below:
> * Spark app: series of DataFrame joins
> * Issue: GC
> * Heap dump shows JobProgressListener taking 150 - 400MB, depending on the 
> Spark mode/version
> h2. Reproducing this issue
> h3. With more complex code
> The code which fails:
> * Here is a branch with the code snippet which fails: 
> [https://github.com/jkbradley/spark/tree/18836174ab190d94800cc247f5519f3148822dce]
> ** This is based on Spark commit hash: 
> bb1362eb3b36b553dca246b95f59ba7fd8adcc8a
> * Look at {{CC.scala}}, which implements connected components using 
> DataFrames: 
> [https://github.com/jkbradley/spark/blob/18836174ab190d94800cc247f5519f3148822dce/mllib/src/main/scala/org/apache/spark/ml/CC.scala]
> In the spark shell, run:
> {code}
> import org.apache.spark.ml.CC
> import org.apache.spark.sql.SQLContext
> val sqlContext = SQLContext.getOrCreate(sc)
> CC.runTest(sqlContext)
> {code}
> I have attached a file {{cc_traces.txt}} with the stack traces from running 
> {{runTest}}.  Note that I sometimes had to run {{runTest}} twice to cause the 
> fatal exception.  This includes a trace for 1.6, which should run without 
> modifications to {{CC.scala}}.  These traces are from running in local mode.
> I used {{jmap}} to dump the heap:
> * local mode with 2.0: JobProgressListener took about 397 MB
> * standalone mode with 2.0: JobProgressListener took about 171 MB (see 
> attached screenshots from MemoryAnalyzer)
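> For reference, heap dumps of this kind can be produced with commands along 
> these lines (illustrative only; {{<pid>}} stands for the actual driver JVM 
> PID, which {{jps}} can list):
> {code}
> # List running JVMs to find the Spark driver PID
> jps -l
> # Write a binary heap dump that MemoryAnalyzer can open
> jmap -dump:live,format=b,file=spark-driver.hprof <pid>
> {code}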
> Both 1.6 and 2.0 exhibit this issue.  2.0 ran faster, and the issue 
> (JobProgressListener allocation) seems more severe with 2.0, though it could 
> just be that 2.0 makes more progress and runs more jobs.
> h3. With simpler code
> I ran this with master (~Spark 2.0):
> {code}
> val data = spark.range(0, 10000, 1, 10000)
> data.cache().count()
> {code}
> The resulting heap dump:
> * 78MB for {{scala.tools.nsc.interpreter.ILoop$ILoopInterpreter}}
> * 58MB for {{org.apache.spark.ui.jobs.JobProgressListener}}
> * 80MB for {{io.netty.buffer.PoolChunk}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
