[ https://issues.apache.org/jira/browse/SPARK-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Or resolved SPARK-15317.
-------------------------------
       Resolution: Fixed
         Assignee: Shixiong Zhu
    Fix Version/s: 2.0.0
Target Version/s: 2.0.0

JobProgressListener takes a huge amount of memory with iterative DataFrame program in local, standalone
--------------------------------------------------------------------------------------------------------

                 Key: SPARK-15317
                 URL: https://issues.apache.org/jira/browse/SPARK-15317
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.0
         Environment: Spark 2.0, local mode + standalone mode on MacBook Pro, OSX 10.9
            Reporter: Joseph K. Bradley
            Assignee: Shixiong Zhu
             Fix For: 2.0.0
         Attachments: cc_traces.txt, compare-1.6-10Kpartitions.png, compare-2.0-10Kpartitions.png, compare-2.0-16partitions.png, dump-standalone-2.0-1of4.png, dump-standalone-2.0-2of4.png, dump-standalone-2.0-3of4.png, dump-standalone-2.0-4of4.png

h2. TL;DR

Running a small test locally, I found JobProgressListener consuming a huge amount of memory. Many tasks are being run, but the memory usage is still surprising. Summary, with details below:
* Spark app: series of DataFrame joins
* Issue: GC
* Heap dumps show JobProgressListener taking 150-400 MB, depending on the Spark mode/version

h2. Reproducing this issue

h3. With more complex code

The code which fails:
* Here is a branch with the code snippet which fails: [https://github.com/jkbradley/spark/tree/18836174ab190d94800cc247f5519f3148822dce]
** This is based on Spark commit hash: bb1362eb3b36b553dca246b95f59ba7fd8adcc8a
* Look at {{CC.scala}}, which implements connected components using DataFrames: [https://github.com/jkbradley/spark/blob/18836174ab190d94800cc247f5519f3148822dce/mllib/src/main/scala/org/apache/spark/ml/CC.scala]

In the spark shell, run:
{code}
import org.apache.spark.ml.CC
import org.apache.spark.sql.SQLContext

val sqlContext = SQLContext.getOrCreate(sc)
CC.runTest(sqlContext)
{code}

I have attached a file {{cc_traces.txt}} with the stack traces from running {{runTest}}. Note that I sometimes had to run {{runTest}} twice to cause the fatal exception. It includes a trace for 1.6, which should run without modifications to {{CC.scala}}. These traces are from running in local mode.

I used {{jmap}} to dump the heap:
* local mode with 2.0: JobProgressListener took about 397 MB
* standalone mode with 2.0: JobProgressListener took about 171 MB (see attached MemoryAnalyzer screenshots)

Both 1.6 and 2.0 exhibit this issue. 2.0 ran faster, and the issue (JobProgressListener allocation) seems more severe with 2.0, though it could just be that 2.0 makes more progress and runs more jobs.

h3. With simpler code

I ran this with master (~Spark 2.0):
{code}
val data = spark.range(0, 10000, 1, 10000)
data.cache().count()
{code}

The resulting heap dump:
* 78 MB for {{scala.tools.nsc.interpreter.ILoop$ILoopInterpreter}}
* 58 MB for {{org.apache.spark.ui.jobs.JobProgressListener}}
* 80 MB for {{io.netty.buffer.PoolChunk}}
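For context, below is a minimal self-contained sketch of the iterative join-and-count pattern that drives this behavior. It is not the attached {{CC.scala}}; the object name, column names, and iteration count are illustrative. Each pass of the loop launches additional jobs, and {{JobProgressListener}} retains per-job/stage/task metadata for the UI, so the listener's footprint grows with the number of iterations. The {{spark.ui.retainedJobs}} / {{spark.ui.retainedStages}} settings shown are a general way to bound that history, not the fix applied for this ticket.
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.min

// Illustrative stress sketch: iterative DataFrame joins that launch many jobs.
object ListenerStress {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("listener-stress")
      // Possible mitigation (not the fix for this JIRA): shrink the UI history
      // that JobProgressListener keeps around.
      .config("spark.ui.retainedJobs", "100")
      .config("spark.ui.retainedStages", "100")
      .getOrCreate()
    import spark.implicits._

    // Toy chain graph: edge from each id to id + 1.
    val edges = spark.range(0, 10000L)
      .select($"id".as("src"), ($"id" + 1).as("dst"))
    // Each vertex starts in its own component.
    var components = spark.range(0, 10001L)
      .select($"id".as("vertex"), $"id".as("component"))

    for (i <- 1 to 20) {
      // Propagate the smaller component id along each edge, then keep the
      // minimum per vertex (a toy propagation step, not full CC).
      val updated = components.as("c")
        .join(edges.as("e"), $"c.vertex" === $"e.src")
        .select($"e.dst".as("vertex"), $"c.component".as("component"))
        .union(components)
        .groupBy($"vertex")
        .agg(min($"component").as("component"))
      components = updated.cache()
      // Every count() launches another job; the listener accumulates its metadata.
      println(s"iteration $i, vertices: ${components.count()}")
    }
    spark.stop()
  }
}
{code}
Running a loop like this under {{local[*]}} and taking a driver heap dump after a few dozen iterations should show {{org.apache.spark.ui.jobs.JobProgressListener}} growing with the iteration count, which matches the pattern reported above.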