[ https://issues.apache.org/jira/browse/SPARK-15002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421164#comment-15421164 ]
Sean Owen commented on SPARK-15002:
-----------------------------------

In the UI, go look at a thread dump of the pegged executor. It should show you the RUNNABLE threads, and it ought to be fairly clear where they're busy. Also check GC time: if it's a non-trivial fraction of execution time, at least it's clear that the problem is somehow memory pressure.

> Calling unpersist can cause Spark to hang indefinitely when writing out a result
> ---------------------------------------------------------------------------------
>
>                  Key: SPARK-15002
>                  URL: https://issues.apache.org/jira/browse/SPARK-15002
>              Project: Spark
>           Issue Type: Bug
>           Components: GraphX, Spark Core
>     Affects Versions: 1.5.2, 1.6.0, 2.0.0
>          Environment: AWS and Linux VM, both in spark-shell and spark-submit. Tested in 1.5.2, 1.6, and 2.0.
>             Reporter: Jamie Hutton
>
> The following code will cause Spark to hang indefinitely. It happens when you have an unpersist which is followed by some further processing of that data (in my case writing it out).
>
> I first experienced this issue with GraphX, so my example below involves GraphX code; however, I suspect it might be more of a core issue than a GraphX one. I have raised another bug with similar results (indefinite hanging) but in different circumstances, so the two may well be linked.
>
> Code to reproduce (can be run in spark-shell):
>
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.types._
> import org.apache.spark.sql._
>
> val r = scala.util.Random
> val list = (0L to 500L).map(i => (i, r.nextInt(500).asInstanceOf[Long], "LABEL"))
> val distData = sc.parallelize(list)
>
> val edgesRDD = distData.map(x => Edge(x._1, x._2, x._3))
> val distinctNodes = distData.flatMap { row => Iterable((row._1, "A"), (row._2, "B")) }.distinct()
> val nodesRDD: RDD[(VertexId, String)] = distinctNodes
> val graph = Graph(nodesRDD, edgesRDD)
> graph.persist()
>
> val ccGraph = graph.connectedComponents()
> ccGraph.cache
>
> val schema = StructType(Seq(StructField("id", LongType, false), StructField("netid", LongType, false)))
> val rdd = ccGraph.vertices.map(row => Row(row._1.asInstanceOf[Long], row._2.asInstanceOf[Long]))
> val builtEntityDF = sqlContext.createDataFrame(rdd, schema)
>
> /* this unpersist step causes the issue */
> ccGraph.unpersist()
>
> /* the write step hangs forever */
> builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
>
> If you take out the ccGraph.unpersist() step, the write step completes successfully.
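
For the thread-dump and GC-time check suggested above, here is a minimal sketch that can be pasted into the same spark-shell session. It runs a small marker job that reports RUNNABLE thread stacks and cumulative GC time from inside each executor JVM. This is only an illustration, not part of the ticket: it assumes the executors can still accept tasks, and the val name "diagnostics" is arbitrary. For an executor that is fully hung, use the Thread Dump link on the Executors tab of the UI instead.

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Runs one task per default-parallelism slot; each task returns the RUNNABLE
// thread stacks plus the cumulative GC time reported by the JVM's GC beans.
val diagnostics = sc.parallelize(1 to sc.defaultParallelism, sc.defaultParallelism)
  .mapPartitions { _ =>
    val runnable = Thread.getAllStackTraces.asScala.collect {
      case (t, frames) if t.getState == Thread.State.RUNNABLE =>
        s"${t.getName}:\n    " + frames.take(5).mkString("\n    ")
    }
    val gc = ManagementFactory.getGarbageCollectorMXBeans.asScala
      .map(b => s"${b.getName}: ${b.getCollectionTime} ms over ${b.getCollectionCount} collections")
    Iterator((runnable ++ gc).mkString("\n"))
  }
  .collect()

diagnostics.foreach(println)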
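
On the reporter's note that removing the ccGraph.unpersist() call avoids the hang: a hedged workaround sketch, not verified against this bug, is to defer the unpersist until the write has finished, or to try GraphX's non-blocking unpersist so the downstream job does not wait on block removal. The names ccGraph and builtEntityDF are the ones from the reproduction above.

// Workaround A: write the result first, then release the cached graph.
builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
ccGraph.unpersist()

// Workaround B: if the unpersist has to happen before the write, try the
// non-blocking variant (Graph.unpersist takes a `blocking` flag, default true).
ccGraph.unpersist(blocking = false)
builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")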