[ 
https://issues.apache.org/jira/browse/SPARK-15002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421164#comment-15421164
 ] 

Sean Owen commented on SPARK-15002:
-----------------------------------

In the UI, look at a thread dump of the pegged executor (Executors tab). It 
should show you the RUNNABLE threads, and it ought to be fairly clear where 
they're busy. Also check GC time. If it's a non-trivial fraction of execution 
time, then at least it's clear that the problem is memory pressure.
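
For reference, the two checks above can be approximated from inside a JVM with 
the standard java.lang.management beans. This is only a rough sketch, not what 
the Spark UI itself runs: the object and method names (ExecutorDiagnostics, 
runnableThreads, gcFraction) are made up for illustration, and the code 
inspects whichever JVM it is called in (the driver, if pasted into 
spark-shell), not a remote executor.

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Hypothetical helper, not part of Spark: inspects the JVM it runs in.
object ExecutorDiagnostics {

  /** (thread name, top stack frame) for every thread currently RUNNABLE. */
  def runnableThreads(): Seq[(String, String)] = {
    val threadBean = ManagementFactory.getThreadMXBean
    threadBean.dumpAllThreads(false, false).toSeq
      .filter(_.getThreadState == Thread.State.RUNNABLE)
      .map { info =>
        val top = info.getStackTrace.headOption
          .map(_.toString)
          .getOrElse("<no frames>")
        (info.getThreadName, top)
      }
  }

  /** Accumulated GC time as a fraction of JVM uptime. */
  def gcFraction(): Double = {
    val gcMillis = ManagementFactory.getGarbageCollectorMXBeans.asScala
      .map(_.getCollectionTime)
      .filter(_ >= 0) // a collector reports -1 when the time is undefined
      .sum
    val uptimeMillis = ManagementFactory.getRuntimeMXBean.getUptime
    if (uptimeMillis > 0) gcMillis.toDouble / uptimeMillis else 0.0
  }
}
```

Calling ExecutorDiagnostics.runnableThreads().foreach(println) lists the busy 
threads, and a gcFraction() that keeps climbing points towards memory 
pressure. For an executor that is already pegged, the thread dump in the UI 
(or jstack against the executor process) is likely the more practical route.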

> Calling unpersist can cause spark to hang indefinitely when writing out a 
> result
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-15002
>                 URL: https://issues.apache.org/jira/browse/SPARK-15002
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX, Spark Core
>    Affects Versions: 1.5.2, 1.6.0, 2.0.0
>         Environment: AWS and Linux VM, both in spark-shell and spark-submit. 
> tested in 1.5.2 and 1.6. Tested in 2.0
>            Reporter: Jamie Hutton
>
> The following code will cause Spark to hang indefinitely. It happens when an 
> unpersist is followed by some further processing of that data (in my case, 
> writing it out).
> I first experienced this issue with GraphX, so my example below involves 
> GraphX code; however, I suspect it might be more of a core issue than a 
> GraphX one. I have raised another bug with similar results (indefinite 
> hanging) but in different circumstances, so the two may well be linked.
> Code to reproduce (can be run in spark-shell):
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.types._
> import org.apache.spark.sql._
> val r = scala.util.Random
> val list = (0L to 500L).map(i=>(i,r.nextInt(500).asInstanceOf[Long],"LABEL"))
> val distData = sc.parallelize(list)
> val edgesRDD = distData.map(x => Edge(x._1, x._2, x._3))
> val distinctNodes = distData.flatMap { row =>
>   Iterable((row._1, "A"), (row._2, "B"))
> }.distinct()
> val nodesRDD: RDD[(VertexId, (String))] = distinctNodes
> val graph = Graph(nodesRDD, edgesRDD)
> graph.persist()
> val ccGraph = graph.connectedComponents()
> ccGraph.cache
>    
> val schema = StructType(Seq(
>   StructField("id", LongType, false),
>   StructField("netid", LongType, false)))
> val rdd = ccGraph.vertices.map(row => Row(row._1.asInstanceOf[Long], row._2.asInstanceOf[Long]))
> val builtEntityDF = sqlContext.createDataFrame(rdd, schema)
>  
> /*this unpersist step causes the issue*/
> ccGraph.unpersist()
>  
> /*write step hangs forever*/
> builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
>  
> If you take out the ccGraph.unpersist() step, the write step completes 
> successfully.


