[ https://issues.apache.org/jira/browse/SPARK-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048073#comment-14048073 ]
Baoxu Shi commented on SPARK-2245:
----------------------------------

I edited my original comment to add these updates, but I do not know whether edits are sent out by email, so I am resubmitting it as a new comment. I hope that does not bother you.

[~ankurd] Hi Ankur Dave, I changed my pull request, but there is another exception: ShippableVertexPartition is not serializable. I made it serializable, but then got another exception: org.apache.spark.graphx.impl.RoutingTablePartition is not serializable. After making that serializable as well, iteration 2 fails with:

org.apache.spark.graphx.impl.ShippableVertexPartition cannot be cast to scala.Tuple2

The code I am using is:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx._

    val conf = new SparkConf().setAppName("HDTM").setMaster("local[4]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("./checkpoint")

    val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
    val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L), Edge(2L, 0L, 2L)))
    var g = Graph(v, e)
    val vertexIds = Seq(0L, 1L, 2L)
    var prevG: Graph[VertexId, Long] = null

    for (i <- 1 to 2000) {
      // Rebuild the graph once per vertex id, caching the new graph and unpersisting the old one.
      vertexIds.toStream.foreach(id => {
        prevG = g
        g = Graph(g.vertices, g.edges)
        g.vertices.cache()
        g.edges.cache()
        prevG.unpersistVertices(blocking = false)
        prevG.edges.unpersist(blocking = false)
      })

      // Mark both RDDs for checkpointing, then call count() on each to try to materialize them.
      g.vertices.checkpoint()
      g.edges.checkpoint()
      g.edges.count()
      g.vertices.count()

      println(s"${g.vertices.isCheckpointed} ${g.edges.isCheckpointed} ")
      println(" iter " + i + " finished")
    }

    println(g.vertices.collect().mkString(" "))
    println(g.edges.collect().mkString(" "))

Am I on the right track, or is there a better way to change it?

> VertexRDD can not be materialized for checkpointing
> ---------------------------------------------------
>
>                 Key: SPARK-2245
>                 URL: https://issues.apache.org/jira/browse/SPARK-2245
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Baoxu Shi
>
> It seems one cannot materialize a VertexRDD by simply calling the count method, which is overridden by VertexRDD.
> But calling RDD's count does materialize it.
> Is this a feature designed to get the count without materializing the VertexRDD? If so, do you think it is necessary to add a materialize method to VertexRDD?
> By the way, is count() the cheapest way to materialize an RDD? Or does it cost the same resources as other actions?
> The pull request is here:
> https://github.com/apache/spark/pull/1177
> Best,

--
This message was sent by Atlassian JIRA
(v6.2#6252)