[jira] [Commented] (SPARK-2245) VertexRDD can not be materialized for checkpointing

Baoxu Shi (JIRA) Mon, 30 Jun 2014 13:10:39 -0700

    [ https://issues.apache.org/jira/browse/SPARK-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048073#comment-14048073 ]


Baoxu Shi commented on SPARK-2245:
----------------------------------

I edited my original comment to add these updates, but I am not sure whether 
edits are delivered by email, so I am resubmitting it here. I hope that is not 
a bother. [~ankurd]

Hi Ankur Dave, I updated my pull request, but I now hit another exception: 
ShippableVertexPartition is not serializable. After I made it serializable, a 
further exception appeared: org.apache.spark.graphx.impl.RoutingTablePartition 
is not serializable. After I made that serializable as well, iteration 2 fails 
with the exception: org.apache.spark.graphx.impl.ShippableVertexPartition 
cannot be cast to scala.Tuple2

The code I am using is:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val conf = new SparkConf().setAppName("HDTM")
  .setMaster("local[4]")
val sc = new SparkContext(conf)
sc.setCheckpointDir("./checkpoint")
val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L),
  Edge(2L, 0L, 2L)))
var g = Graph(v, e)
val vertexIds = Seq(0L, 1L, 2L)
var prevG: Graph[VertexId, Long] = null
for (i <- 1 to 2000) {
  // Rebuild the graph once per vertex id, caching the new graph and
  // unpersisting the previous one.
  vertexIds.toStream.foreach { id =>
    prevG = g
    g = Graph(g.vertices, g.edges)
    g.vertices.cache()
    g.edges.cache()
    prevG.unpersistVertices(blocking = false)
    prevG.edges.unpersist(blocking = false)
  }
  // Mark both RDDs for checkpointing, then run an action on each.
  g.vertices.checkpoint()
  g.edges.checkpoint()
  g.edges.count()
  g.vertices.count()
  println(s"${g.vertices.isCheckpointed} ${g.edges.isCheckpointed}")
  println(" iter " + i + " finished")
}
println(g.vertices.collect().mkString(" "))
println(g.edges.collect().mkString(" "))

Am I on the right track, or is there a better way to fix this?

> VertexRDD can not be materialized for checkpointing
> ---------------------------------------------------
>
>                 Key: SPARK-2245
>                 URL: https://issues.apache.org/jira/browse/SPARK-2245
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Baoxu Shi
>
> It seems that a VertexRDD cannot be materialized by simply calling the count 
> method, which VertexRDD overrides. Calling RDD's count, however, does 
> materialize it.
> Is this by design, so that the count can be obtained without materializing 
> the VertexRDD? If so, do you think it is necessary to add a materialize 
> method to VertexRDD?
> By the way, is count() the cheapest way to materialize an RDD, or does it 
> cost the same resources as other actions?
> The pull request is here:
> https://github.com/apache/spark/pull/1177
> Best,



--
This message was sent by Atlassian JIRA
(v6.2#6252)
