Hi experts,
I'm reporting a problem with Spark GraphX. I submit Spark jobs through Zeppelin;
note that the Scala environment shares the same SparkContext and SQLContext
instances across jobs. I call the connected components algorithm as part of our
business logic, and I found that every time a job finished, some of the graph's
storage RDDs were not being released. After several runs there were a lot of
cached storage RDDs still around, even though all the jobs had finished.
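For reference, here is a minimal sketch of how to observe this (the toy edge
list, app name, and local master are placeholders for illustration, not my
actual Zeppelin job, where the shared sc already exists):

// Minimal sketch: run connected components on a cached graph, then list
// what is still cached after the job has finished.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

val sc = new SparkContext(new SparkConf().setAppName("cc-leak-sketch").setMaster("local[*]"))
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(4L, 5L, 1)))
val graph = Graph.fromEdges(edges, defaultValue = 0).cache()

graph.connectedComponents().vertices.count() // run one job to completion

// Storage RDDs left behind by the finished job show up here:
sc.getPersistentRDDs.values.foreach { rdd =>
  println(s"RDD ${rdd.id} [${rdd.getStorageLevel.description}] ${rdd.name}")
}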
So I checked the code of connectedComponents and found what may be a problem in
Pregel.scala: when the graph parameter has already been cached by the caller,
there is no code path that ever unpersists it. I added the two lines marked
"// added" below to solve the problem:
def apply[VD: ClassTag, ED: ClassTag, A: ClassTag]
    (graph: Graph[VD, ED],
     initialMsg: A,
     maxIterations: Int = Int.MaxValue,
     activeDirection: EdgeDirection = EdgeDirection.Either)
    (vprog: (VertexId, VD, A) => VD,
     sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
     mergeMsg: (A, A) => A)
  : Graph[VD, ED] =
{
  // ...
  var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
  // added: release the caller's cached input graph now that g has replaced it
  graph.unpersistVertices(blocking = false)
  graph.edges.unpersist(blocking = false)
  // ...
} // end of apply
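For what it's worth, these two calls mirror what Pregel already does for its own
intermediate graphs inside the iteration loop (prevG.unpersistVertices and
prevG.edges.unpersist are called once the new graph is materialized), so
releasing the caller's input graph the same way seems consistent. One way to
verify the fix is to compare the output of sc.getPersistentRDDs before and
after the change.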
I'm not sure whether this is a bug.
Thank you for your time,
juntao