Yes, I'm not really happy with that "collect". I looked at the subgraph method and other options but couldn't figure out anything easy or direct.
I'm going to try your idea.

2016-02-26 14:16 GMT+01:00 Robin East <robin.e...@xense.co.uk>:

> Whilst I can think of other ways to do it I don’t think they would be
> conceptually or syntactically any simpler. GraphX doesn’t have the concept
> of built-in vertex properties which would make this simpler - a vertex in
> GraphX is a Vertex ID (Long) and a bunch of custom attributes that you
> assign. This means you have to find a way of ‘pushing’ the vertex degree
> into the graph so you can do comparisons (cf. a join in relational
> databases) or, as you have done, create a list and filter against that
> (cf. filtering against a sub-query in a relational database).
>
> One thing I would point out is that you probably want to avoid
> finalVertexes.collect() for a large-scale system - this will pull all the
> vertices into the driver and then push them out to the executors again as
> part of the filter operation. A better strategy for large graphs would be:
>
> 1. build a graph based on the existing graph where the vertex attribute is
>    the vertex degree - the GraphX documentation shows how to do this
> 2. filter this “degrees” graph to just give you 0 degree vertices
> 3. use graph.mask passing in the 0-degree graph to get the original graph
>    with just 0 degree vertices
>
> Just one variation on several possibilities; the key point is that
> everything is just a graph transformation until you call an action on the
> resulting graph.
>
> -------------------------------------------------------------------------------
> Robin East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
> On 26 Feb 2016, at 11:59, Guillermo Ortiz <konstt2...@gmail.com> wrote:
>
> I'm new to GraphX. I need to get the vertices without out-edges.
> I guess it's pretty easy, but I did it in a pretty complicated and
> inefficient way:
>
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
>
> // Create an RDD for vertices
> val vertices: RDD[(VertexId, (List[String], List[String]))] =
>   sc.parallelize(Array((1L, (List("a"), List[String]())),
>     (2L, (List("b"), List[String]())),
>     (3L, (List("c"), List[String]())),
>     (4L, (List("d"), List[String]())),
>     (5L, (List("e"), List[String]())),
>     (6L, (List("f"), List[String]()))))
>
> // Create an RDD for edges
> val relationships: RDD[Edge[Boolean]] =
>   sc.parallelize(Array(Edge(1L, 2L, true), Edge(2L, 3L, true),
>     Edge(3L, 4L, true), Edge(5L, 2L, true)))
>
> // Build the graph from the vertices and edges
> val minGraph = Graph(vertices, relationships)
>
> // IDs of the vertices that have at least one outgoing edge
> val out = minGraph.outDegrees.map(vertex => vertex._1)
>
> // IDs of the vertices with no outgoing edges
> val finalVertexes = minGraph.vertices.keys.subtract(out)
>
> // There must be something better than this..
> val nodes = finalVertexes.collect()
> val result = minGraph.vertices.filter(v => nodes.contains(v._1))
>
> What's the right way to do this operation? It seems that it should be
> pretty easy.
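
For reference, the three steps Robin describes (degrees graph, filter, mask) could be sketched roughly as below. This is only a sketch, assuming Spark with GraphX on the classpath and a local master; the names NoOutEdges, withoutOutEdges, degreeGraph and zeroOut are made up for illustration and do not come from the thread. It uses outerJoinVertices rather than outDegrees alone because outDegrees leaves out vertices that have no outgoing edges, which are exactly the ones wanted here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object NoOutEdges {

  // Hypothetical helper: keeps only the vertices of `graph` with no outgoing edges.
  def withoutOutEdges[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VD, ED] = {
    // 1. Replace each vertex attribute with its out-degree. outDegrees omits
    //    vertices without outgoing edges, so missing entries default to 0.
    val degreeGraph: Graph[Int, ED] =
      graph.outerJoinVertices(graph.outDegrees) { (vid, attr, deg) => deg.getOrElse(0) }

    // 2. Keep only the vertices whose out-degree is 0.
    val zeroOut = degreeGraph.subgraph(vpred = (vid, deg) => deg == 0)

    // 3. Restrict the original graph to those vertices, which brings back
    //    the original vertex attributes.
    graph.mask(zeroOut)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying this out on a laptop.
    val conf = new SparkConf().setAppName("NoOutEdges").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Same sample data as in the original question.
      val vertices: RDD[(VertexId, (List[String], List[String]))] =
        sc.parallelize(Seq(
          (1L, (List("a"), List.empty[String])),
          (2L, (List("b"), List.empty[String])),
          (3L, (List("c"), List.empty[String])),
          (4L, (List("d"), List.empty[String])),
          (5L, (List("e"), List.empty[String])),
          (6L, (List("f"), List.empty[String]))))

      val relationships: RDD[Edge[Boolean]] =
        sc.parallelize(Seq(Edge(1L, 2L, true), Edge(2L, 3L, true),
          Edge(3L, 4L, true), Edge(5L, 2L, true)))

      val minGraph = Graph(vertices, relationships)
      val result = withoutOutEdges(minGraph)

      // Only this final collect() is an action; everything before it is lazy.
      // For the sample data the vertices without outgoing edges are 4 and 6.
      println(result.vertices.keys.collect().sorted.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Nothing is pulled into the driver until the final collect(), so the filtering itself stays distributed. Swapping outDegrees for degrees or inDegrees would give the other "0 degree" variants.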