Re: GraphX vertex partition/location strategy

Michael Malak Mon, 19 Jan 2015 13:34:29 -0800

But wouldn't the gain be greater under something similar to EdgePartition1D 
(but perhaps better load-balanced based on number of edges for each vertex) and 
an algorithm that primarily follows edges in the forward direction?
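
To illustrate the idea, here is a minimal Python sketch (not GraphX code). The function `partition_by_source` is a hypothetical, simplified stand-in for a source-only (1D) edge partitioner, and the edge list is made up. It shows that under such a scheme all out-edges of a vertex land in a single partition, and also why load balance becomes a concern.

```python
from collections import Counter, defaultdict

def partition_by_source(src, dst, num_partitions):
    # Hypothetical stand-in for a 1D partitioner: only the source ID matters.
    return src % num_partitions

def out_edge_partitions(edges, num_partitions):
    """Map each source vertex to the set of partitions holding its out-edges."""
    parts = defaultdict(set)
    for src, dst in edges:
        parts[src].add(partition_by_source(src, dst, num_partitions))
    return dict(parts)

def edges_per_partition(edges, num_partitions):
    """Count how many edges each partition receives."""
    return Counter(partition_by_source(s, d, num_partitions) for s, d in edges)

# Made-up toy graph: vertex 1 has a much higher out-degree than the others.
edges = [(1, 2), (1, 3), (1, 4), (1, 6), (2, 3), (6, 1)]
out_parts = out_edge_partitions(edges, 4)
load = edges_per_partition(edges, 4)
```

Every entry of `out_parts` is a one-element set, so a vertex stored in that partition would sit next to all of its out-edges. The `load` counter shows the downside: the partition that holds the high-degree vertex receives most of the edges.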
From: Ankur Dave <ankurd...@gmail.com>
To: Michael Malak <michaelma...@yahoo.com>
Cc: "dev@spark.apache.org" <dev@spark.apache.org>
Sent: Monday, January 19, 2015 2:08 PM
Subject: Re: GraphX vertex partition/location strategy

No - the vertices are hash-partitioned onto workers independently of the edges. 
It would be nice for each vertex to be on the worker with the most adjacent 
edges, but we haven't done this yet since it would add a lot of complexity to 
avoid load imbalance while reducing the overall communication by a small factor.
We refer to the number of partitions containing adjacent edges for a particular 
vertex as the vertex's replication factor. I think the typical replication 
factor for power-law graphs with 100-200 partitions is 10-15, and placing the 
vertex at the ideal location would only reduce the replication factor by 1.
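
The replication-factor argument above can be sketched in a few lines of Python. This is an illustration only, not GraphX code: `toy_edge_partition`, the modulo-based vertex placement, and the edge list are all assumptions made up for the example. It counts, per vertex, how many edge partitions contain an adjacent edge, and compares the number of copies shipped under hash placement with the number shipped when the vertex lives in its best partition.

```python
from collections import Counter, defaultdict

def adjacent_edge_counts(edges, num_partitions, edge_partition):
    """For each vertex, count its adjacent edges in every edge partition."""
    counts = defaultdict(Counter)
    for src, dst in edges:
        p = edge_partition(src, dst, num_partitions)
        counts[src][p] += 1
        counts[dst][p] += 1
    return counts

def replication_factor(counts):
    """Number of edge partitions that contain at least one adjacent edge."""
    return {v: len(c) for v, c in counts.items()}

def shipped_copies(counts, home):
    """Copies sent to other partitions when vertex v is stored at home(v)."""
    return {v: len(c) - (1 if home(v) in c else 0) for v, c in counts.items()}

def toy_edge_partition(src, dst, n):
    # Hypothetical edge placement, used only to get a deterministic example.
    return (src + dst) % n

NUM_PARTITIONS = 4
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]  # made-up toy graph
counts = adjacent_edge_counts(edges, NUM_PARTITIONS, toy_edge_partition)

rf = replication_factor(counts)
# Vertices placed by hash (here: ID modulo partition count), ignoring edges.
hashed = shipped_copies(counts, lambda v: v % NUM_PARTITIONS)
# Vertices placed in the partition holding most of their adjacent edges.
ideal = shipped_copies(counts, lambda v: counts[v].most_common(1)[0][0])
```

For every vertex, `hashed[v] - ideal[v]` is either 0 or 1: ideal placement removes at most one copy. With a replication factor of 10-15, that is a saving of roughly 7-10% at best.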


Ankur


On Mon, Jan 19, 2015 at 12:20 PM, Michael Malak 
<michaelma...@yahoo.com.invalid> wrote:

Does GraphX make an effort to co-locate each vertex on the same worker as the 
majority (or even some) of its edges?
