But wouldn't the gain be greater under something similar to EdgePartition1D
(but perhaps better load-balanced based on number of edges for each vertex) and
an algorithm that primarily follows edges in the forward direction?

From: Ankur Dave <[email protected]>
To: Michael Malak <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Monday, January 19, 2015 2:08 PM
Subject: Re: GraphX vertex partition/location strategy

No - the vertices are hash-partitioned onto workers independently of the edges.
It would be nice for each vertex to be on the worker with the most adjacent
edges, but we haven't done this yet since it would add a lot of complexity to
avoid load imbalance while reducing the overall communication by a small factor.

We refer to the number of partitions containing adjacent edges for a particular
vertex as the vertex's replication factor. I think the typical replication
factor for power-law graphs with 100-200 partitions is 10-15, and placing the
vertex at the ideal location would only reduce the replication factor by 1.
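
To make the replication-factor argument concrete, here is a minimal plain-Python sketch. It is not GraphX code and does not use the Spark API. The function names, the `src % num_parts` edge placement (a stand-in for a source-based, 1D-style strategy), and the `vertex_home` hash are all assumptions made for illustration. It counts, for each vertex, how many edge partitions hold an adjacent edge, and compares how many copies must be shipped when the vertex lives at an edge-independent hashed home versus the partition holding most of its edges.

```python
from collections import Counter, defaultdict


def edge_partition_by_source(src, dst, num_parts):
    """Hypothetical 1D-style placement: the source ID alone picks the partition."""
    return src % num_parts


def vertex_home(vertex, num_parts):
    """Hypothetical vertex placement that ignores the edges entirely."""
    return (vertex * 31 + 7) % num_parts


def adjacent_edge_counts(edges, num_parts, place_edge=edge_partition_by_source):
    """Map each vertex to a Counter of {edge partition: adjacent edge count}."""
    counts = defaultdict(Counter)
    for src, dst in edges:
        pid = place_edge(src, dst, num_parts)
        counts[src][pid] += 1
        counts[dst][pid] += 1
    return dict(counts)


def replication_factor(counts):
    """Number of partitions containing adjacent edges, per vertex."""
    return {v: len(c) for v, c in counts.items()}


def ideal_home(counts):
    """Place each vertex on the partition holding most of its adjacent edges."""
    return {v: c.most_common(1)[0][0] for v, c in counts.items()}


def copies_shipped(counts, home):
    """Edge partitions, other than the vertex's home, that need a copy of it."""
    return {v: sum(1 for pid in c if pid != home[v]) for v, c in counts.items()}


if __name__ == "__main__":
    edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
    num_parts = 2
    counts = adjacent_edge_counts(edges, num_parts)
    hashed = {v: vertex_home(v, num_parts) for v in counts}
    print("replication factor:", replication_factor(counts))
    print("shipped, hashed home:", copies_shipped(counts, hashed))
    print("shipped, ideal home:", copies_shipped(counts, ideal_home(counts)))
```

Under these assumptions, moving a vertex to its ideal home removes at most one shipped copy per vertex, because only the home partition itself drops out of the count. That matches the point above that ideal placement saves about 1 out of a replication factor of 10-15.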

Ankur

On Mon, Jan 19, 2015 at 12:20 PM, Michael Malak
<[email protected]> wrote:

Does GraphX make an effort to co-locate vertices onto the same workers as the
majority (or even some) of its edges?