[Graphx] which way is better to access faraway neighbors?

2014-12-05 Thread Yifan LI
Hi,

I have a graph in which each vertex keeps several messages addressed to some 
faraway neighbours (I mean not only immediate neighbours, but vertices at most 
k hops away, e.g. k = 5).

Now I propose to distribute these messages to their corresponding 
destinations (say, "faraway neighbours"):

- by using the Pregel API (one superstep is enough) 
- by using basic Spark operations (groupByKey, leftJoin, etc.) on the vertices 
RDD and its intermediate results.

Considering the communication among machines and the high cost of 
groupByKey/leftJoin, I guess that the first option is better?

What do you think?


Best,
Yifan LI

Re: [Graphx] which way is better to access faraway neighbors?

2014-12-05 Thread Ankur Dave
At 2014-12-05 02:26:52 -0800, Yifan LI iamyifa...@gmail.com wrote:
> I have a graph in which each vertex keeps several messages addressed to some 
> faraway neighbours (I mean not only immediate neighbours, but vertices at 
> most k hops away, e.g. k = 5).
>
> Now I propose to distribute these messages to their corresponding 
> destinations (say, "faraway neighbours"):
>
> - by using the Pregel API (one superstep is enough) 
> - by using basic Spark operations (groupByKey, leftJoin, etc.) on the 
> vertices RDD and its intermediate results.
>
> Considering the communication among machines and the high cost of 
> groupByKey/leftJoin, I guess that the first option is better?

If messages will only travel along edges (even if they travel over multiple 
edges), then the Pregel API should be faster. You'll have to run k supersteps 
for messages to propagate k hops away from their origins.
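
To make the "k supersteps for k hops" point concrete, here is a minimal toy 
model in plain Python. It is not the GraphX Pregel API; the function name 
propagate and its arguments are hypothetical, and it simply floods each 
payload along out-edges, one hop per superstep.

```python
from collections import defaultdict

def propagate(edges, payloads, supersteps):
    """Toy model of Pregel-style propagation (plain Python, not GraphX).

    edges      -- list of (src, dst) pairs
    payloads   -- dict: origin vertex -> set of payloads it starts with
    supersteps -- number of supersteps to run
    Returns a dict: vertex -> set of payloads that have arrived there.
    """
    out_neighbors = defaultdict(list)
    for src, dst in edges:
        out_neighbors[src].append(dst)

    received = defaultdict(set)                          # arrived so far
    frontier = {v: set(p) for v, p in payloads.items()}  # to forward next

    for _ in range(supersteps):
        inbox = defaultdict(set)
        for v, msgs in frontier.items():
            for nbr in out_neighbors[v]:
                inbox[nbr] |= msgs   # send along each out-edge, merge at target
        frontier = {}
        for v, msgs in inbox.items():
            new = msgs - received[v]  # forward only payloads new to this vertex
            if new:
                received[v] |= new
                frontier[v] = new
    return {v: msgs for v, msgs in received.items() if msgs}


# Chain 1 -> 2 -> 3 -> 4 with one payload starting at vertex 1.
chain = [(1, 2), (2, 3), (3, 4)]
after_one = propagate(chain, {1: {"m"}}, 1)
after_three = propagate(chain, {1: {"m"}}, 3)
```

On this chain the payload has reached only vertex 2 after one superstep, and 
it needs three supersteps to reach vertex 4, which is three hops from the 
origin.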

If messages can jump directly between two arbitrary vertices, then doing a 
single set of Spark basic operations may be faster than running multiple Pregel 
supersteps.
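
The direct alternative can be sketched the same way. Again this is a 
plain-Python toy, not Spark code: deliver_directly is a hypothetical name, 
the grouping step stands in for groupByKey on (destination, payload) pairs, 
and the final dict comprehension stands in for a left join onto the vertices.

```python
from collections import defaultdict

def deliver_directly(vertices, messages):
    """Toy model of one-shot delivery (plain Python, not Spark).

    vertices -- dict: vertex id -> vertex attribute
    messages -- list of (destination id, payload) pairs
    Returns a dict: vertex id -> (attribute, list of payloads delivered).
    """
    # Analogue of groupByKey: collect payloads per destination vertex.
    grouped = defaultdict(list)
    for dst, payload in messages:
        grouped[dst].append(payload)

    # Analogue of a left join: every vertex is kept, with an empty list
    # when no message was addressed to it.
    return {vid: (attr, grouped.get(vid, [])) for vid, attr in vertices.items()}


vertices = {1: "a", 2: "b", 3: "c"}
messages = [(3, "x"), (3, "y"), (1, "z")]
delivered = deliver_directly(vertices, messages)
```

Every message reaches its destination in a single pass regardless of how 
many hops separate sender and receiver, at the price of one shuffle keyed by 
destination.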

Ankur
