On 27 Jul 2015, at 16:42, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
> It seems that the mentioned two joins can be rewritten as one outer join You're right. In fact, the outer join can be streamlined further using a method from GraphOps: g = g.joinVertices(messages)(vprog).cache() Then, instead of passing newVerts as the active set for mapReduceTriplets, we could pass `messages`. If you're interested in proposing a PR for this, I've attached a patch with these changes and updates to the comments. On Tue, Jul 28, 2015 at 1:15 AM, Ulanov, Alexander <alexander.ula...@hp.com> wrote: > I’ve found two PRs (almost identical) for replacing mapReduceTriplets with > aggregateMessages [...] > Do you know the reason why this improvement is not pushed? There isn't any performance benefit to switching Pregel to use aggregateMessages while preserving its current interface, because the interface uses Iterators and would require us to wrap and unwrap them anyway. The semantics of aggregateMessagesWithActiveSet are otherwise the same as mapReduceTriplets, so there isn't any functionality we are missing out on. And this change seems too small to justify introducing a new version of Pregel, though it would be worthwhile when combined with other improvements <https://github.com/apache/spark/pull/1217>. Ankur <http://www.ankurdave.com/>
pregel-simplify-join.patch
Description: Binary data
--------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org