On 27 Jul 2015, at 16:42, Ulanov, Alexander <alexander.ula...@hp.com> wrote:

> It seems that the mentioned two joins can be rewritten as one outer join


You're right. In fact, the outer join can be streamlined further using a
method from GraphOps:

g = g.joinVertices(messages)(vprog).cache()

Then, instead of passing newVerts as the active set for mapReduceTriplets,
we could pass `messages`.
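
To make the equivalence concrete, here is a minimal sketch that models the vertex set and the messages as plain Scala Maps instead of RDDs. The object name and the `twoJoins` helper are made up for illustration and are not GraphX code; the point is only that an inner join with vprog followed by an outer join that falls back to the old attribute should give the same result as a single joinVertices-style pass.

```scala
object JoinVerticesSketch {
  type VertexId = Long

  // Two-step form: run vprog only on vertices that received a message
  // (inner join), then merge the results back, keeping the old attribute
  // for every vertex that got no message (outer join).
  def twoJoins[VD, M](verts: Map[VertexId, VD], msgs: Map[VertexId, M])(
      vprog: (VertexId, VD, M) => VD): Map[VertexId, VD] = {
    val newVerts = verts.collect {
      case (id, attr) if msgs.contains(id) => id -> vprog(id, attr, msgs(id))
    }
    verts.map { case (id, old) => id -> newVerts.getOrElse(id, old) }
  }

  // One-step form, modelling joinVertices: apply vprog where a message
  // exists and leave the attribute unchanged otherwise.
  def joinVertices[VD, M](verts: Map[VertexId, VD], msgs: Map[VertexId, M])(
      vprog: (VertexId, VD, M) => VD): Map[VertexId, VD] =
    verts.map { case (id, attr) =>
      id -> msgs.get(id).map(m => vprog(id, attr, m)).getOrElse(attr)
    }
}
```

Both functions only ever touch the vertices that appear in `msgs`, which is also why the message set itself can serve as the active set in the next round.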

If you're interested in proposing a PR for this, I've attached a patch with
these changes and updates to the comments.

On Tue, Jul 28, 2015 at 1:15 AM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:

> I’ve found two PRs (almost identical) for replacing mapReduceTriplets with
> aggregateMessages

[...]
> Do you know the reason why this improvement is not pushed?


There isn't any performance benefit to switching Pregel to use
aggregateMessages while preserving its current interface, because the
interface uses Iterators and would require us to wrap and unwrap them
anyway. The semantics of aggregateMessagesWithActiveSet are otherwise the
same as mapReduceTriplets, so there isn't any functionality we are missing
out on. And this change seems too small to justify introducing a new
version of Pregel, though it would be worthwhile when combined with other
improvements <https://github.com/apache/spark/pull/1217>.
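
As a rough illustration of the wrapping involved, here is a toy model in plain Scala. `Triplet` and `Ctx` are simplified stand-ins invented for this sketch, not the real GraphX classes, and `adapt` is a hypothetical helper. It shows that an Iterator-returning sendMsg still has to be materialized and replayed message by message against a context-style API, so keeping the Iterator-based interface leaves that work in place.

```scala
object IteratorAdapterSketch {
  type VertexId = Long

  // Toy stand-in for an edge triplet (simplified, no edge attribute).
  final case class Triplet[VD](srcId: VertexId, srcAttr: VD, dstId: VertexId, dstAttr: VD)

  // Toy stand-in for a send context: collects messages per vertex and
  // combines them with the supplied merge function.
  final class Ctx[VD, A](val triplet: Triplet[VD], merge: (A, A) => A) {
    val inbox = scala.collection.mutable.Map.empty[VertexId, A]
    private def send(id: VertexId, msg: A): Unit =
      inbox(id) = inbox.get(id).map(merge(_, msg)).getOrElse(msg)
    def sendToSrc(msg: A): Unit = send(triplet.srcId, msg)
    def sendToDst(msg: A): Unit = send(triplet.dstId, msg)
  }

  // Adapter from an Iterator-returning sendMsg to a context callback:
  // every message is pulled out of the Iterator and re-sent individually.
  def adapt[VD, A](sendMsg: Triplet[VD] => Iterator[(VertexId, A)]): Ctx[VD, A] => Unit =
    ctx => sendMsg(ctx.triplet).foreach { case (id, msg) =>
      if (id == ctx.triplet.srcId) ctx.sendToSrc(msg)
      else if (id == ctx.triplet.dstId) ctx.sendToDst(msg)
      else throw new IllegalArgumentException(s"vertex $id is not an endpoint of this edge")
    }
}
```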

Ankur <http://www.ankurdave.com/>

Attachment: pregel-simplify-join.patch
