[
https://issues.apache.org/jira/browse/TINKERPOP3-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544282#comment-14544282
]
Matthias Broecheler commented on TINKERPOP3-498:
------------------------------------------------
In fact, is there a ticket for making TraversalVertexProgram rely on local
messages where it can? Why TraversalVertexProgram is really nice in terms of
its generality, it makes a number of assumptions that go against the idea
behind GraphComputer attempt to abstract OLAP graph computation:
# It uses global message passing
# It assumes that the entire star graph is loaded
# Its data structures (sideEffect and whatever else traverser is carrying
around) is opaque to the implementation which makes it very hard to optimize
the datastructures.
For Giraph's vertex centric model all these assumptions don't matter much
(aside from the third one maybe) but it constrains implementations to that
model if they want to have reasonable performance.
You can see this in Fulgora. It's very efficient if a user implements a
VertexProgram using local messages but becomes really slow under the additional
assumptions above.
> Could we get rid of PageRankVertexProgram? -- is everything just
> TraversalVertexProgram?
> ----------------------------------------------------------------------------------------
>
> Key: TINKERPOP3-498
> URL: https://issues.apache.org/jira/browse/TINKERPOP3-498
> Project: TinkerPop 3
> Issue Type: Improvement
> Components: process
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
>
> There is the notion of a message in {{GraphComputer}}. A {{Traverser}} is
> sufficiently complex to represent any message (i.e. {{Traverser.get()}}).
> Next, the {{MessageCombiner}} is simply {{Traverser.merge()}}. Next, is
> {{PageRankVertexProgram}} (and others) simply a combination of {{Steps}} to
> execute via a {{TraversalVertexProgram}}.
> Look at the {{execute()}} method of {{PageRankVertexProgram}}:
> {code:java}
> @Override
> public void execute(final Vertex vertex, Messenger<Double> messenger,
> final Memory memory) {
> if (memory.isInitialIteration()) {
> messenger.sendMessage(this.countMessageScope, 1.0d);
> } else if (1 == memory.getIteration()) {
> double initialPageRank = 1.0d / this.vertexCountAsDouble;
> double edgeCount =
> StreamFactory.stream(messenger.receiveMessages(this.countMessageScope)).reduce(0.0d,
> (a, b) -> a + b);
> vertex.singleProperty(PAGE_RANK, initialPageRank);
> vertex.singleProperty(EDGE_COUNT, edgeCount);
> messenger.sendMessage(this.incidentMessageScope, initialPageRank
> / edgeCount);
> } else {
> double newPageRank =
> StreamFactory.stream(messenger.receiveMessages(this.incidentMessageScope)).reduce(0.0d,
> (a, b) -> a + b);
> newPageRank = (this.alpha * newPageRank) + ((1.0d - this.alpha) /
> this.vertexCountAsDouble);
> vertex.singleProperty(PAGE_RANK, newPageRank);
> messenger.sendMessage(this.incidentMessageScope, newPageRank /
> vertex.<Double>value(EDGE_COUNT));
> }
> }
> {code}
> Lets see what that looks like as a pure Gremlin play:
> {code:java}
> __.in().fold(0,(a,b) -> a+b).sideEffect(count ->
> count.sideEffect('count',count.get())).repeat(
> __.fold(0,(a,b)->a+b)
> .sideEffect(pageRank ->
> pageRank.sideEffect('pageRank',pageRank.get()))
> .map(pageRank -> pageRank.get() / pageRank.sideEffect('count'))
> .out()).times(30)
> {code}
> In essence, the first line of the traversal gets the edge counts. The
> {{repeat()}}-traversal aggregates the incoming pageRank values, sets the
> vertex's current pageRank value, divides the pageRank value by the number of
> edges, and splits that traverser amongst all the outgoing edges. It does this
> 30 times.
> In this way, all that needs to exist is {{TraversalVertexProgram}}. Moreover,
> its all just Gremlin. Of course, we can canned algorithms, e.g.:
> {code:java}
> public Algorithms {
> public static final Traversal<Vertex,Vertex> PAGE_RANK =
> __.in().fold(0,(a,b) -> a+b).sideEffect(count ->
> count.sideEffect('count',p.get())).repeat(
> __.fold(0,(a,b)->a+b)
> .sideEffect(pageRank ->
> pageRank.sideEffect('pageRank',pageRank.get()))
> .map(pageRank -> pageRank.get() / pageRank.sideEffect('count'))
> .out()).times(30);
> public static final Traversal<Vertex,Vertex> ANOTHER_ALGORITHM = ...
> }
> {code}
> This is pretty insane.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)