[ https://issues.apache.org/jira/browse/TINKERPOP-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840347#comment-15840347 ]
ASF GitHub Bot commented on TINKERPOP-1617:
-------------------------------------------

GitHub user okram opened a pull request:

    https://github.com/apache/tinkerpop/pull/549

    TINKERPOP-1617: Create a SingleIterationStrategy which will do its best to rewrite OLAP traversals to not message pass.

https://issues.apache.org/jira/browse/TINKERPOP-1617

Various traversals can be rewritten using `local()` so that the `GraphComputer` avoids a message pass and can thus complete the computation in a single scan of the graph. Examples of traversals that benefit:

```
g.V().out().id()          --> g.V().local(out().id())
g.V().out().id().count()  --> g.V().local(out().id()).count()
g.V().out().id().dedup().count()
g.V().inE().values("weight")   // realize that in-edges are hosted by the out-vertex
g.V().inE().values("weight").sum()
g.V().both().count()
g.V().inE().count()
g.V().as("a").outE().inV().as("b").id().dedup("a", "b").by(T.id).count()
```

Finally, the traversal that sparked this PR:

```
g.V().in().id().select("articleNumber").dedup().count() // requires one message pass
  ==translatesTo==>
g.V().local(in().id().select("articleNumber")).dedup().count() // requires no message passing
```

`SingleIterationStrategy` plays well with `SparkSingleIterationStrategy`, which determines whether it is necessary to `cache()` and/or `partition()` the graph. If the traversal can be accomplished without a message pass (i.e. in a single iteration), performance improves greatly because RDD partitions are processed sequentially and can be dropped as soon as they have been processed.

VOTE +1.
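To illustrate why the `local()` rewrite can save an iteration, here is a minimal toy model in plain Python. It is not TinkerPop code and does not reflect how `SingleIterationStrategy` is implemented. It assumes that each vertex stores its own out-edges together with the id of the adjacent vertex; `GRAPH`, `count_with_message_pass` and `count_single_scan` are hypothetical names introduced only for this sketch.

```python
# Toy model only, not the TinkerPop API. Assumption: every vertex stores its
# out-edges locally, including the id of the vertex each edge points to.
from collections import defaultdict

# vertex id -> ids of the vertices its out-edges point to (made-up sample data)
GRAPH = {1: [2, 3, 4], 2: [], 3: [], 4: [3, 5], 5: [], 6: [3]}


def count_with_message_pass(graph):
    """Rough model of g.V().out().id().count(): traversers move to the adjacent vertex."""
    # Iteration 1: every vertex sends one message along each of its out-edges.
    inbox = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            inbox[target] += 1
    # Iteration 2: each receiving vertex emits its own id once per message.
    ids = [vertex_id for vertex_id, n in inbox.items() for _ in range(n)]
    return len(ids), 2  # (count, number of iterations over the graph)


def count_single_scan(graph):
    """Rough model of g.V().local(out().id()).count(): ids are read off the local edges."""
    ids = [target for targets in graph.values() for target in targets]
    return len(ids), 1  # same count, one scan, no messages


print(count_with_message_pass(GRAPH))
print(count_single_scan(GRAPH))
```

Both functions return the same count; only the number of passes over the graph differs, which is the property the rewrite relies on.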
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tinkerpop TINKERPOP-1617

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/549.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #549

----

> Create a SingleIterationStrategy which will do its best to rewrite OLAP traversals to not message pass.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1617
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1617
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.2.3
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>
> The traversal:
> {code}
> g.V().out().id().count()
> {code}
> Requires a message pass from {{out()}}. We shouldn't do this. Instead, if we wrap the pre-barrier stage into a {{local()}}, we have:
> {code}
> g.V().local(out().id()).count()
> {code}
> ...which doesn't require a message pass and has the same semantics. This will help open up numerous OLAP type traversals to single-pass/non-caching scans.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)