[ https://issues.apache.org/jira/browse/TINKERPOP-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213084#comment-15213084 ]
ASF GitHub Bot commented on TINKERPOP-1163:
-------------------------------------------

GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/278

TINKERPOP-1163: GraphComputer's can have TraversalStrategies.

https://issues.apache.org/jira/browse/TINKERPOP-1163

GraphComputers can now have their own `TraversalStrategy` registrations in the global cache. Currently, all that is registered is `GraphComputer.class`, which has `PathProcessStrategy`, `OrderLimitStrategy`, and `ComputerVerificationStrategy`. Moving forward, we will be able to have strategies like `SparkCountStrategy`, which will convert `g.V().count()` into `inputRDD.count()` and thus allow us to talk more directly to the `GraphComputer` engine. `TinkerCountStrategy` would do `g.V().count()` as `this.vertices.count()`. Blazin'.

.... however, what we have here is the infrastructure to allow for the distinction between `Graph` and `GraphComputer` strategies. Note that this PR is backwards compatible.

CHANGELOG

```
* `TraversalStrategies.GlobalCache` supports both `Graph` and `GraphComputer` strategy registrations.
```

VOTE +1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1163

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/278.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #278

----

commit 718caa6be1f722923aa3c23aae9175cecc6ad11a
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-26T16:07:41Z

TraversalStrategies.GlobalCache has two caches now -- one for Graphs and one for GraphComputers. For 3.2.0, this simply allows us to partition the strategies so that the 3 GraphComputer strategies we have are never called in OLTP (saving clock cycles).
In the future, it will enable us to have something like SparkCountStrategy, which will just do inputRDD.count() for g.V().count() instead of going through the rigmarole of TraversalVertexProgram.

----

> GraphComputer's can have TraversalStrategies.
> ---------------------------------------------
>
> Key: TINKERPOP-1163
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1163
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop, process
> Affects Versions: 3.1.0-incubating
> Reporter: Marko A. Rodriguez
>
> @dkuppitz makes the joke that he can count the number of vertices in the
> Friendster adjacency list with "awk to the sed to the bash to the.." in < 1
> minute. SparkGraphComputer with four blades takes ~5 minutes.
> What's the dealio?
> Imagine a world where {{SparkGraphComputerStrategy}} exists. It analyzes
> traversals and does fast executions, breaking away from the VertexProgram API
> and going straight to the native API of Spark. Check it:
> {code}
> g.V().count() -> inputRDD.count()
> {code}
> ...add an {{EmptyVertex.instance()}} manipulation to the respective
> InputFormats and you are then just skipping through bytes, not manifesting
> objects at all. BAM. That would take 30 seconds on Friendster.
> {code}
> g.V().outE('knows').count() -->
> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
> {code}
> Blazing fast.
> ....for all those standard patterns, we just do a "native" execution for the
> respective GraphComputer engine. We sidestep object creation, iteration
> phases, views, map reduce jobs.... However, we have to be smart to update the
> {{Memory}} so it looks as if the real VertexProgram executed! ---
> {{iteration}}, {{runtime}}, {{~reducing}}, etc.
> Genius.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
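For readers following the thread, the partitioning described in the commit message (one cache for Graphs, one for GraphComputers) can be pictured roughly as below. This is only a minimal sketch under stated assumptions, not the code from the pull request: `StrategyCacheSketch`, its nested `Graph`/`GraphComputer` stand-in types, `ExampleGraph`, `ExampleComputer`, and the "exact class first, then base registration" lookup rule are all hypothetical, and strategies are represented by their names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: this is NOT TinkerPop's TraversalStrategies.GlobalCache.
class StrategyCacheSketch {

    // Stand-ins for the real Graph and GraphComputer types (assumptions).
    interface Graph {}
    interface GraphComputer {}
    static class ExampleGraph implements Graph {}
    static class ExampleComputer implements GraphComputer {}

    // Two separate partitions, so a Graph (OLTP) lookup never sees computer strategies.
    private final Map<Class<?>, List<String>> graphCache = new HashMap<>();
    private final Map<Class<?>, List<String>> computerCache = new HashMap<>();

    // Pick the partition for a type, or fail if it is neither kind.
    private Map<Class<?>, List<String>> partitionFor(Class<?> type) {
        if (Graph.class.isAssignableFrom(type)) return graphCache;
        if (GraphComputer.class.isAssignableFrom(type)) return computerCache;
        throw new IllegalArgumentException("Neither a Graph nor a GraphComputer: " + type);
    }

    // Register strategy names for a Graph or GraphComputer class.
    void register(Class<?> type, List<String> strategyNames) {
        partitionFor(type).put(type, new ArrayList<>(strategyNames));
    }

    // Assumed lookup rule: exact class first, then the base registration.
    List<String> lookup(Class<?> type) {
        Map<Class<?>, List<String>> cache = partitionFor(type);
        Class<?> base = (cache == graphCache) ? Graph.class : GraphComputer.class;
        List<String> found = cache.containsKey(type) ? cache.get(type) : cache.get(base);
        return found == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(found);
    }

    public static void main(String[] args) {
        StrategyCacheSketch cache = new StrategyCacheSketch();
        // The three strategy names are the ones listed in the pull request text.
        cache.register(GraphComputer.class, Arrays.asList(
                "PathProcessStrategy", "OrderLimitStrategy", "ComputerVerificationStrategy"));
        System.out.println(cache.lookup(ExampleComputer.class));
        System.out.println(cache.lookup(ExampleGraph.class));
    }
}
```

In this sketch a lookup for a Graph class never touches the computer partition, which is the OLTP saving the commit message mentions. An engine-specific entry (for example, one that adds something like the proposed SparkCountStrategy) would be registered under that engine's own class and would take precedence over the base `GraphComputer` registration.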