[ https://issues.apache.org/jira/browse/TINKERPOP-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212621#comment-15212621 ]
Marko A. Rodriguez commented on TINKERPOP-1163:
-----------------------------------------------

I think this is really important, as we now have lots of {{TraversalStrategies}} that are OLAP only. I think we should have something like this:

{code}
public static TraversalStrategies getStrategies(final Class<? extends Graph> graphClass)
public static TraversalStrategies getStrategies(final Class<? extends GraphComputer> graphComputerClass)
{code}

Next:

{code}
TraversalStrategies.GlobalCache.registerStrategies(TinkerGraph.class,
    TraversalStrategies.GlobalCache.getStrategies(Graph.class).clone().addStrategies(TinkerGraphStepStrategy.instance()));
TraversalStrategies.GlobalCache.registerStrategies(TinkerGraphComputer.class,
    TraversalStrategies.GlobalCache.getStrategies(GraphComputer.class).clone());
{code}

Finally:

{code}
final TraversalStrategies defaultGraphComputerStrategies = new DefaultTraversalStrategies();
defaultGraphComputerStrategies.addStrategies(
    MatchPredicateStrategy.instance(),
    PathProcessorStrategy.instance(),
    OrderLimitStrategy.instance(),
    ComputerVerificationStrategy.instance());
CACHE.put(GraphComputer.class, defaultGraphComputerStrategies.clone());
{code}

In essence, we make a split between {{Graph}} and {{GraphComputer}} strategies so that we don't have a bunch of strategies in OLTP that do {{if(!TraversalHelper.onGraphComputer(traversal)) return}}. All about clock cycles.

> GraphComputer's can have TraversalStrategies.
> ---------------------------------------------
>
>                 Key: TINKERPOP-1163
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1163
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop, process
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>
> @dkuppitz makes the joke that he can count the number of vertices in the
> Friendster adjacency list with "awk to the sed to the bash to the.." in < 1
> minute. SparkGraphComputer with four blades takes ~5 minutes.
> What's the dealio?
> Imagine a world where {{SparkGraphComputerStrategy}} exists. It analyzes
> traversals and does fast executions, breaking away from the VertexProgram API
> and going straight to the native API of Spark. Check it:
> {code}
> g.V().count() -> inputRDD.count()
> {code}
> ...add an {{EmptyVertex.instance()}} manipulation to the respective
> InputFormats and you are then just skipping through bytes, not manifesting
> objects at all. BAM. That would take 30 seconds on Friendster.
> {code}
> g.V().outE('knows').count() -->
> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
> {code}
> Blazing fast.
> ...for all those standard patterns, we just do a "native" execution for the
> respective GraphComputer engine. We sidestep object creation, iteration
> phases, views, MapReduce jobs... However, we have to be smart and update the
> {{Memory}} so it looks as if the real VertexProgram executed! ---
> {{iteration}}, {{runtime}}, {{~reducing}}, etc.
> Genius.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
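
A minimal, self-contained sketch of the {{Graph}} / {{GraphComputer}} split proposed in the comment. It is not TinkerPop code: {{StrategyCacheSketch}}, {{ToyGraph}}, {{ToyGraphComputer}}, {{extend}} and {{SomeOltpStrategy}} are hypothetical names, strategies are reduced to plain strings, and the fallback for unregistered classes is an assumption. It uses a single lookup keyed on {{Class<?>}} because two Java overloads that differ only in the type argument of {{Class}} have the same erasure and would not compile side by side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StrategyCacheSketch {

    // Hypothetical stand-ins for the Graph and GraphComputer types (not TinkerPop's interfaces).
    public interface Graph {}
    public interface GraphComputer {}
    public static final class ToyGraph implements Graph {}
    public static final class ToyGraphComputer implements GraphComputer {}

    // Strategy lists keyed by graph or graph-computer class; strategies are just names here.
    private static final Map<Class<?>, List<String>> CACHE = new ConcurrentHashMap<>();

    static {
        // OLTP defaults (placeholder name): nothing computer-only lives in this list.
        CACHE.put(Graph.class, Collections.singletonList("SomeOltpStrategy"));
        // OLAP defaults: the four strategies named in the comment.
        CACHE.put(GraphComputer.class, Arrays.asList(
                "MatchPredicateStrategy", "PathProcessorStrategy",
                "OrderLimitStrategy", "ComputerVerificationStrategy"));
    }

    public static void registerStrategies(Class<?> key, List<String> strategies) {
        // Store a defensive, read-only copy.
        CACHE.put(key, Collections.unmodifiableList(new ArrayList<>(strategies)));
    }

    public static List<String> getStrategies(Class<?> key) {
        List<String> exact = CACHE.get(key);
        if (exact != null) {
            return exact;
        }
        // Assumed fallback: an unregistered class inherits the defaults of its family.
        return CACHE.get(GraphComputer.class.isAssignableFrom(key) ? GraphComputer.class : Graph.class);
    }

    // Copy-and-extend, standing in for clone().addStrategies(...).
    public static List<String> extend(List<String> base, String... extra) {
        List<String> copy = new ArrayList<>(base);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }

    public static void main(String[] args) {
        registerStrategies(ToyGraph.class,
                extend(getStrategies(Graph.class), "TinkerGraphStepStrategy"));
        registerStrategies(ToyGraphComputer.class, getStrategies(GraphComputer.class));

        System.out.println(getStrategies(ToyGraph.class));
        System.out.println(getStrategies(ToyGraphComputer.class));
    }
}
```

Because the OLTP list never contains a computer-only strategy, no strategy has to test at runtime whether the traversal is on a graph computer. The choice is made once, when the list is looked up.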