[ 
https://issues.apache.org/jira/browse/TINKERPOP-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212621#comment-15212621
 ] 

Marko A. Rodriguez commented on TINKERPOP-1163:
-----------------------------------------------

I think this is really important, as we now have lots of 
{{TraversalStrategies}} that are OLAP-only. I think we should have something 
like this:

{code}
public static TraversalStrategies getStrategies(final Class<? extends Graph> graphClass)
public static TraversalStrategies getStrategies(final Class<? extends GraphComputer> graphComputerClass)
{code}

Next:

{code}
TraversalStrategies.GlobalCache.registerStrategies(TinkerGraph.class,
        TraversalStrategies.GlobalCache.getStrategies(Graph.class).clone().addStrategies(TinkerGraphStepStrategy.instance()));
TraversalStrategies.GlobalCache.registerStrategies(TinkerGraphComputer.class,
        TraversalStrategies.GlobalCache.getStrategies(GraphComputer.class).clone());
{code}

Finally:

{code}
final TraversalStrategies defaultGraphComputerStrategies = new DefaultTraversalStrategies();
defaultGraphComputerStrategies.addStrategies(
        MatchPredicateStrategy.instance(),
        PathProcessorStrategy.instance(),
        OrderLimitStrategy.instance(),
        ComputerVerificationStrategy.instance());
CACHE.put(GraphComputer.class, defaultGraphComputerStrategies.clone());
{code}

In essence, we split {{Graph}} strategies from {{GraphComputer}} strategies 
so that a bunch of strategies running in OLTP don't each have to do 
{{if(!TraversalHelper.onGraphComputer(traversal)) return}}. It's all about clock 
cycles.
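
To make the lookup side of that split concrete, here is a minimal, self-contained sketch. It is not the TinkerPop API: {{Graph}}, {{GraphComputer}}, {{TinkerGraph}} and {{TinkerGraphComputer}} are empty stand-in types, strategies are represented by their names only, {{StrategyCacheSketch}} and {{extend}} are hypothetical helpers, and {{OltpPlaceholderStrategy}} is a made-up name for whatever the {{Graph}} defaults would be. It also assumes that an unregistered class falls back to the defaults of its family, which the proposal above does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for the TinkerPop interfaces and implementations.
interface Graph {}
interface GraphComputer {}
final class TinkerGraph implements Graph {}
final class TinkerGraphComputer implements GraphComputer {}

final class StrategyCacheSketch {
    // Strategy names stand in for strategy instances to keep the sketch small.
    private static final Map<Class<?>, List<String>> CACHE = new ConcurrentHashMap<>();

    static {
        // "OltpPlaceholderStrategy" is a made-up name for the Graph defaults.
        registerStrategies(Graph.class, Collections.singletonList("OltpPlaceholderStrategy"));
        // The GraphComputer defaults listed in the comment above.
        registerStrategies(GraphComputer.class, Arrays.asList(
                "MatchPredicateStrategy", "PathProcessorStrategy",
                "OrderLimitStrategy", "ComputerVerificationStrategy"));
    }

    // Store an immutable copy so later changes to the caller's list do not leak in.
    static void registerStrategies(Class<?> type, List<String> strategies) {
        CACHE.put(type, Collections.unmodifiableList(new ArrayList<>(strategies)));
    }

    // Exact registration wins; otherwise fall back to the family defaults (assumed behavior).
    static List<String> getStrategies(Class<?> type) {
        List<String> exact = CACHE.get(type);
        if (exact != null) return exact;
        if (GraphComputer.class.isAssignableFrom(type)) return CACHE.get(GraphComputer.class);
        if (Graph.class.isAssignableFrom(type)) return CACHE.get(Graph.class);
        throw new IllegalArgumentException("Not a Graph or GraphComputer: " + type);
    }

    // Copy a base list and append extras, loosely mirroring clone().addStrategies(...).
    static List<String> extend(List<String> base, String... extra) {
        List<String> copy = new ArrayList<>(base);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }
}
```

With two separate default sets, an OLTP lookup for {{TinkerGraph}} never sees the computer-only strategies, so none of them needs a runtime guard.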

> GraphComputer's can have TraversalStrategies.
> ---------------------------------------------
>
>                 Key: TINKERPOP-1163
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1163
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop, process
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>
> @dkuppitz makes the joke that he can count the number of vertices in the 
> Friendster adjacency list with "awk to the sed to the bash to the.." in < 1 
> minute. SparkGraphComputer with four blades takes ~5 minutes.
> What's the dealio?
> Imagine a world where {{SparkGraphComputerStrategy}} exists. It analyzes 
> traversals and does fast executions, breaking away from the VertexProgram API 
> and going straight to the native API of Spark. Check it:
> {code}
> g.V().count() -> inputRDD.count()
> {code}
> ...add an {{EmptyVertex.instance()}} manipulation to the respective 
> InputFormats and you are then just skipping through bytes, not manifesting 
> objects at all. BAM. That would take 30 seconds on Friendster.
> {code}
> g.V().outE('knows').count() --> 
> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
> {code}
> Blazing fast.
> ....for all those standard patterns, we just do a "native" execution for the 
> respective GraphComputer engine. We sidestep object creation, iteration 
> phases, views, map reduce jobs.... However, we have to be smart about updating 
> the {{Memory}} so it looks as if the real VertexProgram executed! --- 
> {{iteration}}, {{runtime}}, {{~reducing}}, etc.
> Genius.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
