Hello,
> I have only one idea: do traversal API users still really have to know
> whether they use a GraphComputer or GraphActors? In other words, can the
> withEngine options not just be some illuminating token constants for users
> that just want to have the traversal() returned (LOCAL, LOCAL_DISTRIBUTED,
> DISTRIBUTED)? Of course, the more extended API will be useful for a minority
> of power users that want to optimize an ActorProgram for a specific use case.
So, Matthias Bröcheler, for a few years now, wanted something like a
TraversalEngineReasoningStrategy. This would be a DecorationStrategy that would
look at the traversal and make a best guess as whether to execute
iterator-style, compute-style, or actors-style. For instance:
g.V().count() // computer
g.V(1).out().count() // iterator
g.V(1).repeat(out()).times(3).count() // actors
I think for now withEngine() is the bare-bones necessity and we can get clever
with reasoning and your enum-model down the line.
Finally, note that I went with withProcessor() last night as the name :).
GraphActors and GraphComputer both implement a new Processor interface (which
is primarily a marker interface).
Marko.
>
> Cheers, Marc
>
> Op woensdag 14 december 2016 18:46:44 UTC+1 schreef Marko A. Rodriguez:
> Hello,
>
> For the last week I’ve been working on “distributed OLTP.” Gremlin has a
> really nice architecture in that a traverser can be shipped around a cluster
> and reattached to its respective element (vertex/edge/etc.) and step
> (traversal) at the remote location and continue to compute. Thus, we can have
> step-by-step query routing.
>
> https://issues.apache.org/jira/browse/TINKERPOP-1564
> <https://issues.apache.org/jira/browse/TINKERPOP-1564>
>
> With that, I’ve created GraphActors which is similar to GraphComputer.
> However, there are some fundamental distinctions:
>
> 1. GraphActors assumes the boundary of computation is a Partition.
> - GraphComputer assumes the boundary of computation is vertex
> and its incident edges and properties.
> 2. GraphActors assumes asynchronous computation with barriers at
> Barrier steps.
> - GraphComputer assumes (sorta) synchronous computation with a
> barrier when all traversers have left their local vertex.
> 3. GraphActors is traverser-centric and partition-bound.
> - GraphComputer is vertex-centric and vertex-bound.
>
> In gremlin-core/ I’ve created a new set of interfaces off of process/.
>
>
> https://github.com/apache/tinkerpop/tree/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor
>
> <https://github.com/apache/tinkerpop/tree/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor>
> GraphActors <=> GraphComputer
> MasterActor <=> setup()/terminate().
> WorkerActor <=> execute()
> ActorProgram <=> VertexProgram
>
> The parallel between GraphComputer and GraphActors are strong. In short, a
> (hardcore) user can create an ActorProgram and submit it to a GraphActors.
> The ActorProgram will effect a distributed, asynchronous, partition-bound
> message passing algorithm and return a Future<Result>. There is one
> ActorProgram in particular the executes a Gremlin traversal.
>
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalActorProgram.java
>
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalActorProgram.java>
> master actor program:
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalMasterProgram.java
>
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalMasterProgram.java>
> worker actors program:
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalWorkerProgram.java
>
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalWorkerProgram.java>
>
> Pretty simple. Besides some problems I’m having with serialization stuff in
> GroupStep, the ProcessSuite passes.
>
> Now, its up to a provider to implement the GraphActors interfaces. Welcome
> akka-gremlin/.
>
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/AkkaGraphActors.java
>
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/AkkaGraphActors.java>
> master actor:
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/MasterActor.java
>
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/MasterActor.java>
> worker actor:
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/WorkerActor.java
>
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/WorkerActor.java>
> mailbox system:
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/ActorMailbox.java
>
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/ActorMailbox.java>
>
> Dead simple!
>
> What we should have done with TinkerPop from the start is include the notion
> of a Partition. For this branch, I’ve added two concepts Partition and
> Partitioner.
>
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partitioner.java
>
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partitioner.java>
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partition.java
>
> <https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partition.java>
>
> Why is this cool? This is cool because if GraphActors system knows the
> partitions of the underlying Graph data, then it can immediately process the
> Graph data in a distributed manner. No need to write a custom “InputFormat.”
> We should have done this from the start because then GraphComputer could do
> the same. For instance, spark-gremlin/ can run over TinkerGraph as it doesn’t
> care about TinkerGraph, it cares about Partition “input splits.” By adding
> this layer of information, ANY Graph can work with ANY GraphComputer or
> GraphActors. I have yet to create PartitionInputRDD and PartitionInputFormat,
> but that will be next and at that point, GraphComputers are agnostic to the
> underlying implementation.
>
> So there you have it. Your thoughts on the matter would be most appreciated.
>
> Things still left to do:
>
> * We need a concept of a traversal engine. That is, something like:
> - g.withEngine(SparkGraphComputer.class) // it knows its
> a GraphComputer so thats the engine.
> - g.withEngine(AkkaGraphActors.class) // it knows its a
> GraphActors so thats the engine.
> - g.withEngine(Iterator.class) // this means,
> just iterate the traversal locally :).
> * GraphComputer semantics are a restricted version of GraphActors
> semantics.
> - GraphActors becomes GraphComputer when the Partitions are
> defined by vertices.
> - I think I can unify the two and thus, we could have
> SparkGraphActors.
>
>
>
> Here is some fun playing:
>
>
> \,,,/
> (o o)
> -----oOOo-(3)-oOOo-----
> plugin activated: tinkerpop.server
> plugin activated: tinkerpop.utilities
> plugin activated: tinkerpop.tinkergraph
> gremlin> :install org.apache.tinkerpop akka-gremlin 3.3.0-SNAPSHOT
> ==>Loaded: [org.apache.tinkerpop, akka-gremlin, 3.3.0-SNAPSHOT]
> gremlin> graph = TinkerFactory.createModern()
> ==>tinkergraph[vertices:6 edges:6]
> gremlin> graph.partitioner()
> ==>partitioner[globalpartitioner:1]
> gremlin> partitioner = new HashPartitioner(graph.partitioner(),3) //
> lets create 3 logical partitions over TinkerGraph
> ==>partitioner[hashpartitioner:3]
> gremlin> g = graph.traversal().withStrategies(new
> ActorProgramStrategy(AkkaGraphActors,partitioner)) // in the future
> withEngine() will be used
> ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], actors]
> gremlin> g.V().repeat(out()).times(2).values('name')
> ==>lop
> ==>ripple
> gremlin> g.V().repeat(both()).times(2).groupCount().by(out().in().count())
> // beyond the star graph!
> ==>[0:13,3:3,4:7,5:7]
> gremlin> g.V().match( //
> distributed pattern matching
> ......1> __.as('a').out('created').as('b'),
> ......2> __.as('b').has('name', 'lop'),
> ......3> __.as('b').in('created').as('c'),
> ......4> __.as('c').has('age', 29)).
> ......5> select('a','c').by('name')
> ==>[a:marko,c:marko]
> ==>[a:josh,c:marko]
> ==>[a:peter,c:marko]
> gremlin>
>
> Now imagine this executing over various providers:
>
> 1. A sharded graph database (Titan/DSEGraph/OrientDB): the traversers
> move between machines so they are always data local processing.
> 2. A replicated graph database (Neo4j): logical partitions are created
> so that each machine is responsible for a subgraph of the full graph (load
> balancing and parallization).
> 3. A single machine graph database (TinkerGraph): logical partitions
> are created so that each core of the machine is responsible for a subgraph
> (parallization).
>
> Pretty neat, eh?
>
> Ideas are more than welcome,
> Marko.
>
> http://markorodriguez.com <http://markorodriguez.com/>
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Gremlin-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected]
> <mailto:[email protected]>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/gremlin-users/98255596-3d17-4b92-a82f-3d219eaffd81%40googlegroups.com
>
> <https://groups.google.com/d/msgid/gremlin-users/98255596-3d17-4b92-a82f-3d219eaffd81%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.