Hello,

> I have only one idea: do traversal API users still really have to know
> whether they use a GraphComputer or GraphActors? In other words, can the
> withEngine options not just be some illuminating token constants for users
> that just want to have the traversal() returned (LOCAL, LOCAL_DISTRIBUTED,
> DISTRIBUTED)? Of course, the more extended API will be useful for a minority
> of power users that want to optimize an ActorProgram for a specific use case.

So, Matthias Bröcheler has, for a few years now, wanted something like a TraversalEngineReasoningStrategy. This would be a DecorationStrategy that looks at the traversal and makes a best guess as to whether to execute it iterator-style, compute-style, or actors-style. For instance:

    g.V().count()                          // computer
    g.V(1).out().count()                   // iterator
    g.V(1).repeat(out()).times(3).count()  // actors

I think for now withEngine() is the bare-bones necessity, and we can get clever with reasoning and your enum model down the line.

Finally, note that I went with withProcessor() last night as the name :). GraphActors and GraphComputer both implement a new Processor interface (which is primarily a marker interface).

Marko.

>
> Cheers, Marc
>
> On Wednesday, 14 December 2016 18:46:44 UTC+1, Marko A. Rodriguez wrote:
> Hello,
>
> For the last week I’ve been working on “distributed OLTP.” Gremlin has a
> really nice architecture in that a traverser can be shipped around a cluster
> and reattached to its respective element (vertex/edge/etc.) and step
> (traversal) at the remote location and continue to compute. Thus, we can have
> step-by-step query routing.
>
> https://issues.apache.org/jira/browse/TINKERPOP-1564
>
> With that, I’ve created GraphActors which is similar to GraphComputer.
> However, there are some fundamental distinctions:
>
> 1. GraphActors assumes the boundary of computation is a Partition.
>    - GraphComputer assumes the boundary of computation is a vertex
>      and its incident edges and properties.
> 2. GraphActors assumes asynchronous computation with barriers at
>    Barrier steps.
>    - GraphComputer assumes (sorta) synchronous computation with a
>      barrier when all traversers have left their local vertex.
> 3. GraphActors is traverser-centric and partition-bound.
>    - GraphComputer is vertex-centric and vertex-bound.
>
> In gremlin-core/ I’ve created a new set of interfaces off of process/.
>
> https://github.com/apache/tinkerpop/tree/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor
>
>     GraphActors  <=> GraphComputer
>     MasterActor  <=> setup()/terminate()
>     WorkerActor  <=> execute()
>     ActorProgram <=> VertexProgram
>
> The parallels between GraphComputer and GraphActors are strong. In short, a
> (hardcore) user can create an ActorProgram and submit it to a GraphActors.
> The ActorProgram will effect a distributed, asynchronous, partition-bound
> message passing algorithm and return a Future<Result>. There is one
> ActorProgram in particular that executes a Gremlin traversal.
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalActorProgram.java
>     master actor program:
>         https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalMasterProgram.java
>     worker actors program:
>         https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalWorkerProgram.java
>
> Pretty simple. Besides some problems I’m having with serialization stuff in
> GroupStep, the ProcessSuite passes.
>
> Now, it’s up to a provider to implement the GraphActors interfaces. Welcome
> akka-gremlin/.
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/AkkaGraphActors.java
>     master actor:
>         https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/MasterActor.java
>     worker actor:
>         https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/WorkerActor.java
>     mailbox system:
>         https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/ActorMailbox.java
>
> Dead simple!
>
> What we should have done with TinkerPop from the start is include the notion
> of a Partition. For this branch, I’ve added two concepts: Partition and
> Partitioner.
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partitioner.java
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partition.java
>
> Why is this cool? This is cool because if a GraphActors system knows the
> partitions of the underlying Graph data, then it can immediately process the
> Graph data in a distributed manner. No need to write a custom “InputFormat.”
> We should have done this from the start because then GraphComputer could do
> the same. For instance, spark-gremlin/ can run over TinkerGraph as it doesn’t
> care about TinkerGraph; it cares about Partition “input splits.” By adding
> this layer of information, ANY Graph can work with ANY GraphComputer or
> GraphActors. I have yet to create PartitionInputRDD and PartitionInputFormat,
> but that will be next and at that point, GraphComputers are agnostic to the
> underlying implementation.
>
> So there you have it. Your thoughts on the matter would be most appreciated.
>
> Things still left to do:
>
> * We need a concept of a traversal engine. That is, something like:
>   - g.withEngine(SparkGraphComputer.class) // it knows it’s a GraphComputer so that’s the engine.
>   - g.withEngine(AkkaGraphActors.class)    // it knows it’s a GraphActors so that’s the engine.
>   - g.withEngine(Iterator.class)           // this means, just iterate the traversal locally :).
> * GraphComputer semantics are a restricted version of GraphActors semantics.
>   - GraphActors becomes GraphComputer when the Partitions are defined by vertices.
>   - I think I can unify the two and thus, we could have SparkGraphActors.
>
> Here is some fun playing:
>
>          \,,,/
>          (o o)
> -----oOOo-(3)-oOOo-----
> plugin activated: tinkerpop.server
> plugin activated: tinkerpop.utilities
> plugin activated: tinkerpop.tinkergraph
> gremlin> :install org.apache.tinkerpop akka-gremlin 3.3.0-SNAPSHOT
> ==>Loaded: [org.apache.tinkerpop, akka-gremlin, 3.3.0-SNAPSHOT]
> gremlin> graph = TinkerFactory.createModern()
> ==>tinkergraph[vertices:6 edges:6]
> gremlin> graph.partitioner()
> ==>partitioner[globalpartitioner:1]
> gremlin> partitioner = new HashPartitioner(graph.partitioner(),3) // let's create 3 logical partitions over TinkerGraph
> ==>partitioner[hashpartitioner:3]
> gremlin> g = graph.traversal().withStrategies(new ActorProgramStrategy(AkkaGraphActors,partitioner)) // in the future withEngine() will be used
> ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], actors]
> gremlin> g.V().repeat(out()).times(2).values('name')
> ==>lop
> ==>ripple
> gremlin> g.V().repeat(both()).times(2).groupCount().by(out().in().count()) // beyond the star graph!
> ==>[0:13,3:3,4:7,5:7]
> gremlin> g.V().match( // distributed pattern matching
> ......1> __.as('a').out('created').as('b'),
> ......2> __.as('b').has('name', 'lop'),
> ......3> __.as('b').in('created').as('c'),
> ......4> __.as('c').has('age', 29)).
> ......5> select('a','c').by('name')
> ==>[a:marko,c:marko]
> ==>[a:josh,c:marko]
> ==>[a:peter,c:marko]
> gremlin>
>
> Now imagine this executing over various providers:
>
> 1. A sharded graph database (Titan/DSEGraph/OrientDB): the traversers
>    move between machines so that processing is always data-local.
> 2. A replicated graph database (Neo4j): logical partitions are created
>    so that each machine is responsible for a subgraph of the full graph (load
>    balancing and parallelization).
> 3. A single machine graph database (TinkerGraph): logical partitions
>    are created so that each core of the machine is responsible for a subgraph
>    (parallelization).
>
> Pretty neat, eh?
>
> Ideas are more than welcome,
> Marko.
>
> http://markorodriguez.com
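
To make the "logical partitions" idea from the provider list above a bit more concrete, here is a minimal sketch, assuming plain hash-modulo assignment of vertex ids to partitions. It is not the HashPartitioner from the TINKERPOP-1564 branch and uses no TinkerPop classes; the names HashPartitionSketch, partitionOf and partition are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: NOT the TinkerPop HashPartitioner.
// It shows one way vertex ids could be split into logical partitions.
public class HashPartitionSketch {

    // Map a vertex id to one of `partitionCount` logical partitions.
    public static int partitionOf(Object vertexId, int partitionCount) {
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes.
        return Math.floorMod(vertexId.hashCode(), partitionCount);
    }

    // Group vertex ids by the logical partition they hash to.
    public static Map<Integer, List<Object>> partition(List<?> vertexIds, int partitionCount) {
        Map<Integer, List<Object>> partitions = new TreeMap<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.put(i, new ArrayList<>());
        }
        for (Object id : vertexIds) {
            partitions.get(partitionOf(id, partitionCount)).add(id);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // The six vertex ids of the "modern" toy graph from the console session.
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6);
        Map<Integer, List<Object>> partitions = partition(ids, 3);
        partitions.forEach((p, members) ->
                System.out.println("partition " + p + " -> " + members));
    }
}
```

In an actors-style setup, a worker responsible for one partition would presumably process a traverser only while partitionOf(currentVertexId, n) matches its own partition and hand it off otherwise; that routing step is an assumption here and is not taken from the branch.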