Re: GraphActors as a new distributed computing framework in TinkerPop

2016-12-14 Thread HadoopMarc
Hi Marko,

This is pretty neat indeed! It will enhance the productivity of gremlin 
query writers when queries can be copied from the fora without worries 
about the kind of backend used and whether it is an analytical (OLAP) or a 
transactional query (OLTP). I have only one idea: do traversal API users 
still really have to know whether they use a GraphComputer or GraphActors? 
In other words, can the withEngine options not just be some illuminating 
token constants for users that just want to have the traversal() returned 
(LOCAL, LOCAL_DISTRIBUTED, DISTRIBUTED)?  Of course, the more extended API 
will be useful for a minority of power users that want to optimize an 
ActorProgram for a specific use case.
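
A minimal sketch of the token idea, in Java. Everything in it is hypothetical: the EngineTokens class, the Engine enum (its three constants are taken from the suggestion above), and the backendFor method are illustrative names, not TinkerPop API. The mapping of each token to an engine family is an assumption made only for the example.

```java
import java.util.EnumMap;
import java.util.Map;

class EngineTokens {
    // Hypothetical tokens named after the suggestion above; not part of TinkerPop.
    enum Engine { LOCAL, LOCAL_DISTRIBUTED, DISTRIBUTED }

    // Assumed mapping from token to engine family; a provider could choose differently.
    private static final Map<Engine, String> BACKENDS = new EnumMap<>(Engine.class);
    static {
        BACKENDS.put(Engine.LOCAL, "in-process traversal");
        BACKENDS.put(Engine.LOCAL_DISTRIBUTED, "GraphActors");
        BACKENDS.put(Engine.DISTRIBUTED, "GraphComputer");
    }

    // The user passes only a token; the engine choice stays behind this lookup.
    static String backendFor(Engine engine) {
        return BACKENDS.get(engine);
    }

    public static void main(String[] args) {
        for (Engine engine : Engine.values()) {
            System.out.println(engine + " -> " + backendFor(engine));
        }
    }
}
```

With something along these lines, a copied query would only need a different token to move between backends, while power users could still reach the extended API.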

Cheers, Marc

On Wednesday, 14 December 2016 18:46:44 UTC+1, Marko A. Rodriguez wrote:
>
> Hello,
>
> For the last week I’ve been working on “distributed OLTP.” Gremlin has a 
> really nice architecture in that a traverser can be shipped around a 
> cluster and reattached to its respective element (vertex/edge/etc.) and 
> step (traversal) at the remote location and continue to compute. Thus, we 
> can have step-by-step query routing.
>
> https://issues.apache.org/jira/browse/TINKERPOP-1564
>
> With that, I’ve created GraphActors which is similar to GraphComputer. 
> However, there are some fundamental distinctions:
>
> 1. GraphActors assumes the boundary of computation is a Partition.
>    - GraphComputer assumes the boundary of computation is a vertex and its 
>      incident edges and properties.
> 2. GraphActors assumes asynchronous computation with barriers at Barrier 
>    steps.
>    - GraphComputer assumes (sorta) synchronous computation with a barrier 
>      when all traversers have left their local vertex.
> 3. GraphActors is traverser-centric and partition-bound.
>    - GraphComputer is vertex-centric and vertex-bound.
>
> In gremlin-core/ I’ve created a new set of interfaces off of process/.
>
>
> https://github.com/apache/tinkerpop/tree/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor
> GraphActors <=> GraphComputer
> MasterActor <=> setup()/terminate().
> WorkerActor <=> execute()
> ActorProgram <=> VertexProgram
>
> The parallels between GraphComputer and GraphActors are strong. In short, a 
> (hardcore) user can create an ActorProgram and submit it to a GraphActors. 
> The ActorProgram will effect a distributed, asynchronous, partition-bound 
> message passing algorithm and return a Future. There is one 
> ActorProgram in particular that executes a Gremlin traversal.
>
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalActorProgram.java
>   master actor program: 
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalMasterProgram.java
>   worker actors program: 
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalWorkerProgram.java
>
> Pretty simple. Besides some problems I’m having with serialization stuff 
> in GroupStep, the ProcessSuite passes.
>
> Now, it's up to a provider to implement the GraphActors interfaces. Welcome 
> akka-gremlin/.
>
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/AkkaGraphActors.java
>   master actor: 
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/MasterActor.java
>   worker actor: 
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/WorkerActor.java
> mailbox system: 
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/ActorMailbox.java
>
> Dead simple!
>
> What we should have done with TinkerPop from the start is include the 
> notion of a Partition. For this branch, I’ve added two concepts: Partition 
> and Partitioner.
>
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partitioner.java
>
> https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partition.java
>
> Why is this cool? This is cool because if the GraphActors system knows the 
> partitions of the underlying Graph data, then it can immediately process 
> the Graph data in a distributed manner. No need to write a custom 
> “InputFormat.” We should have done this from the start because then 
> GraphComputer could do the same. For instance, spark-gremlin/ can run over 
> TinkerGraph as it doesn’t care about TinkerGraph, it cares about Partition 
> “input splits.” By adding this layer of information, ANY Graph can work 
> with ANY GraphComp

GraphActors as a new distributed computing framework in TinkerPop

2016-12-14 Thread Marko Rodriguez
Hello,

For the last week I’ve been working on “distributed OLTP.” Gremlin has a really 
nice architecture in that a traverser can be shipped around a cluster and 
reattached to its respective element (vertex/edge/etc.) and step (traversal) at 
the remote location and continue to compute. Thus, we can have step-by-step 
query routing.
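
A minimal sketch of step-by-step routing, in Java. It is a toy model, not TinkerPop code: the RoutingSketch class, the Traverser holder, the four-vertex adjacency map, and the modulo placement rule are all assumptions made for illustration. Each traverser carries only a vertex id and a step index, so the partition that owns the vertex can reattach it locally and continue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class RoutingSketch {
    // A traverser carries ids only: where it is (vertex) and which step comes next.
    static final class Traverser {
        final int vertexId;
        final int step;
        Traverser(int vertexId, int step) { this.vertexId = vertexId; this.step = step; }
    }

    // Toy outgoing adjacency; the vertex ids are made up for the example.
    static final Map<Integer, List<Integer>> OUT = Map.of(
        1, List.of(2, 3),
        2, List.of(4),
        3, List.of(4),
        4, List.of());

    static final int PARTITIONS = 2;

    // Assumed placement rule: a vertex lives on partition (id % PARTITIONS).
    static int partitionOf(int vertexId) { return vertexId % PARTITIONS; }

    // Runs `steps` outgoing hops from `start` and returns the vertices reached.
    static List<Integer> run(int start, int steps) {
        List<Deque<Traverser>> mailboxes = new ArrayList<>();
        for (int i = 0; i < PARTITIONS; i++) mailboxes.add(new ArrayDeque<>());
        mailboxes.get(partitionOf(start)).add(new Traverser(start, 0));

        List<Integer> results = new ArrayList<>();
        boolean busy = true;
        while (busy) {                       // sweep until no mailbox had work
            busy = false;
            for (int p = 0; p < PARTITIONS; p++) {
                Traverser t;
                while ((t = mailboxes.get(p).poll()) != null) {
                    busy = true;
                    if (t.step == steps) {   // traversal finished for this traverser
                        results.add(t.vertexId);
                        continue;
                    }
                    // "Reattach" by id on the owning partition, then route each
                    // child traverser to the partition that owns its vertex.
                    for (int next : OUT.get(t.vertexId)) {
                        mailboxes.get(partitionOf(next)).add(new Traverser(next, t.step + 1));
                    }
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(1, 2)); // [4, 4]
    }
}
```

Two hops from vertex 1 reach vertex 4 along both paths, and the traversers cross between the two partitions on the way there.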

https://issues.apache.org/jira/browse/TINKERPOP-1564 


With that, I’ve created GraphActors which is similar to GraphComputer. However, 
there are some fundamental distinctions:

1. GraphActors assumes the boundary of computation is a Partition.
   - GraphComputer assumes the boundary of computation is a vertex 
     and its incident edges and properties.
2. GraphActors assumes asynchronous computation with barriers at 
   Barrier steps.
   - GraphComputer assumes (sorta) synchronous computation with a 
     barrier when all traversers have left their local vertex.
3. GraphActors is traverser-centric and partition-bound.
   - GraphComputer is vertex-centric and vertex-bound.

In gremlin-core/ I’ve created a new set of interfaces off of process/.


https://github.com/apache/tinkerpop/tree/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor
 

GraphActors <=> GraphComputer
MasterActor <=> setup()/terminate().
WorkerActor <=> execute()
ActorProgram <=> VertexProgram

The parallels between GraphComputer and GraphActors are strong. In short, a 
(hardcore) user can create an ActorProgram and submit it to a GraphActors. The 
ActorProgram will effect a distributed, asynchronous, partition-bound message 
passing algorithm and return a Future. There is one ActorProgram in 
particular that executes a Gremlin traversal.


https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalActorProgram.java
  master actor program: 
https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalMasterProgram.java
  worker actors program: 
https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/actor/traversal/TraversalWorkerProgram.java

Pretty simple. Besides some problems I’m having with serialization stuff in 
GroupStep, the ProcessSuite passes.
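
The master/worker split described above can be sketched in plain Java. This is not the API from the linked branch: the ActorProgramSketch class, the Worker and Master interfaces, the receive method, and the submit signature are hypothetical, and a thread pool with CompletableFuture stands in for a real actor system. It only illustrates the shape: workers run asynchronously per partition, the master holds the setup/terminate logic, and the caller gets a future back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ActorProgramSketch {
    // Hypothetical shapes for illustration; the interfaces in the branch may differ.
    interface Worker<M> {
        M execute(List<Integer> partition);   // plays the role of execute()
    }
    interface Master<M, R> {
        R setup();                            // plays the role of setup()
        R receive(R state, M message);        // folds in one worker message
        R terminate(R state);                 // plays the role of terminate()
    }

    // Submits a program: one asynchronous worker task per partition,
    // with the final result delivered through a future.
    static <M, R> CompletableFuture<R> submit(List<List<Integer>> partitions,
                                              Worker<M> worker, Master<M, R> master) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        List<CompletableFuture<M>> pending = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            pending.add(CompletableFuture.supplyAsync(() -> worker.execute(partition), pool));
        }
        // Barrier: the master proceeds only after every worker has reported.
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> {
                R state = master.setup();
                for (CompletableFuture<M> message : pending) {
                    state = master.receive(state, message.join());
                }
                return master.terminate(state);
            })
            .whenComplete((result, error) -> pool.shutdown());
    }

    // Example program: count vertices across three made-up partitions.
    static int countVertices() {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5, 6));
        Master<Integer, Integer> master = new Master<Integer, Integer>() {
            public Integer setup() { return 0; }
            public Integer receive(Integer state, Integer message) { return state + message; }
            public Integer terminate(Integer state) { return state; }
        };
        return submit(partitions, List::size, master).join();
    }

    public static void main(String[] args) {
        System.out.println(countVertices()); // 6
    }
}
```

The single allOf barrier here is a simplification; in the design described above, barriers occur at Barrier steps within the traversal rather than once at the end.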

Now, it's up to a provider to implement the GraphActors interfaces. Welcome 
akka-gremlin/.


https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/AkkaGraphActors.java
  master actor: 
https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/MasterActor.java
  worker actor: 
https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/WorkerActor.java
  mailbox system: 
https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/akka-gremlin/src/main/java/org/apache/tinkerpop/gremlin/akka/process/actor/ActorMailbox.java

Dead simple!

What we should have done with TinkerPop from the start is include the notion of 
a Partition. For this branch, I’ve added two concepts: Partition and Partitioner.


https://github.com/apache/tinkerpop/blob/TINKERPOP-1564/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/Partitioner.java
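
A minimal sketch of how a Partition/Partitioner pair can let an engine split work without knowing anything else about the graph. The shapes below are assumptions for illustration and are not copied from the linked files: the PartitionSketch class, the method names contains, vertexIds, partitions and partitionOf, and the modulo placement strategy are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSketch {
    // Hypothetical minimal shapes; the interfaces in the branch may look different.
    interface Partition {
        boolean contains(int vertexId);
        List<Integer> vertexIds();
    }
    interface Partitioner {
        List<Partition> partitions();
        Partition partitionOf(int vertexId);
    }

    // Assumed strategy: modulo placement over a fixed list of vertex ids.
    static Partitioner modulo(List<Integer> allVertexIds, int n) {
        List<Partition> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int index = i;
            final List<Integer> ids = new ArrayList<>();
            for (int id : allVertexIds) {
                if (Math.floorMod(id, n) == index) ids.add(id);
            }
            parts.add(new Partition() {
                public boolean contains(int vertexId) { return ids.contains(vertexId); }
                public List<Integer> vertexIds() { return ids; }
            });
        }
        return new Partitioner() {
            public List<Partition> partitions() { return parts; }
            public Partition partitionOf(int vertexId) {
                return parts.get(Math.floorMod(vertexId, n));
            }
        };
    }

    // Engine-side view: it sees only partitions, used here as input splits.
    static List<Integer> splitSizes(Partitioner partitioner) {
        List<Integer> sizes = new ArrayList<>();
        for (Partition p : partitioner.partitions()) sizes.add(p.vertexIds().size());
        return sizes;
    }

    static String demo() {
        return splitSizes(modulo(List.of(1, 2, 3, 4, 5), 2)).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [2, 3]
    }
}
```

The splitSizes method never touches the graph implementation itself, which is the point of the abstraction: any engine that understands partitions could process any graph that exposes them.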
 


Re: Open graph effort at the Linux Foundation

2016-12-14 Thread calvin . wu
+1 I'm looking forward to this new project~

On Friday, 9 December 2016 03:24:33 UTC+8, Jason Plurad wrote:
>
> Many folks in the Titan community have continued to reach out wondering 
> how to continue development on an Apache-licensed, open source, and 
> scalable graph database with pluggable backends. I want to let you know 
> that the Linux Foundation is establishing an open community graph project, 
> including developers from various backend providers, to fulfill that need. 
> The logistics for this new home are being finalized, and it will carry on 
> the open source heritage of Titan with open governance. The Apache license 
> will be maintained, and the community will operate along the same 
> principles of an Apache project. Once naming the new project is complete, 
> all are welcome to join, contribute, and drive forward this scalable graph 
> solution.
>
> -- Jason
>