Re: [TinkerPop] GraphActors as a new distributed computing framework in TinkerPop

Marko Rodriguez Thu, 15 Dec 2016 08:48:03 -0800

Hi,

> How will this get deployed? Each database instance (alternatively 
> gremlin-server) shipping a version of akka-actor and akka-cluster?


This is a good question. As I’m seeing it lately, I think we treat it just like 
spark-gremlin/. That is, let’s assume a multi-machine graph database:

        1. User has a graph database across 3 nodes in a cluster.
        2. User has an Akka Cluster set up on those 3 nodes (like they would 
           have SparkServer or Hadoop).
        3. akka-gremlin/ “jobs” have a configuration with information about the 
           Akka cluster and the graph database partitions.

Thus, I don’t think GremlinServer really needs to come into play. However, I 
sort of think that down the line, GremlinServer should support the spawning of 
“services.” For instance, it would be great if GremlinServer, when deployed, 
could spawn a SparkServer cluster or an Akka Cluster… This removes the headache 
for users of having to install and configure stuff. It would be great if 
GremlinServer was like Docker or something.

        bin/gremlin-server.sh -i akka.gremlin.plugin -c akka.properties

Dunno. Stephen would have more to say.

> What does it mean for performance? Here's my understanding... thoughts?
> 
>       1. A sharded graph database: as long as the data is local it'll scale 
> linearly, then it needs some synchronisation (i.e. hand off the traversal to 
> the instance where the data is local again). I.e. there'll be a sweet spot of 
> replication vs. shards for each use case. 
>       2. A replicated graph database: should scale linearly for most 
> traversals
>       3. A single machine graph database: should scale linearly for most 
> traversals

So there will be traverser migration when a traverser no longer references data 
in its current partition. That is a message pass. You don’t want just full 
replication because then you aren’t load balancing your traversals across 
machines. Even if you have a replicated graph database, you will want to create 
logical partitions so that traversers will be forced to move between machines. 
When it’s worth doing that, and when you should just use standard iterator 
Gremlin execution, is a fine line… how much data will your traversal touch?
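
To make the trade-off a bit more concrete, here is a minimal sketch in plain 
Python (not TinkerPop or Akka code) that walks a toy graph and counts how many 
steps stay inside a partition and how many would need a traverser migration, 
i.e. a message pass. The function name run_traversal, the toy graph, and the 
partition assignment are all made up for illustration; a real implementation 
would look quite different.

```python
def run_traversal(adjacency, partition_of, start, hops):
    """Follow outgoing edges for `hops` steps, starting from `start`.

    adjacency:    dict mapping a vertex to its list of neighbour vertices
    partition_of: dict mapping a vertex to a partition id (hypothetical layout)
    Returns (final_vertices, local_steps, migrations).
    """
    traversers = [start]
    local_steps = 0
    migrations = 0
    for _ in range(hops):
        next_traversers = []
        for v in traversers:
            for w in adjacency.get(v, []):
                if partition_of[w] == partition_of[v]:
                    local_steps += 1   # data is local, no message needed
                else:
                    migrations += 1    # traverser leaves its partition: one message pass
                next_traversers.append(w)
        traversers = next_traversers
    return traversers, local_steps, migrations


# Toy graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
adjacency = {1: [2, 3], 2: [4], 3: [4], 4: []}

# Two partitions: vertices 1 and 2 on partition 0, vertices 3 and 4 on partition 1.
partitioned = {1: 0, 2: 0, 3: 1, 4: 1}

# Everything on one partition (comparable to single-machine iterator execution).
single = {1: 0, 2: 0, 3: 0, 4: 0}

print(run_traversal(adjacency, partitioned, 1, 2))  # ([4, 4], 2, 2)
print(run_traversal(adjacency, single, 1, 2))       # ([4, 4], 4, 0)
```

With two partitions, half of the four steps turn into message passes; with one 
partition none do, but then all the work lands on a single machine. The more 
data a traversal touches, the more that balance matters.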

Marko.
