Hi,
> How will this get deployed? Each database instance (alternatively
> gremlin-server) shipping a version of akka-actor and akka-cluster?
This is a good question. As I’m seeing it lately, I think we treat it just like
spark-gremlin/. That is, let’s assume a multi-machine graph database:
1. User has a graph database across 3 nodes in a cluster.
2. User has Akka Cluster set up on those 3 nodes (like they would have
SparkServer or Hadoop).
3. akka-gremlin/ “jobs” have a configuration with information about the
Akka cluster and the graph database partitions.
Thus, I don’t think GremlinServer really needs to come into play. However, I
sort of think that down the line, GremlinServer should support the spawning of
“services.” For instance, it would be great if GremlinServer, when deployed,
could spawn a SparkServer cluster or an Akka Cluster… This removes the headache
of users having to install and configure stuff themselves. It would be great if
GremlinServer was like a Docker or something.
bin/gremlin-server.sh -i akka.gremlin.plugin -c akka.properties
Dunno. Stephen would have more to say.
> What does it mean for performance? Here's my understanding... thoughts?
>
> 1. A sharded graph database: as long as the data is local it'll scale
> linearly, then it needs some synchronisation (i.e. hand off the traversal to
> the instance where the data is local again). I.e. there'll be a sweet spot of
> replication vs. shards for each use case.
> 2. A replicated graph database: should scale linearly for most
> traversals
> 3. A single machine graph database: should scale linearly for most
> traversals
So there will be traverser migration when a traverser no longer references data
in its current partition. That is a message pass. You don’t want just full
replication because then you aren’t load balancing your traversals across
machines. Even if you have a replicated graph database, you will want to create
logical partitions so that traversers will be forced to move between machines.
When it’s worth doing that versus just using standard iterator Gremlin
execution is a fine line… how much data will your traversal touch?
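
To make the migration cost concrete, here is a minimal toy sketch in Python. It is not akka-gremlin or TinkerPop code; the graph, the `partition_of` mapping, and the `run` function are all made up for illustration. It walks a fixed number of out-steps and counts how many steps stay local and how many require handing the traverser to another partition (a message pass).

```python
from collections import deque

# Hypothetical toy graph: vertex -> outgoing neighbours.
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}


def run(edges, partition_of, start, hops):
    """Walk `hops` out-steps from `start`.

    Returns (results, local_steps, messages), where `messages` counts
    traversers that had to migrate to a different partition.
    """
    # One work queue per partition, standing in for one worker per machine.
    queues = {p: deque() for p in set(partition_of.values())}
    queues[partition_of[start]].append((start, hops))
    results, local_steps, messages = [], 0, 0

    while any(queues.values()):
        for partition, queue in queues.items():
            while queue:
                vertex, remaining = queue.popleft()
                if remaining == 0:
                    results.append(vertex)
                    continue
                for neighbour in edges[vertex]:
                    target = partition_of[neighbour]
                    if target == partition:
                        local_steps += 1   # data is local, no hand-off
                    else:
                        messages += 1      # traverser migrates: a message pass
                    queues[target].append((neighbour, remaining - 1))
    return sorted(results), local_steps, messages


# Two logical partitions: vertices 1,2 on one, vertices 3,4 on the other.
split = run(EDGES, {1: 0, 2: 0, 3: 1, 4: 1}, start=1, hops=2)
# Everything in one partition: no migration, but also no load balancing.
single = run(EDGES, {1: 0, 2: 0, 3: 0, 4: 0}, start=1, hops=2)
```

Both runs return the same results. Only the split between local steps and messages changes, and that split is the trade-off in question.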
Marko.