The questions that occur to me are somewhat broad, so I apologize if they distract from the intended topic. However, I do feel they are related to a proper IO design.

Would the readGraph API be suitable for a continuously streaming loader, e.g. to parse an activity stream, or is it only used for finite inputs?

Would the writeGraph API be suitable for a continuously streaming extractor, e.g. to write an external transaction log, or to synchronize a replica, or is it only used for finite outputs?

What is the expected behavior when there is simultaneous access, e.g. queries occurring during readGraph, or mutations occurring during writeGraph?

On Thu, Apr 30, 2015 at 9:01 AM, Marko Rodriguez <[email protected]> wrote:

> Hi,
>
> Stephen is interested in making sure that Graph.io() works cleanly for
> both OLTP and OLAP. In particular, making sure that io().readGraph() and
> io().writeGraph() can be used in both OLTP and OLAP situations seamlessly
> much like Gremlin does for traversals.
>
> ------------
>
> OLAP graph writing will occur via a (yet to be written)
> BulkLoaderVertexProgram. BulkLoaderVertexProgram takes a Graph (with
> vertices/edges) and writes to another Graph. In essence, two graphs, where
> the first graph has the data and the second is empty. I always expected
> this to typically happen via Hadoop (HadoopGraph) -> VendorDatabase
> (VendorGraph). However, while most distributed graph database vendors will
> leverage Hadoop/Giraph/Spark for their OLAP bulk loading operations because
> of HDFS, we can't always assume this -- especially in the context of OLAP
> Graph.io().
>
> Thus, BulkLoaderVertexProgram shouldn't just operate on Graph->Graph, but
> can optionally stream in a file as well, File->Graph. This means we have to
> get into the concept of "InputSplits" at the gremlin-core level. A quick
> and dirty is to simply serially load the graph data from a file, this is
> not the optimal solution, but can move us forward on the Graph.io() API.
>
> To the API of Graph.io(). This would mean, like Traversal, the user can
> specify a Computer to use to do the readGraph().
>
> graph.io().readGraph(file, graph.compute(MyGraphComputer.class))
>
> For writeGraph()
>
> graph.io().writeGraph(file,graph.compute(MyGraphComputer.class))
>
>
> Where, "file" can be a directory in both situations and each "worker" of
> the GraphComputer reads/writes a split.
>
> Thoughts?,
> Marko.
>
> http://markorodriguez.com
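
To make the split question concrete, here is a minimal sketch of how I read the proposal that "file" can be a directory and each worker reads a split. It uses only the JDK and none of the TinkerPop API: SplitReaderSketch, readAll, and loadSplit are hypothetical names of my own, and loadSplit merely counts non-blank lines where a real loader would parse records and write them into the graph.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SplitReaderSketch {

    // Hypothetical stand-in for "load one split into the graph":
    // it only counts the non-blank lines (records) in the file.
    static long loadSplit(Path split) {
        try (Stream<String> lines = Files.lines(split)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Treats a directory as a set of splits (one per regular file) and a
    // plain file as a single split, then hands each split to a worker thread.
    static long readAll(Path fileOrDirectory, int workers)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> splits;
        if (Files.isDirectory(fileOrDirectory)) {
            try (Stream<Path> children = Files.list(fileOrDirectory)) {
                splits = children.filter(Files::isRegularFile)
                                 .sorted()
                                 .collect(Collectors.toList());
            }
        } else {
            splits = Collections.singletonList(fileOrDirectory);
        }

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path split : splits) {
                Callable<Long> task = () -> loadSplit(split);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // blocks until that split is fully read
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

In this shape readAll only returns once every split has been read to its end, which is why I ask about streaming: a continuous input never reaches that point, so the API would need some other notion of completion.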
