Re: OLAP and Graph.io()

Marko Rodriguez Thu, 30 Apr 2015 11:50:31 -0700

Hey Stephen,

Yes, it would need access to the custom serializers. Perhaps this is the same 
problem that we are having with GryoPool. GryoInput/OutputFormats use Gryo, but 
they don't have an easy way of getting the serializers.
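
To make the delegation idea concrete, here is a minimal sketch in plain Java. All of the names (SerializerSharingSketch, VertexReader, BulkLoader, register, the "tag:payload" line format) are made up for illustration and are not TinkerPop classes. It only shows the shape of the idea: if the loader is handed the already-configured reader, it picks up whatever custom serializers the reader was built with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types are TinkerPop classes.
public class SerializerSharingSketch {

    /** Stand-in for a reader that was configured with custom serializers. */
    static final class VertexReader {
        private final Map<String, Function<String, Object>> deserializers = new HashMap<>();

        /** Registers a deserializer for a type tag (assumed registration API). */
        VertexReader register(String typeTag, Function<String, Object> deserializer) {
            deserializers.put(typeTag, deserializer);
            return this;
        }

        /** Parses one "typeTag:payload" line with the matching deserializer. */
        Object readVertex(String line) {
            int sep = line.indexOf(':');
            if (sep < 0) {
                throw new IllegalArgumentException("Missing type tag: " + line);
            }
            String tag = line.substring(0, sep);
            Function<String, Object> deserializer = deserializers.get(tag);
            if (deserializer == null) {
                throw new IllegalStateException("No serializer registered for: " + tag);
            }
            return deserializer.apply(line.substring(sep + 1));
        }
    }

    /** Stand-in for the bulk loader: it never parses data itself, it delegates. */
    static final class BulkLoader {
        private final VertexReader reader;

        BulkLoader(VertexReader reader) {
            this.reader = reader;
        }

        List<Object> load(List<String> lines) {
            List<Object> loaded = new ArrayList<>();
            for (String line : lines) {
                loaded.add(reader.readVertex(line)); // custom serializers come along for free
            }
            return loaded;
        }
    }

    public static void main(String[] args) {
        VertexReader reader = new VertexReader()
                .register("int", s -> Integer.valueOf(s))
                .register("csv", s -> Arrays.asList(s.split(",")));
        BulkLoader loader = new BulkLoader(reader);
        System.out.println(loader.load(Arrays.asList("int:42", "csv:a,b,c")));
    }
}
```

The hard part for GryoInput/OutputFormats is that they are constructed by the framework rather than handed a reader like this, so the sketch shows the goal and not how to get there.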


?,
Marko.

http://markorodriguez.com

On Apr 30, 2015, at 12:46 PM, Stephen Mallette <[email protected]> wrote:

> It would be nice if this change could just be treated as an overload to
> read/writeGraph() so in that sense it sounds good to me.  I presume that
> the underlying work done by the BulkLoader/DumperVertexProgram would simply
> be using the existing read/writeVertex functions on the GraphReader/Writer
> implementations themselves.  In that way,
> the BulkLoader/DumperVertexProgram would have access to any custom
> serializers required by the Graph instance.
> 
> On Thu, Apr 30, 2015 at 12:01 PM, Marko Rodriguez <[email protected]>
> wrote:
> 
>> Hi,
>> 
>> Stephen is interested in making sure that Graph.io() works cleanly for
>> both OLTP and OLAP. In particular, making sure that io().readGraph() and
>> io().writeGraph() can be used in both OLTP and OLAP situations seamlessly
>> much like Gremlin does for traversals.
>> 
>> ------------
>> 
>> OLAP graph writing will occur via a (yet to be written)
>> BulkLoaderVertexProgram. BulkLoaderVertexProgram takes a Graph (with
>> vertices/edges) and writes to another Graph. In essence, there are two
>> graphs: the first has the data and the second is empty. I always expected
>> this to typically happen via Hadoop (HadoopGraph) -> VendorDatabase
>> (VendorGraph). However, while most distributed graph database vendors will
>> leverage Hadoop/Giraph/Spark for their OLAP bulk loading operations because
>> of HDFS, we can't always assume this -- especially in the context of OLAP
>> Graph.io().
>> 
>> Thus, BulkLoaderVertexProgram shouldn't just operate on Graph->Graph, but
>> can optionally stream in a file as well, File->Graph. This means we have to
>> get into the concept of "InputSplits" at the gremlin-core level. A quick
>> and dirty approach is to simply load the graph data serially from a file.
>> This is not the optimal solution, but it can move us forward on the
>> Graph.io() API.
>> 
>> As for the Graph.io() API: like Traversal, the user would be able to
>> specify a Computer to use for readGraph():
>> 
>>        graph.io().readGraph(file, graph.compute(MyGraphComputer.class))
>> 
>> For writeGraph():
>> 
>>        graph.io().writeGraph(file, graph.compute(MyGraphComputer.class))
>> 
>> 
>> Where, "file" can be a directory in both situations and each "worker" of
>> the GraphComputer reads/writes a split.
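
A rough sketch of the "directory of splits, one split per worker" idea, using only the JDK. The class and method names (SplitReadSketch, splits, readAll) are hypothetical, and counting lines stands in for actually parsing vertices. With a single worker it degenerates to the serial, quick and dirty load.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: not part of gremlin-core.
public class SplitReadSketch {

    /** A plain file is a single split; a directory yields one split per regular file. */
    static List<Path> splits(Path fileOrDir) throws IOException {
        if (!Files.isDirectory(fileOrDir)) {
            return Collections.singletonList(fileOrDir);
        }
        try (Stream<Path> children = Files.list(fileOrDir)) {
            return children.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    /** Reads all splits on a fixed pool of workers and returns the total line count. */
    static long readAll(Path fileOrDir, int workers)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (Path split : splits(fileOrDir)) {
                // Each task reads exactly one split; a real loader would parse vertices here.
                Callable<Long> task = () -> {
                    try (Stream<String> lines = Files.lines(split)) {
                        return lines.count();
                    }
                };
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : pending) {
                total += result.get(); // blocks until that split has been read
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("graph-splits");
        Files.write(dir.resolve("part-0"), Arrays.asList("v1", "v2"));
        Files.write(dir.resolve("part-1"), Arrays.asList("v3"));
        System.out.println("lines read: " + readAll(dir, 2));
    }
}
```

Splitting on file boundaries sidesteps the harder problem of splitting inside a single large file, which is where a real "InputSplit" notion would be needed.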
>> 
>> Thoughts?,
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>>
