Hey Stephen,

Yes, it would need access to the custom serializers. Perhaps this is the same problem that we are having with GryoPool. GryoInput/OutputFormats use Gryo, but they don't have an easy way of getting the serializers.

?,
Marko.

http://markorodriguez.com

On Apr 30, 2015, at 12:46 PM, Stephen Mallette <[email protected]> wrote:

> It would be nice if this change could just be treated as an overload to
> read/writeGraph() so in that sense it sounds good to me. I presume that
> the underlying work done by the BulkLoader/DumperVertexProgram would simply
> be using the existing read/writeVertex functions on the GraphReader/Writer
> implementations themselves. In that way,
> the BulkLoader/DumperVertexProgram would have access to any custom
> serializers required by the Graph instance.
>
> On Thu, Apr 30, 2015 at 12:01 PM, Marko Rodriguez <[email protected]>
> wrote:
>
>> Hi,
>>
>> Stephen is interested in making sure that Graph.io() works cleanly for
>> both OLTP and OLAP. In particular, making sure that io().readGraph() and
>> io().writeGraph() can be used in both OLTP and OLAP situations seamlessly
>> much like Gremlin does for traversals.
>>
>> ------------
>>
>> OLAP graph writing will occur via a (yet to be written)
>> BulkLoaderVertexProgram. BulkLoaderVertexProgram takes a Graph (with
>> vertices/edges) and writes to another Graph. In essence, two graphs, where
>> the first graph has the data and the second is empty. I always expected
>> this to typically happen via Hadoop (HadoopGraph) -> VendorDatabase
>> (VendorGraph). However, while most distributed graph database vendors will
>> leverage Hadoop/Giraph/Spark for their OLAP bulk loading operations because
>> of HDFS, we can't always assume this -- especially in the context of OLAP
>> Graph.io().
>>
>> Thus, BulkLoaderVertexProgram shouldn't just operate on Graph->Graph, but
>> can optionally stream in a file as well, File->Graph. This means we have to
>> get into the concept of "InputSplits" at the gremlin-core level. A quick
>> and dirty is to simply serially load the graph data from a file, this is
>> not the optimal solution, but can move us forward on the Graph.io() API.
>>
>> To the API of Graph.io(). This would mean, like Traversal, the user can
>> specify a Computer to use to do the readGraph().
>>
>> graph.io().readGraph(file, graph.compute(MyGraphComputer.class))
>>
>> For writeGraph()
>>
>> graph.io().writeGraph(file,graph.compute(MyGraphComputer.class))
>>
>>
>> Where, "file" can be a directory in both situations and each "worker" of
>> the GraphComputer reads/writes a split.
>>
>> Thoughts?,
>> Marko.
>>
>> http://markorodriguez.com
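
To make the "file can be a directory, each worker reads a split" idea concrete, here is a rough sketch in plain Java. It does not use any TinkerPop API: the class `SplitReadSketch` and the helpers `listSplits`, `readSplit`, and `readAll` are hypothetical names made up for illustration, a thread pool stands in for the GraphComputer workers, and counting non-empty lines stands in for deserializing vertices.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch only: none of these names exist in TinkerPop.
public class SplitReadSketch {

    // A plain file is treated as a single split; a directory yields one split per regular file.
    static List<Path> listSplits(Path fileOrDirectory) throws IOException {
        if (!Files.isDirectory(fileOrDirectory)) {
            return List.of(fileOrDirectory);
        }
        try (Stream<Path> entries = Files.list(fileOrDirectory)) {
            return entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    // Stand-in for a worker reading its split: counts non-empty lines
    // instead of deserializing vertices.
    static long readSplit(Path split) {
        try (Stream<String> lines = Files.lines(split)) {
            return lines.filter(line -> !line.isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Submits one task per split to a fixed pool of "workers" and sums the record counts.
    static long readAll(Path fileOrDirectory, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path split : listSplits(fileOrDirectory)) {
                Callable<Long> task = () -> readSplit(split);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny directory of two splits and read it with two workers.
        Path dir = Files.createTempDirectory("splits");
        Files.write(dir.resolve("part-0"), List.of("v1", "v2"));
        Files.write(dir.resolve("part-1"), List.of("v3"));
        System.out.println("records read: " + readAll(dir, 2));
    }
}
```

The "quick and dirty" serial load mentioned above corresponds to calling `readAll` with a single worker; the same code path handles a single file, since it is treated as one split.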
