Re: HdfsTinkerGraph

Marko Rodriguez Mon, 28 Dec 2015 10:13:11 -0800

Hi Jason,

From what I can tell, you are trying to load HDFS data into TinkerGraph. You 
are doing this by creating an HdfsTinkerGraph, which wraps a TinkerGraph and 
adds some HDFS data loading capabilities to it (e.g. loadGraph()).


I don't think this is a good idea. If you want to load HDFS data into 
TinkerGraph, I would do it like this:

        HadoopGraph --BulkLoaderVertexProgram--> TinkerGraph.

*** Kuppitz' BulkLoaderVertexProgramTest demonstrates how to do this.

With the BulkLoaderVertexProgram model, you get a few benefits. 

        1. HadoopGraph can load from any InputFormat, not just HDFS files. For 
example, it could load Spark RDDs, CassandraInputFormat, etc.
        2. We don't have "yet another graph implementation" to maintain, 
explain, and document.
        3. HdfsTinkerGraph will only scale to the size of a single machine's 
RAM and thus is a bit of a "toy implementation."
                - If you really want to use TinkerGraph for Hadoop data, it's a 
"corner case" that can be solved using BulkLoaderVertexProgram.
        4. If you don't want to use BulkLoaderVertexProgram, then just do 
hadoopGraph.io().writeGraph(), tinkerGraph.io().readGraph().
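
As a rough sketch of the HadoopGraph --BulkLoaderVertexProgram--> TinkerGraph 
route, the two configuration files might look something like the following. 
The property names are as I recall them from the TinkerPop 3.1.x line, and the 
file names and paths are made up for illustration, so check them against the 
version you are running.

```properties
# hadoop-load.properties (hypothetical file name): the source HadoopGraph.
# Any InputFormat can be plugged in here; Gryo is only an example.
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
# Assumed example locations, not taken from this thread.
gremlin.hadoop.inputLocation=grateful-dead.kryo
gremlin.hadoop.outputLocation=output

# tinkergraph-gryo.properties (hypothetical file name): the target TinkerGraph.
# graphLocation/graphFormat ask TinkerGraph to persist itself when closed.
gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
gremlin.tinkergraph.graphFormat=gryo
gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
```

The second file would be the one handed to 
BulkLoaderVertexProgram.build().writeGraph(...), with the resulting program 
submitted through a GraphComputer on the HadoopGraph opened from the first 
file. The target is still a single in-memory TinkerGraph, so the RAM limit 
from point 3 applies here as well.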
        
Thoughts?
Marko.

http://markorodriguez.com

On Dec 23, 2015, at 8:26 AM, Jason Plurad <[email protected]> wrote:

> I've been playing around with this repo to enable writing a graph out to
> HDFS. I haven't tested it at scale, but it seems to work for the basic BLVP
> write graph scenario with the Grateful Dead graph.
> 
> https://github.com/pluradj/incubator-tinkerpop/commits/hdfstinkergraph
> https://github.com/pluradj/titan/tree/titan11-hdfstinkergraph
> 
> Does this approach make sense? Would it scale?
> 
> I'd appreciate any comments or feedback. Thanks!
> 
> -- Jason
