Hi Jason,
From what I can tell, you are trying to load HDFS data into TinkerGraph. You
are doing this by creating an HDFSTinkerGraph, which wraps a TinkerGraph and
adds some HDFS data-loading capabilities (e.g. loadGraph()).
I don't think this is a good idea. If you want to load HDFS data into
TinkerGraph, I would do it like this:
HadoopGraph --BulkLoaderVertexProgram--> TinkerGraph.
*** Kuppitz' BulkLoaderVertexProgramTest demonstrates how to do this.
With the BulkLoaderVertexProgram model, you get a few benefits.
1. HadoopGraph can load any InputFormat, not just HDFS files. For
example, it could load Spark RDDs, CassandraInputFormat, etc.
2. We don't have "yet another graph implementation" to maintain,
explain, and document.
3. HdfsTinkerGraph will only scale to the size of a single machine's RAM
and thus is a bit of a "toy implementation."
- If you really want to use TinkerGraph for Hadoop data, it's a
"corner case" that can be solved using BulkLoaderVertexProgram.
4. If you don't want to use BulkLoaderVertexProgram, then just do
hadoopGraph.io().writeGraph() followed by tinkerGraph.io().readGraph().
Thoughts?
Marko.
http://markorodriguez.com
On Dec 23, 2015, at 8:26 AM, Jason Plurad <[email protected]> wrote:
> I've been playing around with this repo to enable writing a graph out to
> HDFS. I haven't tested it at scale, but it seems to work for the basic BLVP
> write graph scenario with the Grateful Dead graph.
>
> https://github.com/pluradj/incubator-tinkerpop/commits/hdfstinkergraph
> https://github.com/pluradj/titan/tree/titan11-hdfstinkergraph
>
> Does this approach make sense? Would it scale?
>
> I'd appreciate any comments or feedback. Thanks!
>
> -- Jason