Hello,
We could provide a DefaultInputRDD and a DefaultInputFormat that let any
OLTP graph system easily load its data into Giraph, Spark, etc.
https://issues.apache.org/jira/browse/TINKERPOP3-1015
This is "quick and dirty" because it's single-threaded (no splits): it uses
Graph.vertices() to stream in the vertices one at a time.
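To make the "no splits" behavior concrete, here is a minimal, self-contained sketch. It does not use the TinkerPop API: `Vertex`, `SimpleGraph`, and `loadSingleSplit` are hypothetical stand-ins for a TinkerPop Vertex, `Graph.vertices()`, and the reader inside such an input format. The point is only that a single iterator is drained on one thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SingleSplitLoad {
    // Hypothetical stand-in for a TinkerPop Vertex: just an id here.
    record Vertex(long id) {}

    // Hypothetical stand-in for an OLTP graph exposing vertices() as an iterator.
    interface SimpleGraph {
        Iterator<Vertex> vertices();
    }

    // Drains the single vertex iterator on the calling thread.
    // There is exactly one "split", so nothing is read in parallel.
    static List<Long> loadSingleSplit(SimpleGraph graph) {
        List<Long> loadedIds = new ArrayList<>();
        Iterator<Vertex> it = graph.vertices();
        while (it.hasNext()) {
            loadedIds.add(it.next().id()); // one vertex at a time
        }
        return loadedIds;
    }

    public static void main(String[] args) {
        List<Vertex> store = List.of(new Vertex(1), new Vertex(2), new Vertex(3));
        SimpleGraph graph = store::iterator; // in-memory "graph" for illustration
        System.out.println(loadSingleSplit(graph)); // [1, 2, 3]
    }
}
```

However large the graph, this reader's throughput is bounded by one thread walking one iterator, which is what motivates the splits idea.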
Would people be interested in this feature? It would allow you to, for example,
use Spark with Neo4j. Also, to make this efficient, we could add:
List<Iterator<Vertex>> Graph.vertexSplits(int numberOfSplits)
Then each graph provider can specify how to do parallel reads. The default
implementation would be:
List<Iterator<Vertex>> splits = new ArrayList<>(numberOfSplits);
splits.add(this.vertices()); // a single split holding every vertex
return splits;
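Here is a self-contained sketch of the proposed `vertexSplits` contract. It is only illustrative: `SimpleGraph` is a hypothetical interface (not TinkerPop's `Graph`) carrying the default above as a Java default method, `PartitionedGraph` is a hypothetical in-memory provider that overrides it with contiguous index ranges, and `countPerSplit` stands in for an InputRDD/InputFormat that reads each split on its own thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VertexSplitsSketch {
    record Vertex(long id) {}

    // Hypothetical graph interface; vertexSplits mirrors the proposed default.
    interface SimpleGraph {
        Iterator<Vertex> vertices();

        // Default: one split that streams all vertices serially,
        // regardless of how many splits were requested.
        default List<Iterator<Vertex>> vertexSplits(int numberOfSplits) {
            List<Iterator<Vertex>> splits = new ArrayList<>(numberOfSplits);
            splits.add(this.vertices());
            return splits;
        }
    }

    // Hypothetical provider that knows how to partition its own storage.
    static final class PartitionedGraph implements SimpleGraph {
        private final List<Vertex> store;

        PartitionedGraph(List<Vertex> store) { this.store = store; }

        @Override
        public Iterator<Vertex> vertices() { return store.iterator(); }

        // Override: split the backing list into contiguous index ranges.
        @Override
        public List<Iterator<Vertex>> vertexSplits(int numberOfSplits) {
            List<Iterator<Vertex>> splits = new ArrayList<>(numberOfSplits);
            int size = store.size();
            for (int i = 0; i < numberOfSplits; i++) {
                int from = (int) ((long) size * i / numberOfSplits);
                int to = (int) ((long) size * (i + 1) / numberOfSplits);
                splits.add(store.subList(from, to).iterator());
            }
            return splits;
        }
    }

    // Reads each split on its own thread and returns per-split vertex counts.
    static List<Integer> countPerSplit(SimpleGraph graph, int numberOfSplits) throws Exception {
        List<Iterator<Vertex>> splits = graph.vertexSplits(numberOfSplits);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, splits.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Iterator<Vertex> split : splits) {
                futures.add(pool.submit(() -> {
                    int n = 0;
                    while (split.hasNext()) { split.next(); n++; }
                    return n;
                }));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : futures) counts.add(f.get());
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Vertex> store = new ArrayList<>();
        for (long id = 0; id < 10; id++) store.add(new Vertex(id));

        SimpleGraph naive = store::iterator; // falls back to the default vertexSplits
        SimpleGraph partitioned = new PartitionedGraph(store);

        System.out.println(countPerSplit(naive, 4));       // [10]
        System.out.println(countPerSplit(partitioned, 4)); // [2, 3, 2, 3]
    }
}
```

One consequence of this design: because the default may return fewer splits than requested, a reader should size its parallelism from `splits.size()` rather than from `numberOfSplits`, as `countPerSplit` does.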
Anywho… just a random idea I had while doing some Spark InputRDD test suite work.
Take care,
Marko.
http://markorodriguez.com