[DISCUSS] DefaultInputRDD and DefaultInputFormat

Marko Rodriguez Wed, 02 Dec 2015 12:23:54 -0800

Hello,

It is possible for us to provide a DefaultInputRDD and a DefaultInputFormat that 
allow any OLTP graph system to easily load its data into Giraph/Spark/etc.


        https://issues.apache.org/jira/browse/TINKERPOP3-1015

This is "quick and dirty" as it's single-threaded (no splits). It uses 
Graph.vertices() to stream in the vertices one at a time.
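
To illustrate what "single-threaded, no splits" means in practice, here is a rough sketch of a reader that drains one vertex iterator. It is not the code from the ticket. SingleSplitReaderSketch and SingleSplitReader are hypothetical names, the method names are only loosely modelled on the nextKeyValue()/getCurrentValue() pattern of Hadoop's RecordReader, and plain strings stand in for vertices so that the example has no dependencies.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch, not the TINKERPOP3-1015 implementation.
class SingleSplitReaderSketch {

    /** Reads every vertex from one iterator: a single split, a single thread. */
    static class SingleSplitReader<V> {
        // Assumption: this is whatever the provider's Graph.vertices() returns.
        private final Iterator<V> vertices;
        private V current;
        private boolean positioned = false;

        SingleSplitReader(Iterator<V> vertices) {
            this.vertices = vertices;
        }

        /** Advances to the next vertex; returns false once the stream is exhausted. */
        boolean nextKeyValue() {
            if (!vertices.hasNext()) {
                positioned = false;
                current = null;
                return false;
            }
            current = vertices.next();
            positioned = true;
            return true;
        }

        /** Returns the vertex most recently read by nextKeyValue(). */
        V getCurrentValue() {
            if (!positioned) {
                throw new NoSuchElementException("no current vertex");
            }
            return current;
        }
    }

    public static void main(String[] args) {
        // Strings stand in for vertices to keep the sketch dependency-free.
        SingleSplitReader<String> reader =
                new SingleSplitReader<>(Arrays.asList("v1", "v2", "v3").iterator());
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentValue());
        }
    }
}
```

Since there is only one iterator, the whole graph is read by one task no matter how many workers the cluster has.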

Would people be interested in this feature? It would allow you to, for example, 
use Spark with Neo4j. Another thing we could do to make this efficient is to add:

        List<Iterator<Vertex>> Graph.vertexSplits(int numberOfSplits)

Then each graph provider can specify how to do parallel reads. The default 
implementation would be:
        
        List<Iterator<Vertex>> splits = new ArrayList<>(numberOfSplits);
        splits.add(this.vertices());
        return splits;
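
To make the provider-override idea a bit more concrete, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: vertexSplits() is only a proposal and is not part of the Graph API, and SplittableGraph, SimpleGraph and ChunkedGraph are hypothetical stand-ins (with strings in place of vertices), not TinkerPop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the proposal; none of these types exist in TinkerPop.
class VertexSplitsSketch {

    /** Minimal stand-in for a graph whose vertices have type V. */
    interface SplittableGraph<V> {
        Iterator<V> vertices();

        /** Default: ignore the requested count and return one split with every vertex. */
        default List<Iterator<V>> vertexSplits(int numberOfSplits) {
            List<Iterator<V>> splits = new ArrayList<>(numberOfSplits);
            splits.add(this.vertices());
            return splits;
        }
    }

    /** A provider that keeps the default single-split behaviour. */
    static class SimpleGraph<V> implements SplittableGraph<V> {
        protected final List<V> data;

        SimpleGraph(List<V> data) {
            this.data = data;
        }

        @Override
        public Iterator<V> vertices() {
            return data.iterator();
        }
    }

    /** A provider that overrides the default and cuts its vertices into contiguous chunks. */
    static class ChunkedGraph<V> extends SimpleGraph<V> {
        ChunkedGraph(List<V> data) {
            super(data);
        }

        @Override
        public List<Iterator<V>> vertexSplits(int numberOfSplits) {
            int n = Math.max(1, numberOfSplits);
            // Ceiling division so that at most n chunks are produced.
            int chunk = Math.max(1, (data.size() + n - 1) / n);
            List<Iterator<V>> splits = new ArrayList<>();
            for (int from = 0; from < data.size(); from += chunk) {
                int to = Math.min(data.size(), from + chunk);
                splits.add(data.subList(from, to).iterator());
            }
            return splits;
        }
    }

    public static void main(String[] args) {
        List<String> vs = Arrays.asList("v1", "v2", "v3", "v4", "v5");
        System.out.println(new SimpleGraph<>(vs).vertexSplits(4).size());
        System.out.println(new ChunkedGraph<>(vs).vertexSplits(2).size());
    }
}
```

A real provider would presumably split along whatever its storage layer already partitions on instead of an in-memory list; the point is only that the caller treats numberOfSplits as a hint and works with however many iterators come back.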

Anywho... random idea as I was doing some Spark InputRDD test suite stuff.

Take care,
Marko.

http://markorodriguez.com
