Re: [DISCUSS] DefaultInputRDD and DefaultInputFormat

Ran Magen Thu, 03 Dec 2015 17:21:05 -0800

After digging some more in the code, I retract my ill-informed question.

Apologies,
Ran



On Thu, 3 Dec 2015 at 23:11 Ran Magen <[email protected]> wrote:

> This would be great for me.
> In Unipop we want to enable running heavy queries in a distributed manner.
> We figured we could implement some kind of UnipopSparkComputer that
> utilizes the current Spark implementation, but from a quick check we didn't
> find an obvious way to do that.
>
> Might DefaultInputRDD be a good solution for us?
>
> Cheers,
> Ran
>
> On Wed, 2 Dec 2015 at 22:23 Marko Rodriguez <[email protected]> wrote:
>
>> Hello,
>>
>> It is possible for us to provide a DefaultInputRDD and DefaultInputFormat
>> to allow any OLTP graph system to easily load the data into
>> Giraph/Spark/etc.
>>
>>         https://issues.apache.org/jira/browse/TINKERPOP3-1015
>>
>> This is "quick and dirty" as it's single-threaded -- no splits. It uses
>> Graph.vertices() to stream in the vertices one at a time.
>>
>> Would people be interested in this feature? It would allow you to, for
>> example, use Spark with Neo4j. Also, another thing we could do to make this
>> efficient is:
>>
>>         List<Iterator<Vertex>> Graph.vertexSplits(int numberOfSplits)
>>
>> Then each graph provider can specify how to do parallel reads. The
>> default implementation would be:
>>
>>         List<Iterator<Vertex>> splits = new ArrayList<>(numberOfSplits);
>>         splits.add(this.vertices());
>>         return splits;
>>
>> Anywho…. random idea as I was doing some Spark InputRDD test suite stuff.
>>
>> Take care,
>> Marko.
>>
>> http://markorodriguez.com
>>
>>
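
For anyone reading this thread later, the vertexSplits idea quoted above could be sketched roughly as below. This is only an illustration under stated assumptions: Vertex and SplittableGraph are stand-in types written for this example, not the TinkerPop interfaces, and InMemoryGraph with its round-robin partitioning is a hypothetical provider. Only the vertexSplits(int) signature and the single-split default come from the proposal itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class VertexSplitsSketch {

    /** Stand-in vertex type for this sketch (not TinkerPop's Vertex). */
    interface Vertex {
        Object id();
    }

    /** Stand-in graph type carrying the proposed split method. */
    interface SplittableGraph {
        Iterator<Vertex> vertices();

        // Default from the proposal: a single split wrapping the whole
        // vertex stream, so the read stays single-threaded.
        default List<Iterator<Vertex>> vertexSplits(int numberOfSplits) {
            List<Iterator<Vertex>> splits = new ArrayList<>(numberOfSplits);
            splits.add(this.vertices());
            return splits;
        }
    }

    /** Hypothetical provider that overrides the default to read in parallel. */
    static class InMemoryGraph implements SplittableGraph {
        private final List<Vertex> vertexList;

        InMemoryGraph(List<Vertex> vertexList) {
            this.vertexList = vertexList;
        }

        @Override
        public Iterator<Vertex> vertices() {
            return vertexList.iterator();
        }

        @Override
        public List<Iterator<Vertex>> vertexSplits(int numberOfSplits) {
            // Assumption: round-robin assignment; a real provider would
            // split along its own storage partitions instead.
            List<List<Vertex>> buckets = new ArrayList<>(numberOfSplits);
            for (int i = 0; i < numberOfSplits; i++) {
                buckets.add(new ArrayList<>());
            }
            for (int i = 0; i < vertexList.size(); i++) {
                buckets.get(i % numberOfSplits).add(vertexList.get(i));
            }
            List<Iterator<Vertex>> splits = new ArrayList<>(numberOfSplits);
            for (List<Vertex> bucket : buckets) {
                splits.add(bucket.iterator());
            }
            return splits;
        }
    }

    /** Helper for the sketch: builds a vertex that only carries an id. */
    static Vertex vertex(Object id) {
        return () -> id;
    }

    /** Helper for the sketch: drains an iterator and counts its elements. */
    static int count(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Vertex> vertices = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            vertices.add(vertex(i));
        }
        SplittableGraph graph = new InMemoryGraph(vertices);
        for (Iterator<Vertex> split : graph.vertexSplits(2)) {
            System.out.println("split size: " + count(split));
        }
    }
}
```

A provider that does nothing inherits the one-split default, while an input format or RDD could hand each returned iterator to a separate task when a provider does override it.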
