Re: spark disk-to-disk

Koert Kuipers Mon, 23 Mar 2015 10:58:53 -0700

there is a way to reinstate the partitioner, but it requires
sc.objectFile to read back exactly the partitions i wrote, which means
sc.objectFile must never split files on reading (splitting is a feature of
hadoop's FileInputFormat that gets in the way here).
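
a minimal sketch of the idea, under stated assumptions and not tested on a
real cluster. AssumePartitionedRDD, readPartitioned and verifyLayout are
hypothetical names, not part of spark. the sketch assumes that a very large
minimum split size keeps the old-api FileInputFormat from splitting part
files, and that the part files come back in the same order they were written
(partition i on read is partition i on write). neither is guaranteed by
sc.objectFile itself, which is why verifyLayout is there.

```scala
import org.apache.spark.{Partition, Partitioner, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical wrapper (not part of Spark): passes the parent's partitions
// through unchanged and only declares a partitioner for them. It moves no
// data and checks nothing, so it is correct only if partition i of the
// parent holds exactly the keys that `part` maps to i.
class AssumePartitionedRDD[K: ClassTag, V: ClassTag](
    prev: RDD[(K, V)],
    part: Partitioner
) extends RDD[(K, V)](prev) {

  require(
    prev.partitions.length == part.numPartitions,
    s"expected ${part.numPartitions} partitions, found ${prev.partitions.length}")

  override val partitioner: Option[Partitioner] = Some(part)

  override protected def getPartitions: Array[Partition] =
    firstParent[(K, V)].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
    firstParent[(K, V)].iterator(split, context)
}

// Full scan that checks the assumed layout: every key in partition i must
// be assigned to i by the partitioner.
def verifyLayout[K, V](rdd: RDD[(K, V)], part: Partitioner): Boolean =
  rdd.mapPartitionsWithIndex { (i, it) =>
    Iterator(it.forall { case (k, _) => part.getPartition(k) == i })
  }.collect().forall(identity)

// Hypothetical helper: read back what rdd.saveAsObjectFile(path) wrote and
// re-attach the partitioner that was used before saving.
def readPartitioned[K: ClassTag, V: ClassTag](
    sc: SparkContext,
    path: String,
    part: Partitioner
): RDD[(K, V)] = {
  // Assumption: with a huge minimum split size each part file becomes one
  // split. This changes the shared Hadoop configuration of the context, so
  // it also affects later reads.
  sc.hadoopConfiguration.setLong("mapred.min.split.size", Long.MaxValue)
  val raw = sc.objectFile[(K, V)](path)
  new AssumePartitionedRDD(raw, part)
}
```

usage would look like readPartitioned[String, Int](sc, path, new
HashPartitioner(n)) with the same n and partitioner as before the save,
followed by one call to verifyLayout before trusting the result in a join or
reduceByKey. if the number of files read does not match n, the require in the
wrapper fails instead of silently producing a wrong partitioner.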


On Mon, Mar 23, 2015 at 1:39 PM, Koert Kuipers <ko...@tresata.com> wrote:

> i just realized the major limitation is that i lose partitioning info...
>
> On Mon, Mar 23, 2015 at 1:34 AM, Reynold Xin <r...@databricks.com> wrote:
>
>>
>> On Sun, Mar 22, 2015 at 6:03 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> so finally i can resort to:
>>> rdd.saveAsObjectFile(...)
>>> sc.objectFile(...)
>>> but that seems like a rather broken abstraction.
>>>
>>>
>> This seems like a fine solution to me.
>>
>>
>
