there is a way to reinstate the partitioner, but it requires sc.objectFile to read back exactly the partitions i wrote, which means sc.objectFile must never split files on reading (splitting is a feature of hadoop's FileInputFormat that gets in the way here).
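
a minimal sketch of what i mean, not a tested recipe. AssumePartitionedRDD and readPartitioned are hypothetical names, not spark API. it rests on three assumptions: that a huge minimum split size is enough to stop hadoop from splitting part files (the key below is the hadoop 2 name; older releases use mapred.min.split.size), that the part files are listed in the same order they were written, and that the partitioner passed in is the one that was used before saving. spark checks none of this, so a wrong guess silently gives wrong results.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{Partition, Partitioner, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper: declares that `prev` is already laid out according to `part`.
// It moves no data; it only reports a partitioner so later joins can skip a shuffle.
class AssumePartitionedRDD[K: ClassTag, V: ClassTag](prev: RDD[(K, V)], part: Partitioner)
    extends RDD[(K, V)](prev) {

  // Cheap sanity check: one partition read back per partition written.
  require(prev.partitions.length == part.numPartitions,
    s"read ${prev.partitions.length} partitions, partitioner expects ${part.numPartitions}")

  override val partitioner: Option[Partitioner] = Some(part)

  override protected def getPartitions: Array[Partition] =
    firstParent[(K, V)].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
    firstParent[(K, V)].iterator(split, context)
}

// Hypothetical helper: read an object file without splitting, then reattach the partitioner.
def readPartitioned[K: ClassTag, V: ClassTag](
    sc: SparkContext, path: String, part: Partitioner): RDD[(K, V)] = {
  // Assumption: with a minimum split size this large, each part file becomes one split.
  // Note that this changes the context-wide hadoop configuration.
  sc.hadoopConfiguration.setLong(
    "mapreduce.input.fileinputformat.split.minsize", Long.MaxValue)
  new AssumePartitionedRDD(sc.objectFile[(K, V)](path), part)
}
```

usage would be something like rdd.partitionBy(p).saveAsObjectFile(path) on the write side and readPartitioned[K, V](sc, path, p) on the read side, with the same p in both places. the require only catches a partition count mismatch; it cannot detect reordered files or a different partitioner.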

On Mon, Mar 23, 2015 at 1:39 PM, Koert Kuipers <ko...@tresata.com> wrote:

> i just realized the major limitation is that i lose partitioning info...
>
> On Mon, Mar 23, 2015 at 1:34 AM, Reynold Xin <r...@databricks.com> wrote:
>
>> On Sun, Mar 22, 2015 at 6:03 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> so finally i can resort to:
>>> rdd.saveAsObjectFile(...)
>>> sc.objectFile(...)
>>> but that seems like a rather broken abstraction.
>>
>> This seems like a fine solution to me.