Did you specify a partitioning column when saving the data?
On Sep 3, 2015 5:41 AM, "Kohki Nishio" <tarop...@gmail.com> wrote:

> Hello experts,
>
> I have a huge JSON file (> 40 GB) and am trying to use Parquet as the file
> format. Each entry has a unique identifier, but other than that there is no
> column with well-balanced values to partition on. Right now the job just
> throws an OOM error, and I can't figure out what to do about it.
>
> It would be ideal if I could provide a partitioner based on the unique
> identifier value, for example by computing its hash. One option would be
> to produce a hash value and add it as a separate column, but that doesn't
> sound right to me. Are there any other ways I can try?
>
> Regards,
> --
> Kohki Nishio
>
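
For reference, the hash-column idea from the question could look roughly like
the sketch below. It is plain Python rather than Spark code, and the names
bucket_for, add_bucket, NUM_BUCKETS and the "id" field are hypothetical, chosen
only for illustration. It assumes the identifier can be rendered as a string
and that a fixed number of buckets is acceptable.

```python
import zlib

# Hypothetical bucket count; in practice it would be tuned to the data size.
NUM_BUCKETS = 64


def bucket_for(record_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a unique identifier to a stable bucket in [0, num_buckets)."""
    # zlib.crc32 gives the same value in every process, unlike the built-in
    # hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(record_id.encode("utf-8")) % num_buckets


def add_bucket(record: dict, id_field: str = "id") -> dict:
    """Return a copy of the record with a derived 'bucket' field added."""
    out = dict(record)  # shallow copy so the input record is left untouched
    out["bucket"] = bucket_for(str(record[id_field]))
    return out
```

The derived bucket value could then serve as the partitioning column when the
data is written, for example through DataFrameWriter.partitionBy in Spark.
Whether that alone avoids the OOM depends on where the memory is actually
going, so treat it as a starting point rather than a fix.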
