Did you specify a partitioning column while saving the data?

On Sep 3, 2015 5:41 AM, "Kohki Nishio" <tarop...@gmail.com> wrote:
> Hello experts,
>
> I have a huge JSON file (> 40G) and am trying to use Parquet as the file
> format. Each entry has a unique identifier, but other than that, it doesn't
> have a 'well balanced value' column to partition it by. Right now it just
> throws an OOM, and I couldn't figure out what to do about it.
>
> It would be ideal if I could provide a partitioner based on the unique
> identifier value, for example by computing its hash value. One option
> would be to produce a hash value and add it as a separate column,
> but it doesn't sound right to me. Are there any other ways I can try?
>
> Regards,
> --
> Kohki Nishio
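For illustration, here is a minimal, Spark-free sketch of the "hash the unique identifier into a separate column" idea from the question. It assumes each JSON record has an `id` field and that a fixed bucket count (here the hypothetical `num_buckets`) is acceptable; the helper names `bucket_for`, `add_bucket_column` and `group_by_bucket` are made up for this example and are not part of any library. It uses `zlib.crc32` because, unlike Python's built-in `hash()` on strings (which is salted per process), it gives the same bucket for the same ID on every run and every machine.

```python
import json
import zlib
from collections import defaultdict


def bucket_for(record_id: str, num_buckets: int) -> int:
    """Map an identifier to a stable bucket number in [0, num_buckets).

    crc32 is deterministic across processes, so the same ID always lands
    in the same bucket (Python's str hash() is randomized per process).
    """
    return zlib.crc32(record_id.encode("utf-8")) % num_buckets


def add_bucket_column(records, id_field, num_buckets, bucket_field="bucket"):
    """Return copies of the records with an extra hash-bucket field added."""
    tagged = []
    for rec in records:
        copy = dict(rec)  # leave the input records unmodified
        copy[bucket_field] = bucket_for(str(rec[id_field]), num_buckets)
        tagged.append(copy)
    return tagged


def group_by_bucket(records, bucket_field="bucket"):
    """Group tagged records by bucket, as a partitioned writer would."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[bucket_field]].append(rec)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample input: one JSON object per line.
    lines = [
        '{"id": "a-001", "value": 1}',
        '{"id": "a-002", "value": 2}',
        '{"id": "b-117", "value": 3}',
        '{"id": "c-042", "value": 4}',
    ]
    records = [json.loads(line) for line in lines]
    tagged = add_bucket_column(records, "id", num_buckets=4)
    for bucket, recs in sorted(group_by_bucket(tagged).items()):
        print(bucket, [r["id"] for r in recs])
```

In Spark, the analogous step would be to derive such a bucket column on the DataFrame and pass it to `DataFrameWriter.partitionBy` (e.g. `df.write.partitionBy("bucket").parquet(path)`), so each bucket is written to its own directory. Which hash function to use on the Spark side, and how many buckets, are assumptions to tune for your data; a small, fixed bucket count keeps the number of output directories bounded.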