Do you have any code or a Parquet schema you can share? I'm not sure I understand which step is failing here.
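
For reference, the "hash the unique identifier" idea from the quoted message below could look roughly like this. It is only a sketch in plain Python, not Spark code: the names `bucket_for` and `add_bucket`, the `id` field name, and the bucket count of 64 are assumptions made up for illustration, and nothing here comes from the original job.

```python
import zlib

# Hypothetical bucket count; the right value depends on data volume and cluster size.
NUM_BUCKETS = 64


def bucket_for(record_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a unique identifier to a stable bucket in [0, num_buckets)."""
    # crc32 is deterministic across processes and machines, unlike Python's
    # built-in hash() on strings, which is randomized per interpreter run.
    return zlib.crc32(record_id.encode("utf-8")) % num_buckets


def add_bucket(record: dict, id_field: str = "id") -> dict:
    """Return a copy of the record with a derived 'bucket' field added."""
    out = dict(record)  # copy so the input record is left untouched
    out["bucket"] = bucket_for(str(record[id_field]))
    return out


if __name__ == "__main__":
    # Made-up sample records standing in for entries of the JSON file.
    sample = [{"id": f"entry-{i}", "payload": i} for i in range(5)]
    for rec in sample:
        print(add_bucket(rec))
```

With Spark, a derived column like `bucket` could then be passed to `DataFrameWriter.partitionBy` (available since Spark 1.4). Whether that helps with the OOM depends on which step actually fails.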

On 3 September 2015 at 04:12, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:

> Did you specify partitioning column while saving data..
> On Sep 3, 2015 5:41 AM, "Kohki Nishio" <tarop...@gmail.com> wrote:
>
>> Hello experts,
>>
>> I have a huge json file (> 40G) and trying to use Parquet as a file
>> format. Each entry has a unique identifier but other than that, it doesn't
>> have 'well balanced value' column to partition it. Right now it just throws
>> OOM and couldn't figure out what to do with it.
>>
>> It would be ideal if I could provide a partitioner based on the unique
>> identifier value like computing its hash value or something. One of the
>> option would be to produce a hash value and add it as a separate column,
>> but it doesn't sound right to me. Is there any other ways I can try ?
>>
>> Regards,
>> --
>> Kohki Nishio
>>
>

--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris