Do you have any code or the Parquet schema to share? I'm not sure I
understand which step is failing...

On 3 September 2015 at 04:12, Raghavendra Pandey <
raghavendra.pan...@gmail.com> wrote:

> Did you specify a partitioning column when saving the data?
> On Sep 3, 2015 5:41 AM, "Kohki Nishio" <tarop...@gmail.com> wrote:
>
>> Hello experts,
>>
>> I have a huge JSON file (> 40 GB) and am trying to use Parquet as the file
>> format. Each entry has a unique identifier, but other than that there is no
>> 'well balanced value' column to partition on. Right now it just throws an
>> OOM and I couldn't figure out what to do about it.
>>
>> It would be ideal if I could provide a partitioner based on the unique
>> identifier value, for example by computing its hash. One option would be
>> to produce a hash value and add it as a separate column, but that doesn't
>> sound right to me. Are there any other ways I can try?
>>
>> Regards,
>> --
>> Kohki Nishio
>>
>
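
For reference, the hashed-column idea from the quoted message can be
sketched roughly as below. This is plain Python, not Spark code, and only
illustrates how a unique identifier could be turned into a small, evenly
spread bucket number. The bucket count, the `bucket_for` helper and the
sample identifiers are all hypothetical; nothing here comes from the actual
job.

```python
import hashlib
from collections import Counter

# Hypothetical bucket count; a real job would tune this to the data size.
NUM_BUCKETS = 16


def bucket_for(record_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a unique identifier to a stable bucket in [0, num_buckets)."""
    # hashlib is used instead of the built-in hash() because hash() on
    # strings is randomized per process, so buckets would not be stable
    # across runs.
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


# Made-up identifiers, only to look at how the buckets fill up.
ids = [f"id-{i}" for i in range(10_000)]
counts = Counter(bucket_for(record_id) for record_id in ids)

for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

If something along these lines were used in Spark, the derived bucket could
presumably be stored as an extra column and passed to
`DataFrameWriter.partitionBy`, but whether that addresses the OOM depends on
where the job actually fails.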


-- 

*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
