On 10/27/2014 12:20 PM, Suraj Nayak wrote:
On Tue, Oct 28, 2014 at 12:14 AM, Suraj Nayak <[email protected]> wrote:

    Hi Ryan,

    Thanks for the detailed info on total memory used.

    The output table is partitioned by 2 columns: one column has 2 output
    partitions and the other has 187 partitions.

    parquet.block.size is at its default; I have not specified it anywhere.
    If you can help me find the exact value, that would be helpful.

    I am interested in knowing which tools can get this done. Kindly share
    them :)

Suraj,

The parquet.block.size should be 128MB if you've not changed it. You can always find this value in the configuration properties of your MR job (or the underlying job in the tracker if you're using Hive).
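
If you want to check programmatically, here's a minimal sketch (assuming a stock Hadoop client on the classpath) that prints the effective value, falling back to 128MB:

    // Print the effective parquet.block.size from the job/site configuration.
    // 134217728 bytes = 128MB, the Parquet default.
    import org.apache.hadoop.conf.Configuration;

    public class BlockSizeCheck {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        long blockSize = conf.getLong("parquet.block.size", 134217728L);
        System.out.println("parquet.block.size = " + blockSize + " bytes");
      }
    }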

If you're comfortable writing your own MR job to do this conversion, then that works. You would just create keys from the data that match the partition scheme you're using with Hive. Your mapper creates the key for a record and writes the (key, record) pair, and the reducer just writes all of the values it receives.
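
As a rough sketch of the shape of that job (the tab-delimited input and the column positions are just placeholders; adapt them to your data and your two partition columns):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class PartitionShuffle {

      // Mapper: build a key from the two partition columns so every record
      // for one Hive partition ends up in the same reduce group.
      public static class PartitionKeyMapper
          extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
          String[] fields = record.toString().split("\t");
          // Placeholder: fields[0] and fields[1] are the partition columns.
          context.write(new Text(fields[0] + "/" + fields[1]), record);
        }
      }

      // Reducer: just write out every record it receives, so each output
      // partition's data is written together.
      public static class PassThroughReducer
          extends Reducer<Text, Text, NullWritable, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> records, Context context)
            throws IOException, InterruptedException {
          for (Text record : records) {
            context.write(NullWritable.get(), record);
          }
        }
      }
    }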

If you don't want to do this yourself, you can take a look at Kite, which has this already built so that you can call it from a command-line interface [1].

rb

[1]: http://kitesdk.org/docs/current/guide/Using-the-Kite-CLI-to-Create-a-Dataset/


--
Ryan Blue
Software Engineer
Cloudera, Inc.
