Hi Ryan,
Thanks for the reply!
The post was very helpful for understanding the relationship between Parquet 
block size and HDFS block size.
I'm currently migrating an RCFile table to a Parquet table. Right now I'm 
partitioning by month and by the prefix of a column, which gives me over 500k 
partitions in total. Does having that many partitions hurt performance?

Thank you!
Tianqi

-----Original Message-----
From: Ryan Blue [mailto:[email protected]] 
Sent: Sunday, April 12, 2015 8:32 AM
To: [email protected]
Subject: Re: PARQUET_FILE_SIZE & parquet.block.size & dfs.blocksize

On 04/10/2015 04:24 PM, Tianqi Tong wrote:
> Hi Parquet,
> Is there anywhere that I can find the documentation about the explanation and 
> relationships for the following configurations:
>
> set PARQUET_FILE_SIZE=x;
> set parquet.block.size=y;
> set dfs.blocksize=z;
>
> Right now I'm populating a table, but it's hard to find the best configuration 
> for those parameters.
> Thanks!
>
> Tianqi Tong

Tianqi,

Here's a post I wrote on row group and block sizes:

   http://ingest.tips/2015/01/31/parquet-row-group-size/

I'm not sure what PARQUET_FILE_SIZE is. What are you using to write?

rb


-- 
Ryan Blue
Software Engineer
Cloudera, Inc.
