Hi Ryan,

Thanks for the reply! The post was very useful for understanding the relationship between the Parquet block size and the HDFS block size. I'm currently migrating an RCFile table to a Parquet table. Right now I'm partitioning by month and by a prefix of a column, which gives me over 500k partitions in total. Does it hurt performance to have that many partitions?
Thank you!
Tianqi

-----Original Message-----
From: Ryan Blue [mailto:[email protected]]
Sent: Sunday, April 12, 2015 8:32 AM
To: [email protected]
Subject: Re: PARQUET_FILE_SIZE & parquet.block.size & dfs.blocksize

On 04/10/2015 04:24 PM, Tianqi Tong wrote:
> Hi Parquet,
> Is there anywhere that I can find the documentation about the explanation and
> relationships for the following configurations:
>
> set PARQUET_FILE_SIZE=x;
> set parquet.block.size=y;
> set dfs.blocksize=z;
>
> Right now I'm populating a table but hard to find the best configuration of
> those parameters.
> Thanks!
>
> Tianqi Tong

Tianqi,

Here's a post I wrote on row group and block sizes:

http://ingest.tips/2015/01/31/parquet-row-group-size/

I'm not sure what PARQUET_FILE_SIZE is. What are you using to write?

rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
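For reference, the usual way to line up the settings from the question is to keep a Parquet row group no larger than one HDFS block, so a row group never straddles blocks. The sketch below assumes `parquet.block.size` and `dfs.blocksize` are set from a Hive session writing with parquet-mr, and that `PARQUET_FILE_SIZE` is the writer's target output file size; the thread itself does not confirm which engine defines `PARQUET_FILE_SIZE`, so treat that part as an assumption. The specific 256 MB value is only an illustration, not a recommendation from the thread.

```sql
-- Sketch, not a confirmed configuration: make the row group size equal to
-- (or smaller than) the HDFS block size so each row group is read locally.

set PARQUET_FILE_SIZE=268435456;   -- assumed: target size of each output file (256 MB)
set parquet.block.size=268435456;  -- parquet-mr row group size (256 MB)
set dfs.blocksize=268435456;       -- HDFS block size for the written files (256 MB)
```

With these values, each output file holds roughly one row group per HDFS block; making `parquet.block.size` larger than `dfs.blocksize` would split row groups across blocks and force remote reads.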
