[ 
https://issues.apache.org/jira/browse/PARQUET-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634273#comment-14634273
 ] 

Ryan Blue commented on PARQUET-166:
-----------------------------------

The first part of this, ensuring that the row group size is less than the HDFS 
block size, was added in PARQUET-306 along with row group padding. We should 
determine whether the second part, checking that the block size is close to a 
multiple of the row group size, is worth doing. Users tend to ignore warnings 
and would not appreciate errors.

> Validate parquet row group size and HDFS block size
> ---------------------------------------------------
>
>                 Key: PARQUET-166
>                 URL: https://issues.apache.org/jira/browse/PARQUET-166
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Ryan Blue
>
> The OutputFormat should verify that {{parquet.block.size < dfs.blocksize}} to 
> avoid bad performance. In addition, we could check that {{(dfs.blocksize % 
> parquet.block.size) < 1MB}} to ensure that some number of row groups is 
> approximately the size of an HDFS block.
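
For illustration, the two checks in the description could be sketched roughly 
as below. This is only a sketch under stated assumptions, not the code added in 
PARQUET-306: the class and method names are hypothetical, and both sizes are 
assumed to have already been read from {{parquet.block.size}} and 
{{dfs.blocksize}} as byte counts.

```java
// Hypothetical helper; names are illustrative and not part of parquet-mr.
final class BlockSizeCheck {
    // 1MB tolerance taken from the issue description.
    static final long ONE_MB = 1024L * 1024L;

    // First check: parquet.block.size < dfs.blocksize.
    static boolean rowGroupFitsInBlock(long rowGroupSize, long dfsBlockSize) {
        requirePositive(rowGroupSize, dfsBlockSize);
        return rowGroupSize < dfsBlockSize;
    }

    // Second check: (dfs.blocksize % parquet.block.size) < 1MB, i.e. a whole
    // number of row groups leaves less than 1MB of the HDFS block unused.
    static boolean rowGroupsAlignWithBlock(long rowGroupSize, long dfsBlockSize) {
        requirePositive(rowGroupSize, dfsBlockSize);
        return (dfsBlockSize % rowGroupSize) < ONE_MB;
    }

    // Reject non-positive sizes so the modulo above cannot divide by zero.
    private static void requirePositive(long rowGroupSize, long dfsBlockSize) {
        if (rowGroupSize <= 0 || dfsBlockSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
    }

    public static void main(String[] args) {
        long dfsBlockSize = 128 * ONE_MB;  // assumed example value
        long rowGroupSize = 100 * ONE_MB;  // assumed example value
        System.out.println("fits in block: "
            + rowGroupFitsInBlock(rowGroupSize, dfsBlockSize));
        System.out.println("aligns with block: "
            + rowGroupsAlignWithBlock(rowGroupSize, dfsBlockSize));
    }
}
```

Whether a failed check should log a warning or throw is left open here, since 
that is the question raised in the comment above.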



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
