[ https://issues.apache.org/jira/browse/PARQUET-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634273#comment-14634273 ]
Ryan Blue commented on PARQUET-166:
-----------------------------------
The first part of this, ensuring that the row group size is less than the block
size, was added in PARQUET-306 along with row group padding. We should
determine whether the second part, checking that a whole number of row groups
roughly fills an HDFS block, is worth doing: people will ignore warnings and
would not appreciate errors.
> Validate parquet row group size and HDFS block size
> ---------------------------------------------------
>
> Key: PARQUET-166
> URL: https://issues.apache.org/jira/browse/PARQUET-166
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: Ryan Blue
>
> The OutputFormat should verify that {{parquet.block.size < dfs.blocksize}} to
> avoid bad performance. In addition, we could check that
> {{(dfs.blocksize % parquet.block.size) < 1MB}} to ensure that some number of
> row groups is approximately the size of an HDFS block.
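
For illustration, the two checks in the description can be sketched in plain
Java as below. This is only a sketch under stated assumptions, not the
parquet-mr implementation: the class and method names are hypothetical, sizes
are passed in as byte counts instead of being read from a Hadoop Configuration,
and the 1MB slack is taken directly from the description.

```java
// Hypothetical helper; not part of parquet-mr or Hadoop.
final class RowGroupSizeCheck {
    static final long MB = 1024L * 1024L;

    // First check: parquet.block.size < dfs.blocksize (both in bytes).
    static boolean fitsInBlock(long rowGroupSize, long dfsBlockSize) {
        return rowGroupSize > 0 && rowGroupSize < dfsBlockSize;
    }

    // Second check: (dfs.blocksize % parquet.block.size) < slack, i.e. a whole
    // number of row groups leaves less than `slack` bytes of the block unused.
    static boolean packsBlock(long rowGroupSize, long dfsBlockSize, long slack) {
        return rowGroupSize > 0 && (dfsBlockSize % rowGroupSize) < slack;
    }

    public static void main(String[] args) {
        long dfsBlockSize = 128 * MB;                     // assumed dfs.blocksize
        long[] candidates = {64 * MB, 100 * MB, 256 * MB}; // assumed parquet.block.size values

        for (long rowGroupSize : candidates) {
            System.out.println((rowGroupSize / MB) + "MB row group:"
                + " fitsInBlock=" + fitsInBlock(rowGroupSize, dfsBlockSize)
                + " packsBlock=" + packsBlock(rowGroupSize, dfsBlockSize, MB));
        }
    }
}
```

With a 128MB HDFS block, a 64MB row group passes both checks, while a 100MB
row group passes the first but leaves a 28MB remainder and so fails the second.
Whether a failed check should produce a warning or an error is the open
question raised in the comment above.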