[
https://issues.apache.org/jira/browse/PARQUET-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364582#comment-14364582
]
Jacques Nadeau commented on PARQUET-216:
----------------------------------------
I think we need substantially more information on this before a decision is
made. We haven't even scratched the surface of what is possible with regard to
page pruning. By making the pages very small, we substantially reduce the
effectiveness of zonemap-like algorithms. I'm currently
experimenting with and prototyping moving page headers to the footer and adding
advanced statistics to the page metadata. This will likely allow vast
improvements over current patterns, but it is only practical when pages are
reasonably large.
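
For readers unfamiliar with the term, a zonemap-like check can be sketched roughly as follows. This is only an illustration under stated assumptions: `PageStats`, `build_page_stats`, and `pages_to_read` are hypothetical names, not parquet-mr APIs, and the per-page min/max values stand in for whatever statistics end up in the page metadata.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PageStats:
    """Hypothetical per-page summary: row range plus min/max of the values."""
    first_row: int
    num_rows: int
    min_value: int
    max_value: int


def build_page_stats(values: Sequence[int], rows_per_page: int) -> List[PageStats]:
    """Split a column into fixed-size pages and record min/max for each page."""
    stats = []
    for start in range(0, len(values), rows_per_page):
        chunk = values[start:start + rows_per_page]
        stats.append(PageStats(start, len(chunk), min(chunk), max(chunk)))
    return stats


def pages_to_read(stats: List[PageStats], lo: int, hi: int) -> List[PageStats]:
    """Keep only pages whose [min, max] range overlaps the predicate [lo, hi]."""
    return [p for p in stats if p.max_value >= lo and p.min_value <= hi]


# Assumed example data: a sorted column of 1000 values, 100 rows per page.
values = list(range(1000))
stats = build_page_stats(values, rows_per_page=100)
kept = pages_to_read(stats, lo=250, hi=260)
```

In this toy case only the page starting at row 200 overlaps the predicate, so the other nine pages can be skipped using the statistics alone, without reading any page data.
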
I think a lot more work should go into trying to leverage the current default
page sizes because they are based on sound algorithmic grounding. Only after
we've explored that should we consider moving to a new size. In the meantime,
people can always change the page size option if that helps their particular
workload.
> Decrease the default page size to 64k
> -------------------------------------
>
> Key: PARQUET-216
> URL: https://issues.apache.org/jira/browse/PARQUET-216
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: Ryan Blue
> Fix For: 1.6.0
>
>
> Impala uses 64k for a page size and recommends smaller page sizes over the
> current 1MB default. We're considering an absolute minimum row group size of
> 1MB, so it makes sense to dial the page size down a bit.