[ 
https://issues.apache.org/jira/browse/PARQUET-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364582#comment-14364582
 ] 

Jacques Nadeau commented on PARQUET-216:
----------------------------------------

I think we need substantially more information on this before a decision is
made.  We haven't even scratched the surface of the possibilities with regard
to page pruning.  Making the pages very small substantially reduces the
effectiveness of zonemap-like algorithms.  I'm currently experimenting with
and prototyping moving page headers to the footer and adding advanced
statistics to the page metadata.  This will likely allow vast improvements
over current patterns, but it is only practical if pages are reasonably large.
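
To make "zonemap-like" concrete, here is a minimal sketch of the kind of check
such an algorithm performs: given per-page min/max statistics for one column,
skip every page whose value range cannot overlap the predicate range. All of
the names below (ZoneMapSketch, PageStats, pagesToRead) are hypothetical and
are not part of parquet-mr or of the prototype mentioned above; the sketch
assumes a single int64 column and a closed range predicate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not parquet-mr's API.
final class ZoneMapSketch {

    // Min/max statistics for one page of a single int64 column.
    static final class PageStats {
        final long min;
        final long max;

        PageStats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // Returns the indexes of pages whose [min, max] range overlaps the
    // predicate range [lo, hi]. Every other page can be skipped unread.
    static List<Integer> pagesToRead(List<PageStats> pages, long lo, long hi) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            PageStats p = pages.get(i);
            if (p.max >= lo && p.min <= hi) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<PageStats> pages = new ArrayList<>();
        pages.add(new PageStats(0, 99));
        pages.add(new PageStats(100, 199));
        pages.add(new PageStats(200, 299));
        // Predicate: 120 <= value <= 150. Only the middle page can match.
        System.out.println(pagesToRead(pages, 120, 150)); // prints [1]
    }
}
```

The check needs only the statistics, so if they live in the footer a reader
can decide which pages to fetch before touching any page data.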

I think a lot more work should go into trying to leverage the current default 
page sizes because they are based on sound algorithmic grounding.  Only after 
we've explored that should we consider moving to a new size.  In the meantime, 
people can always change the option if it is helpful for their particular 
workload.

> Decrease the default page size to 64k
> -------------------------------------
>
>                 Key: PARQUET-216
>                 URL: https://issues.apache.org/jira/browse/PARQUET-216
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Ryan Blue
>             Fix For: 1.6.0
>
>
> Impala uses 64k for a page size and recommends smaller page sizes over the 
> current 1MB default. We're considering an absolute minimum row group size of 
> 1MB, so it makes sense to dial the page size down a bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
