The Rust implementation uses 1MB pages by default [1].
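To put the defaults mentioned in this thread side by side, here is a rough sketch of how many data pages a single column chunk would be split into under each page size. The 128 MiB chunk size and the `pages_per_chunk` helper are hypothetical, and the arithmetic ignores page headers, compression, and any row-count limits a writer may also apply.

```rust
// Hypothetical comparison of data page counts for one column chunk.
// The 128 MiB chunk size is an illustrative assumption, not a writer default.
const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// Number of data pages needed to hold `chunk_bytes` of encoded data when each
/// page is capped at `page_bytes`, rounding up for a final partial page.
fn pages_per_chunk(chunk_bytes: usize, page_bytes: usize) -> usize {
    (chunk_bytes + page_bytes - 1) / page_bytes
}

fn main() {
    let chunk = 128 * MIB;
    let sizes = [
        ("8 KiB (format README)", 8 * KIB),
        ("1 MiB (C++, Rust, Iceberg)", MIB),
        ("64 MiB", 64 * MIB),
    ];
    for (label, page) in sizes {
        println!("{label}: {} pages", pages_per_chunk(chunk, page));
    }
}
```

For the assumed 128 MiB chunk this prints 16384 pages at 8 KiB, 128 pages at 1 MiB and 2 pages at 64 MiB, which illustrates the unit-of-computation versus unit-of-IO distinction raised in the thread.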

Andrew

[1]: https://github.com/apache/arrow-rs/blob/bd5d4a59db5d6d0e1b3bdf00644dbaf317f3be03/parquet/src/file/properties.rs#L28-L29

On Thu, May 23, 2024 at 4:10 AM Fokko Driesprong <[email protected]> wrote:

> Hey Antoine,
>
> Thanks for raising this. In Iceberg we also use the 1 MiB page size:
>
>
> https://github.com/apache/iceberg/blob/b3c25fb7608934d975a054b353823ca001ca3742/core/src/main/java/org/apache/iceberg/TableProperties.java#L133
>
> Kind regards,
> Fokko
>
> On Thu, May 23, 2024 at 10:06 AM Antoine Pitrou <[email protected]> wrote:
>
> >
> > Hello,
> >
> > The Parquet format itself (or at least the README) recommends an 8 KiB
> > page size, suggesting that data pages are the unit of computation.
> >
> > However, Parquet C++ has long chosen a 1 MiB page size by default (*),
> > suggesting that data pages are considered as the unit of IO there.
> >
> > (*) even bumping it to 64 MiB at some point, perhaps by mistake:
> >
> >
> https://github.com/apache/arrow/commit/4078b876e0cc7503f4da16693ce7901a6ae503d3
> >
> > What are the typical choices in other writers?
> >
> > Regards
> >
> > Antoine.
> >
>
