Hello,

The Parquet format itself (or at least the README) recommends an 8 KiB
page size, suggesting that data pages are the unit of computation.

However, Parquet C++ has long used a 1 MiB page size by default (*),
suggesting that data pages are treated as the unit of IO there.

(*) even bumping it to 64 MiB at some point, perhaps by mistake:
https://github.com/apache/arrow/commit/4078b876e0cc7503f4da16693ce7901a6ae503d3
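To give a sense of scale, here is a rough sketch of how many data pages a
single column chunk would contain under each of the page sizes mentioned
above. The 128 MiB column chunk size is a hypothetical figure chosen only
for illustration, and the calculation assumes the page size is an exact
limit, which real writers may not enforce strictly.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_per_chunk(chunk_bytes, page_bytes):
    """Number of pages needed to hold chunk_bytes (ceiling division)."""
    return -(-chunk_bytes // page_bytes)


# Hypothetical column chunk size, not taken from any writer's defaults.
CHUNK_BYTES = 128 * MIB

# Page sizes mentioned in this message.
PAGE_SIZES = [("8 KiB", 8 * KIB), ("1 MiB", 1 * MIB), ("64 MiB", 64 * MIB)]

for label, page_bytes in PAGE_SIZES:
    count = pages_per_chunk(CHUNK_BYTES, page_bytes)
    print(f"{label:>6} pages: {count} per column chunk")
```

With small pages a reader can skip or decode data at a fine granularity but
pays per-page header overhead many times over; with large pages there are
few headers, but each page is a big unit to fetch and decompress.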

What are the typical choices in other writers?

Regards

Antoine.

