Hello,
The Parquet format itself (or at least the README) recommends an 8 KiB page size, suggesting that data pages are the unit of computation.

However, Parquet C++ has long chosen a 1 MiB page size by default (*), suggesting that data pages are considered the unit of IO there.

(*) even bumping it to 64 MiB at some point, perhaps by mistake:
https://github.com/apache/arrow/commit/4078b876e0cc7503f4da16693ce7901a6ae503d3

What are the typical choices in other writers?

Regards

Antoine.
