I believe Ed added these statistics to parquet-rs [1] as well. We have
also enabled them by default and haven't seen any performance issues.

Andrew

[1] https://github.com/apache/arrow-rs/pull/6105

On Tue, Jan 14, 2025 at 9:38 AM Gang Wu <[email protected]> wrote:

> Hi,
>
> The C++ Parquet implementation in Apache Arrow (parquet-cpp) has supported
> the Page Index since 13.0.0, and SizeStatistics support was added recently
> in 19.0.0. Both features are disabled by default. We ran a benchmark, and
> the results show that we can enable them by default with an acceptable
> penalty. I have therefore opened a PR [1] to turn them on by default. The
> benchmark results are available in that PR. Any feedback is welcome. If
> there are no objections, we will merge the PR and release it with Apache
> Arrow 20.0.0.
>
> [1] https://github.com/apache/arrow/pull/45249
>
> Best,
> Gang
>
