I believe Ed added these statistics to parquet-rs [1] as well. We have also enabled them by default and haven't seen any performance issues.
Andrew

[1] https://github.com/apache/arrow-rs/pull/6105

On Tue, Jan 14, 2025 at 9:38 AM Gang Wu <[email protected]> wrote:

> Hi,
>
> The C++ Parquet implementation in the Apache Arrow (namely the parquet-cpp)
> has added Page Index support since 13.0.0. Recently SizeStatistics support
> is also added in 19.0.0. Both features are disabled by default. We did a
> benchmark and the result showed that we can enable them by default with
> acceptable penalties. Therefore I opened a PR [1] to turn on them by
> default. The benchmark result is also available in this PR. Any feedback
> is welcome. If there is no objection, we will merge this PR and release it
> with Apache Arrow 20.0.0.
>
> [1] https://github.com/apache/arrow/pull/45249
>
> Best,
> Gang
