Is this the library used by DuckDB? I've heard that it doesn't add statistics to Parquet files, which is unfortunate.

On Tue, 14 Jan 2025 at 15:13, Andrew Lamb <[email protected]> wrote:

> I believe Ed added these statistics into parquet-rs[1] as well. We have
> also enabled them by default and haven't seen any performance issues.
>
> Andrew
>
> [1] https://github.com/apache/arrow-rs/pull/6105
>
> On Tue, Jan 14, 2025 at 9:38 AM Gang Wu <[email protected]> wrote:
>
> > Hi,
> >
> > The C++ Parquet implementation in Apache Arrow (namely parquet-cpp)
> > has had Page Index support since 13.0.0. Recently, SizeStatistics
> > support was also added in 19.0.0. Both features are disabled by
> > default. We ran a benchmark, and the result showed that we can enable
> > them by default with acceptable penalties. Therefore I opened a PR [1]
> > to turn them on by default. The benchmark result is also available in
> > this PR. Any feedback is welcome. If there is no objection, we will
> > merge this PR and release it with Apache Arrow 20.0.0.
> >
> > [1] https://github.com/apache/arrow/pull/45249
> >
> > Best,
> > Gang
