Is this the library used by DuckDB? I've heard that it doesn't add
statistics to Parquet files, which is unfortunate.

On Tue, 14 Jan 2025 at 15:13, Andrew Lamb <[email protected]> wrote:

> I believe Ed added these statistics into parquet-rs[1] as well. We have
> also enabled them by default and haven't seen any performance issues.
>
> Andrew
>
> [1] https://github.com/apache/arrow-rs/pull/6105
>
> On Tue, Jan 14, 2025 at 9:38 AM Gang Wu <[email protected]> wrote:
>
> > Hi,
> >
> > The C++ Parquet implementation in Apache Arrow (namely parquet-cpp) has
> > supported Page Index since 13.0.0. SizeStatistics support was also added
> > recently, in 19.0.0. Both features are disabled by default. We ran a
> > benchmark, and the results showed that we can enable them by default with
> > acceptable penalties. Therefore I opened a PR [1] to turn them on by
> > default. The benchmark results are also available in this PR. Any
> > feedback is welcome. If there is no objection, we will merge this PR and
> > release it with Apache Arrow 20.0.0.
> >
> > [1] https://github.com/apache/arrow/pull/45249
> >
> > Best,
> > Gang
> >
>
