Hey Rahul, Aihua,

I was looking into the same thing.

The PR you're referring to has been included since 1.15.0
<https://github.com/apache/parquet-java/commits/apache-parquet-1.15.0>.
Iceberg currently uses Parquet 1.15.2
<https://github.com/apache/iceberg/blob/76ff67c658066bd7d05ce4ce54a1d6340ee0a899/gradle/libs.versions.toml#L80>.
I don't see anything obvious in the changelog
<https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.16.0-rc2>
that might have caused the increase in file size. Let me run a git bisect to
find the PR that introduced the change.
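
A bisect between the two releases could be sketched roughly as follows. This is only an illustration under assumptions: the `bisect_regression` helper and the check script passed to it are hypothetical, and the check script would have to build parquet-java at the current commit, run the failing Iceberg test against it, and exit 0 when the test passes and non-zero when it fails.

```shell
# Sketch: bisect a repository between a known-good and a known-bad ref.
# Usage: bisect_regression <good-ref> <bad-ref> <check-script>
# <check-script> is hypothetical: it must exit 0 for a good commit,
# 1-124 for a bad commit, and 125 for a commit that cannot be tested.
bisect_regression() {
  good_ref="$1"
  bad_ref="$2"
  check_script="$3"

  # "git bisect start <bad> <good>" marks both endpoints in one step.
  git bisect start "$bad_ref" "$good_ref" || return 1

  # Let git drive the search; it prints the first bad commit when done.
  git bisect run "$check_script"
  status=$?

  # Keep a record of the steps taken, then return to the original HEAD.
  git bisect log
  git bisect reset
  return "$status"
}
```

Run inside a parquet-java checkout, a call might look like `bisect_regression apache-parquet-1.15.2 apache-parquet-1.16.0-rc2 ./check.sh`, where the `apache-parquet-1.15.2` tag name is assumed from the release naming scheme and `./check.sh` is the hypothetical check script described above.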

Kind regards,
Fokko

On Tue, Sep 2, 2025 at 14:11, Rahul Sharma
<[email protected]> wrote:

> Hi Aihua,
>
> Regarding the Iceberg failure, which parquet-java version is the test
> passing for? I suspect that the failure might be related to
> size-statistics. Could you try running the test with
> `parquet.size.statistics.enabled=false`? This flag was added in this PR
> <https://github.com/apache/parquet-java/pull/3060>.
>
> Thanks,
> Rahul
>
>
> On Tue, Sep 2, 2025 at 3:07 AM Aihua Xu <[email protected]> wrote:
>
> > Checked checksum and signature and ran unit tests.
> >
> > I'm also running the tests against Iceberg. I noticed one failure
> > <https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java#L308>
> > that comes from Iceberg format version 3, which writes row lineage. It
> > seems the file size increased after the version upgrade, and I haven't yet
> > pinpointed the exact change causing it. I don't think that is a blocker
> > for this release, though.
> >
> > org.opentest4j.AssertionFailedError: [Did not have the expected number of files]
> > expected: 20
> >  but was: 21
> > at org.apache.iceberg.spark.actions.TestRewriteDataFilesAction.shouldHaveFiles(TestRewriteDataFilesAction.java:2144)
> > at org.apache.iceberg.spark.actions.TestRewriteDataFilesAction.testBinPackAfterPartitionChange(TestRewriteDataFilesAction.java:321)
> >
> >
> > On Mon, Sep 1, 2025 at 12:16 AM Gábor Szádovszky <[email protected]> wrote:
> >
> > > I've checked tarball content, checksum, and signature. Executed unit
> > > tests, and also some of our internal tests. All passed.
> > >
> > > +1 (binding)
> > >
> > > On Sat, Aug 30, 2025 at 8:47, Gang Wu <[email protected]> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I propose the following RC to be released as the official Apache
> > > > Parquet Java 1.16.0 release.
> > > >
> > > > The commit id is 402c3810c372d29603e181771acebfecc71bef61
> > > > * This corresponds to the tag: apache-parquet-1.16.0-rc2
> > > > * https://github.com/apache/parquet-java/tree/402c3810c372d29603e181771acebfecc71bef61
> > > >
> > > > The release tarball, signature, and checksums are here:
> > > > * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.16.0-rc2
> > > >
> > > > You can find the KEYS file here:
> > > > * https://downloads.apache.org/parquet/KEYS
> > > >
> > > > You can find the changelog here:
> > > > * https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.16.0-rc2
> > > >
> > > > Binary artifacts are staged in Nexus here:
> > > > * https://repository.apache.org/content/groups/staging/org/apache/parquet/
> > > >
> > > > Please download, verify, and test.
> > > >
> > > > Please vote in the next 72 hours.
> > > >
> > > > [ ] +1 Release this as Apache Parquet Java 1.16.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this because...
> > > >
> > > > Thanks,
> > > > Gang
> > > >
> > >
> >
>
