Hi Matt and community,

I have created the Parquet file test cases
<https://github.com/apache/parquet-testing/pull/91/files> with the variant
logical type on top of what Ryan created. I have verified the variant
implementation in Parquet-Java (see
https://github.com/apache/parquet-java/pull/3258/files). Matt, can you test
out against the new data set in GO?

Thanks,
Aihua

On Mon, Jul 28, 2025 at 2:05 PM Ryan Blue <[email protected]> wrote:

> Yes, as Aihua mentioned we know that the annotation is missing. Iceberg
> uses released versions of Parquet so we won't be able to produce files with
> the annotation until there's a parquet-java release. This is one reason why
> I've been pushing to get the annotation released regardless of the status
> of the spec.
>
> On Mon, Jul 28, 2025 at 1:29 PM Aihua Xu <[email protected]> wrote:
>
> > Hi Matt,
> >
> > Yeah. That's expected. I mentioned in the description of
> > https://github.com/apache/parquet-java/pull/3258. Because we haven't
> > released Parquet-Java including the variant logical type (we can't have a
> > release until we have done the validation and finalized the spec),
> Iceberg
> > cannot upgrade and write the Parquet file with that.
> >
> > I have a workaround in the test using something below to get them
> recognize
> > them as a variant type.
> >
> > (if (type.getName().equals("var") || type.getLogicalTypeAnnotation()
> > instanceof LogicalTypeAnnotation.VariantLogicalTypeAnnotation)
> >
> > I'm wondering if you can do the same in the GO test?
> >
> > Thanks,
> > Aihua
> >
> > On Mon, Jul 28, 2025 at 11:25 AM Matt Topol <[email protected]>
> > wrote:
> >
> > > @Aihua Xu <[email protected]> @RyanBlue it appears that the parquet
> > files
> > > in the indicated PR (
> > > https://github.com/apache/parquet-testing/pull/90/files) do not seem
> to
> > > currently have the `Variant` logical type specified in their schema. Is
> > > this intentional? The Go implementation won't (and shouldn't) recognize
> > the
> > > files as containing variant values without the logical type being used,
> > > instead they are simply read as structs. Should this be updated?
> > >
> > > Should I write something to have Go generate the files locally to test
> > > against?
> > >
> > > --Matt
> > >
> > > On Sun, Jul 27, 2025 at 5:45 PM Aihua Xu <[email protected]> wrote:
> > >
> > >> Thanks Matt.
> > >>
> > >>
> > >> > On Jul 27, 2025, at 2:31 PM, Matt Topol <[email protected]>
> > wrote:
> > >> >
> > >> > I'll work this week on getting the Go implementation to use the
> same
> > >> > testing files and ensure compatibility.
> > >> >
> > >> >> On Sun, Jul 27, 2025, 5:28 PM Aihua Xu <[email protected]> wrote:
> > >> >>
> > >> >> Hi all,
> > >> >>
> > >> >> Following up on the test effort to validate the compatibility of
> the
> > >> >> Variant implementation:
> > >> >>
> > >> >> Ryan has contributed test cases
> > >> >> <https://github.com/apache/parquet-testing/pull/90/files> from
> > Iceberg
> > >> >> (see PR
> > >> >> #13654 <https://github.com/apache/iceberg/pull/13654>), which I
> used
> > >> to
> > >> >> verify <https://github.com/apache/parquet-java/pull/3258/> the
> > Variant
> > >> >> implementation in Parquet-Java. The validation surfaced a few minor
> > >> issues,
> > >> >> but overall the results confirm compatibility between the two
> > >> >> implementations.
> > >> >>
> > >> >> Let me know if you have any questions or additional follow-up
> > requests.
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> Aihua
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Wed, Jul 23, 2025 at 2:24 AM Andrew Lamb <
> [email protected]>
> > >> >> wrote:
> > >> >>
> > >> >>> I agree the parquet-testing repo should have example Parquet files
> > >> >> storing
> > >> >>> variants.
> > >> >>>
> > >> >>> It was brought to my attention recently that the duckdb folks made
> > >> some
> > >> >>> testing files[1] based on the Iceberg test suite.
> > >> >>>
> > >> >>> Perhaps we can add those files to parquet-testing as part of [2].
> > >> >>>
> > >> >>> I expect we'll get to testing the Rust shredding implementation in
> > 2-3
> > >> >>> weeks at which time I will likely help try and push this forward.
> It
> > >> >> would
> > >> >>> be great if someone else wanted to help do it beforehand.
> > >> >>>
> > >> >>> Andrew
> > >> >>>
> > >> >>> [1]: https://github.com/duckdb/duckdb/pull/18224
> > >> >>> [2]: https://github.com/apache/parquet-testing/issues/75
> > >> >>>
> > >> >>>> On Wed, Jul 23, 2025 at 1:14 AM Gang Wu <[email protected]>
> wrote:
> > >> >>>
> > >> >>>> I was under the impression that parquet-testing does not yet have
> > >> >> Parquet
> > >> >>>> files with variant type annotations.
> > >> >>>>
> > >> >>>> Is this still the case? If not, should we add some (shredded and
> > >> >>>> unshredded) files produced by Java and Go implementations?
> > >> >>>>
> > >> >>>> On Wed, Jul 23, 2025 at 3:18 AM Aihua Xu <[email protected]>
> > wrote:
> > >> >>>>
> > >> >>>>> Thanks Matt for the comment and working on the GO variant.
> > >> >>>>>
> > >> >>>>> Micah, that’s a good point. Let me check out the coverage
> > >> >> completeness
> > >> >>>> for
> > >> >>>>> these two implementations.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>> On Jul 22, 2025, at 10:01 AM, Matt Topol <
> [email protected]
> > >
> > >> >>>> wrote:
> > >> >>>>>>
> > >> >>>>>> Assuming that the files with variants in
> > >> >>>>>> https://github.com/apache/parquet-testing are generated by
> > >> >>>> parquet-java,
> > >> >>>>>> then we at least have confirmed that the Go implementation is
> > able
> > >> >> to
> > >> >>>>> read
> > >> >>>>>> variant files that are written by the Java implementation. So
> > >> >> there's
> > >> >>>> at
> > >> >>>>>> least some testing of the two implementations against each
> other.
> > >> >>>>>>
> > >> >>>>>> --Matt
> > >> >>>>>>
> > >> >>>>>>> On Tue, Jul 22, 2025 at 12:29 AM Micah Kornfield <
> > >> >>>> [email protected]
> > >> >>>>>>
> > >> >>>>>>> wrote:
> > >> >>>>>>>
> > >> >>>>>>> Have we tested the two implementations against one another?
> > >> >>>>>>>
> > >> >>>>>>>> On Mon, Jul 21, 2025 at 9:14 PM Aihua Xu <[email protected]>
> > >> >>> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>> Hi community,
> > >> >>>>>>>>
> > >> >>>>>>>> Per the Parquet specification requirements, two reference
> > >> >>>>> implementations
> > >> >>>>>>>> are needed to finalize the Variant logical type. Both Java
> and
> > Go
> > >> >>>>>>>> implementations now support variant encoding and shredding.
> > >> >>>>>>>>
> > >> >>>>>>>> Java already has the encoding and shredding implementations
> in
> > >> >>> place:
> > >> >>>>>>>> apache/parquet-java#3197 <
> > >> >>>>>>> https://github.com/apache/parquet-java/pull/3197
> > >> >>>>>>>>>
> > >> >>>>>>>> apache/parquet-java#3202 <
> > >> >>>>>>> https://github.com/apache/parquet-java/pull/3202
> > >> >>>>>>>>>
> > >> >>>>>>>> apache/parquet-java#3223
> > >> >>>>>>>> <https://github.com/apache/parquet-java/issues/3223>
> > >> >>>>>>>> apache/parquet-java#3211
> > >> >>>>>>>> <https://github.com/apache/parquet-java/issues/3211>
> > >> >>>>>>>>
> > >> >>>>>>>> Go also includes encoding and shredding support:
> > >> >>>>>>>> apache/arrow-go#344 <
> > https://github.com/apache/arrow-go/pull/344
> > >> >>>
> > >> >>>>>>>> apache/arrow-go#434 <
> > https://github.com/apache/arrow-go/pull/434
> > >> >>>
> > >> >>>>>>>>
> > >> >>>>>>>> I propose that we remove the "under development" notes from
> the
> > >> >>>>>>>> documentation and move forward with finalizing the
> > specification
> > >> >>> (PR
> > >> >>>>> #509
> > >> >>>>>>>> <https://github.com/apache/parquet-format/pull/509>).
> > >> >>>>>>>> This vote will be open for at least 72 hours.
> > >> >>>>>>>>
> > >> >>>>>>>> [ ] +1 Finalize Varint and Shredding Spec
> > >> >>>>>>>> [ ] +0
> > >> >>>>>>>> [ ] -1 Do not release this because...
> > >> >>>>>>>>
> > >> >>>>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> > >
> >
>

Reply via email to