Hi Matt and community,

I have created Parquet test files
<https://github.com/apache/parquet-testing/pull/91/files> that carry the
Variant logical type annotation, building on the files Ryan contributed. I
have verified the Variant implementation in Parquet-Java (see
https://github.com/apache/parquet-java/pull/3258/files). Matt, can you test
the Go implementation against the new data set?
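
For the older files from PR #90, which were written without the annotation,
the name-based fallback from my earlier message (quoted below) can be written
out roughly as follows. This is only a self-contained sketch: `Annotation`,
`VariantAnnotation`, and `Field` are hypothetical stand-ins rather than
parquet-java classes, and `isVariant` is an illustrative helper name. The only
parts taken from the actual test are the column name "var" and the check for
the variant annotation.

```java
// Sketch only: the nested types below are hypothetical stand-ins, NOT parquet-java classes.
final class VariantFallback {
    // Stand-in for a logical type annotation.
    interface Annotation {}

    // Stand-in for LogicalTypeAnnotation.VariantLogicalTypeAnnotation.
    static final class VariantAnnotation implements Annotation {}

    // Stand-in for a schema field: a name plus an annotation (null when absent).
    static final class Field {
        final String name;
        final Annotation annotation;

        Field(String name, Annotation annotation) {
            this.name = name;
            this.annotation = annotation;
        }
    }

    // A field counts as a variant if it carries the variant annotation or,
    // for files written without it, if it uses the test column name "var".
    static boolean isVariant(Field field) {
        return "var".equals(field.name)
            || field.annotation instanceof VariantAnnotation;
    }

    public static void main(String[] args) {
        Field[] fields = {
            new Field("var", null),                        // older file, no annotation
            new Field("payload", new VariantAnnotation()), // annotated file
            new Field("id", null),                         // ordinary column
        };
        for (Field f : fields) {
            System.out.println(f.name + " -> " + isVariant(f));
        }
        // prints:
        // var -> true
        // payload -> true
        // id -> false
    }
}
```

With the new files in PR #91 the annotation is present, so the name check
should only matter for the older data set.
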

Thanks,
Aihua

On Mon, Jul 28, 2025 at 2:05 PM Ryan Blue wrote:

> Yes, as Aihua mentioned, we know that the annotation is missing. Iceberg
> uses released versions of Parquet, so we won't be able to produce files with
> the annotation until there's a parquet-java release. This is one reason why
> I've been pushing to get the annotation released regardless of the status
> of the spec.
>
> On Mon, Jul 28, 2025 at 1:29 PM Aihua Xu wrote:
>
> > Hi Matt,
> >
> > Yeah. That's expected. I mentioned it in the description of
> > https://github.com/apache/parquet-java/pull/3258. Because we haven't
> > released Parquet-Java including the variant logical type (we can't have a
> > release until we have done the validation and finalized the spec), Iceberg
> > cannot upgrade and write the Parquet file with that.
> >
> > I have a workaround in the test, using something like the check below to
> > get them recognized as a variant type.
> >
> > if (type.getName().equals("var") || type.getLogicalTypeAnnotation()
> > instanceof LogicalTypeAnnotation.VariantLogicalTypeAnnotation)
> >
> > I'm wondering if you can do the same in the Go test?
> >
> > Thanks,
> > Aihua
> >
> > On Mon, Jul 28, 2025 at 11:25 AM Matt Topol wrote:
> >
> > > @Aihua Xu @RyanBlue it appears that the parquet files in the indicated
> > > PR (https://github.com/apache/parquet-testing/pull/90/files) do not seem
> > > to currently have the `Variant` logical type specified in their schema.
> > > Is this intentional? The Go implementation won't (and shouldn't)
> > > recognize the files as containing variant values without the logical
> > > type being used; instead they are simply read as structs. Should this
> > > be updated?
> > >
> > > Should I write something to have Go generate the files locally to test
> > > against?
> > >
> > > --Matt
> > >
> > > On Sun, Jul 27, 2025 at 5:45 PM Aihua Xu wrote:
> > >
> > >> Thanks Matt.
> > >>
> > >> > On Jul 27, 2025, at 2:31 PM, Matt Topol wrote:
> > >> >
> > >> > I'll work this week on getting the Go implementation to use the same
> > >> > testing files and ensure compatibility.
> > >> >
> > >> >> On Sun, Jul 27, 2025, 5:28 PM Aihua Xu wrote:
> > >> >>
> > >> >> Hi all,
> > >> >>
> > >> >> Following up on the test effort to validate the compatibility of the
> > >> >> Variant implementation:
> > >> >>
> > >> >> Ryan has contributed test cases
> > >> >> <https://github.com/apache/parquet-testing/pull/90/files> from Iceberg
> > >> >> (see PR #13654 <https://github.com/apache/iceberg/pull/13654>), which I
> > >> >> used to verify <https://github.com/apache/parquet-java/pull/3258/> the
> > >> >> Variant implementation in Parquet-Java. The validation surfaced a few
> > >> >> minor issues, but overall the results confirm compatibility between
> > >> >> the two implementations.
> > >> >>
> > >> >> Let me know if you have any questions or additional follow-up requests.
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> Aihua
> > >> >>
> > >> >> On Wed, Jul 23, 2025 at 2:24 AM Andrew Lamb wrote:
> > >> >>
> > >> >>> I agree the parquet-testing repo should have example Parquet files
> > >> >>> storing variants.
> > >> >>>
> > >> >>> It was brought to my attention recently that the duckdb folks made
> > >> >>> some testing files[1] based on the Iceberg test suite.
> > >> >>>
> > >> >>> Perhaps we can add those files to parquet-testing as part of [2].
> > >> >>>
> > >> >>> I expect we'll get to testing the Rust shredding implementation in 2-3
> > >> >>> weeks at which time I will likely help try and push this forward.
> > >> >>> It would be great if someone else wanted to help do it beforehand.
> > >> >>>
> > >> >>> Andrew
> > >> >>>
> > >> >>> [1]: https://github.com/duckdb/duckdb/pull/18224
> > >> >>> [2]: https://github.com/apache/parquet-testing/issues/75
> > >> >>>
> > >> >>>> On Wed, Jul 23, 2025 at 1:14 AM Gang Wu wrote:
> > >> >>>>
> > >> >>>> I was under the impression that parquet-testing does not yet have
> > >> >>>> Parquet files with variant type annotations.
> > >> >>>>
> > >> >>>> Is this still the case? If not, should we add some (shredded and
> > >> >>>> unshredded) files produced by Java and Go implementations?
> > >> >>>>
> > >> >>>> On Wed, Jul 23, 2025 at 3:18 AM Aihua Xu wrote:
> > >> >>>>
> > >> >>>>> Thanks Matt for the comment and working on the Go variant.
> > >> >>>>>
> > >> >>>>> Micah, that’s a good point. Let me check out the coverage
> > >> >>>>> completeness for these two implementations.
> > >> >>>>>
> > >> >>>>>> On Jul 22, 2025, at 10:01 AM, Matt Topol wrote:
> > >> >>>>>>
> > >> >>>>>> Assuming that the files with variants in
> > >> >>>>>> https://github.com/apache/parquet-testing are generated by
> > >> >>>>>> parquet-java, then we at least have confirmed that the Go
> > >> >>>>>> implementation is able to read variant files that are written by
> > >> >>>>>> the Java implementation. So there's at least some testing of the
> > >> >>>>>> two implementations against each other.
> > >> >>>>>>
> > >> >>>>>> --Matt
> > >> >>>>>>
> > >> >>>>>>> On Tue, Jul 22, 2025 at 12:29 AM Micah Kornfield wrote:
> > >> >>>>>>>
> > >> >>>>>>> Have we tested the two implementations against one another?
> > >> >>>>>>>
> > >> >>>>>>>> On Mon, Jul 21, 2025 at 9:14 PM Aihua Xu wrote:
> > >> >>>>>>>>
> > >> >>>>>>>> Hi community,
> > >> >>>>>>>>
> > >> >>>>>>>> Per the Parquet specification requirements, two reference
> > >> >>>>>>>> implementations are needed to finalize the Variant logical type.
> > >> >>>>>>>> Both Java and Go implementations now support variant encoding
> > >> >>>>>>>> and shredding.
> > >> >>>>>>>>
> > >> >>>>>>>> Java already has the encoding and shredding implementations in
> > >> >>>>>>>> place:
> > >> >>>>>>>> apache/parquet-java#3197 <https://github.com/apache/parquet-java/pull/3197>
> > >> >>>>>>>> apache/parquet-java#3202 <https://github.com/apache/parquet-java/pull/3202>
> > >> >>>>>>>> apache/parquet-java#3223 <https://github.com/apache/parquet-java/issues/3223>
> > >> >>>>>>>> apache/parquet-java#3211 <https://github.com/apache/parquet-java/issues/3211>
> > >> >>>>>>>>
> > >> >>>>>>>> Go also includes encoding and shredding support:
> > >> >>>>>>>> apache/arrow-go#344 <https://github.com/apache/arrow-go/pull/344>
> > >> >>>>>>>> apache/arrow-go#434 <https://github.com/apache/arrow-go/pull/434>
> > >> >>>>>>>>
> > >> >>>>>>>> I propose that we remove the "under development" notes from the
> > >> >>>>>>>> documentation and move forward with finalizing the specification
> > >> >>>>>>>> (PR #509 <https://github.com/apache/parquet-format/pull/509>).
> > >> >>>>>>>> This vote will be open for at least 72 hours.
> > >> >>>>>>>>
> > >> >>>>>>>> [ ] +1 Finalize Variant and Shredding Spec
> > >> >>>>>>>> [ ] +0
> > >> >>>>>>>> [ ] -1 Do not release this because...
