Sounds good. Anyone who is interested, please take a look at https://github.com/apache/arrow-go/pull/455 and we can discuss it there.
On Wed, Jul 30, 2025 at 2:15 PM Matt Topol <zotthewiz...@gmail.com> wrote: > Thanks Aihua, > > I'm away for the rest of the week but I'll retest on Tuesday when I > return. > > Given the discussion we're having on the PR it seems that we should > propose some clarifying updates to the spec wording to clear up some > ambiguities. Can you and anyone else take a look at that this week? > > --Matt > > On Wed, Jul 30, 2025, 5:07 PM Aihua Xu <aihu...@gmail.com> wrote: > >> Hi Matt and community, >> >> I have created the Parquet file test cases >> <https://github.com/apache/parquet-testing/pull/91/files> with the >> variant logical type on top of what Ryan created. I have verified the >> variant implementation in Parquet-Java (see >> https://github.com/apache/parquet-java/pull/3258/files). Matt, can you >> test out against the new data set in GO? >> >> Thanks, >> Aihua >> >> On Mon, Jul 28, 2025 at 2:05 PM Ryan Blue <rdb...@gmail.com> wrote: >> >>> Yes, as Aihua mentioned we know that the annotation is missing. Iceberg >>> uses released versions of Parquet so we won't be able to produce files >>> with >>> the annotation until there's a parquet-java release. This is one reason >>> why >>> I've been pushing to get the annotation released regardless of the status >>> of the spec. >>> >>> On Mon, Jul 28, 2025 at 1:29 PM Aihua Xu <aihu...@gmail.com> wrote: >>> >>> > Hi Matt, >>> > >>> > Yeah. That's expected. I mentioned in the description of >>> > https://github.com/apache/parquet-java/pull/3258. Because we haven't >>> > released Parquet-Java including the variant logical type (we can't >>> have a >>> > release until we have done the validation and finalized the spec), >>> Iceberg >>> > cannot upgrade and write the Parquet file with that. >>> > >>> > I have a workaround in the test using something below to get them >>> recognize >>> > them as a variant type. >>> > >>> > (if (type.getName().equals("var") || type.getLogicalTypeAnnotation() >>> > instanceof LogicalTypeAnnotation.VariantLogicalTypeAnnotation) >>> > >>> > I'm wondering if you can do the same in the GO test? >>> > >>> > Thanks, >>> > Aihua >>> > >>> > On Mon, Jul 28, 2025 at 11:25 AM Matt Topol <zotthewiz...@gmail.com> >>> > wrote: >>> > >>> > > @Aihua Xu <aihu...@gmail.com> @RyanBlue it appears that the parquet >>> > files >>> > > in the indicated PR ( >>> > > https://github.com/apache/parquet-testing/pull/90/files) do not >>> seem to >>> > > currently have the `Variant` logical type specified in their schema. >>> Is >>> > > this intentional? The Go implementation won't (and shouldn't) >>> recognize >>> > the >>> > > files as containing variant values without the logical type being >>> used, >>> > > instead they are simply read as structs. Should this be updated? >>> > > >>> > > Should I write something to have Go generate the files locally to >>> test >>> > > against? >>> > > >>> > > --Matt >>> > > >>> > > On Sun, Jul 27, 2025 at 5:45 PM Aihua Xu <aihu...@gmail.com> wrote: >>> > > >>> > >> Thanks Matt. >>> > >> >>> > >> >>> > >> > On Jul 27, 2025, at 2:31 PM, Matt Topol <zotthewiz...@gmail.com> >>> > wrote: >>> > >> > >>> > >> > I'll work this week on getting the Go implementation to use the >>> same >>> > >> > testing files and ensure compatibility. >>> > >> > >>> > >> >> On Sun, Jul 27, 2025, 5:28 PM Aihua Xu <aihu...@gmail.com> >>> wrote: >>> > >> >> >>> > >> >> Hi all, >>> > >> >> >>> > >> >> Following up on the test effort to validate the compatibility of >>> the >>> > >> >> Variant implementation: >>> > >> >> >>> > >> >> Ryan has contributed test cases >>> > >> >> <https://github.com/apache/parquet-testing/pull/90/files> from >>> > Iceberg >>> > >> >> (see PR >>> > >> >> #13654 <https://github.com/apache/iceberg/pull/13654>), which I >>> used >>> > >> to >>> > >> >> verify <https://github.com/apache/parquet-java/pull/3258/> the >>> > Variant >>> > >> >> implementation in Parquet-Java. The validation surfaced a few >>> minor >>> > >> issues, >>> > >> >> but overall the results confirm compatibility between the two >>> > >> >> implementations. >>> > >> >> >>> > >> >> Let me know if you have any questions or additional follow-up >>> > requests. >>> > >> >> >>> > >> >> Thanks, >>> > >> >> >>> > >> >> Aihua >>> > >> >> >>> > >> >> >>> > >> >> >>> > >> >> On Wed, Jul 23, 2025 at 2:24 AM Andrew Lamb < >>> andrewlam...@gmail.com> >>> > >> >> wrote: >>> > >> >> >>> > >> >>> I agree the parquet-testing repo should have example Parquet >>> files >>> > >> >> storing >>> > >> >>> variants. >>> > >> >>> >>> > >> >>> It was brought to my attention recently that the duckdb folks >>> made >>> > >> some >>> > >> >>> testing files[1] based on the Iceberg test suite. >>> > >> >>> >>> > >> >>> Perhaps we can add those files to parquet-testing as part of >>> [2]. >>> > >> >>> >>> > >> >>> I expect we'll get to testing the Rust shredding implementation >>> in >>> > 2-3 >>> > >> >>> weeks at which time I will likely help try and push this >>> forward. It >>> > >> >> would >>> > >> >>> be great if someone else wanted to help do it beforehand. >>> > >> >>> >>> > >> >>> Andrew >>> > >> >>> >>> > >> >>> [1]: https://github.com/duckdb/duckdb/pull/18224 >>> > >> >>> [2]: https://github.com/apache/parquet-testing/issues/75 >>> > >> >>> >>> > >> >>>> On Wed, Jul 23, 2025 at 1:14 AM Gang Wu <ust...@gmail.com> >>> wrote: >>> > >> >>> >>> > >> >>>> I was under the impression that parquet-testing does not yet >>> have >>> > >> >> Parquet >>> > >> >>>> files with variant type annotations. >>> > >> >>>> >>> > >> >>>> Is this still the case? If not, should we add some (shredded >>> and >>> > >> >>>> unshredded) files produced by Java and Go implementations? >>> > >> >>>> >>> > >> >>>> On Wed, Jul 23, 2025 at 3:18 AM Aihua Xu <aihu...@gmail.com> >>> > wrote: >>> > >> >>>> >>> > >> >>>>> Thanks Matt for the comment and working on the GO variant. >>> > >> >>>>> >>> > >> >>>>> Micah, that’s a good point. Let me check out the coverage >>> > >> >> completeness >>> > >> >>>> for >>> > >> >>>>> these two implementations. >>> > >> >>>>> >>> > >> >>>>> >>> > >> >>>>> >>> > >> >>>>>> On Jul 22, 2025, at 10:01 AM, Matt Topol < >>> zotthewiz...@gmail.com >>> > > >>> > >> >>>> wrote: >>> > >> >>>>>> >>> > >> >>>>>> Assuming that the files with variants in >>> > >> >>>>>> https://github.com/apache/parquet-testing are generated by >>> > >> >>>> parquet-java, >>> > >> >>>>>> then we at least have confirmed that the Go implementation is >>> > able >>> > >> >> to >>> > >> >>>>> read >>> > >> >>>>>> variant files that are written by the Java implementation. So >>> > >> >> there's >>> > >> >>>> at >>> > >> >>>>>> least some testing of the two implementations against each >>> other. >>> > >> >>>>>> >>> > >> >>>>>> --Matt >>> > >> >>>>>> >>> > >> >>>>>>> On Tue, Jul 22, 2025 at 12:29 AM Micah Kornfield < >>> > >> >>>> emkornfi...@gmail.com >>> > >> >>>>>> >>> > >> >>>>>>> wrote: >>> > >> >>>>>>> >>> > >> >>>>>>> Have we tested the two implementations against one another? >>> > >> >>>>>>> >>> > >> >>>>>>>> On Mon, Jul 21, 2025 at 9:14 PM Aihua Xu < >>> aihu...@gmail.com> >>> > >> >>> wrote: >>> > >> >>>>>>>> >>> > >> >>>>>>>> Hi community, >>> > >> >>>>>>>> >>> > >> >>>>>>>> Per the Parquet specification requirements, two reference >>> > >> >>>>> implementations >>> > >> >>>>>>>> are needed to finalize the Variant logical type. Both Java >>> and >>> > Go >>> > >> >>>>>>>> implementations now support variant encoding and shredding. >>> > >> >>>>>>>> >>> > >> >>>>>>>> Java already has the encoding and shredding >>> implementations in >>> > >> >>> place: >>> > >> >>>>>>>> apache/parquet-java#3197 < >>> > >> >>>>>>> https://github.com/apache/parquet-java/pull/3197 >>> > >> >>>>>>>>> >>> > >> >>>>>>>> apache/parquet-java#3202 < >>> > >> >>>>>>> https://github.com/apache/parquet-java/pull/3202 >>> > >> >>>>>>>>> >>> > >> >>>>>>>> apache/parquet-java#3223 >>> > >> >>>>>>>> <https://github.com/apache/parquet-java/issues/3223> >>> > >> >>>>>>>> apache/parquet-java#3211 >>> > >> >>>>>>>> <https://github.com/apache/parquet-java/issues/3211> >>> > >> >>>>>>>> >>> > >> >>>>>>>> Go also includes encoding and shredding support: >>> > >> >>>>>>>> apache/arrow-go#344 < >>> > https://github.com/apache/arrow-go/pull/344 >>> > >> >>> >>> > >> >>>>>>>> apache/arrow-go#434 < >>> > https://github.com/apache/arrow-go/pull/434 >>> > >> >>> >>> > >> >>>>>>>> >>> > >> >>>>>>>> I propose that we remove the "under development" notes >>> from the >>> > >> >>>>>>>> documentation and move forward with finalizing the >>> > specification >>> > >> >>> (PR >>> > >> >>>>> #509 >>> > >> >>>>>>>> <https://github.com/apache/parquet-format/pull/509>). >>> > >> >>>>>>>> This vote will be open for at least 72 hours. >>> > >> >>>>>>>> >>> > >> >>>>>>>> [ ] +1 Finalize Varint and Shredding Spec >>> > >> >>>>>>>> [ ] +0 >>> > >> >>>>>>>> [ ] -1 Do not release this because... >>> > >> >>>>>>>> >>> > >> >>>>>>> >>> > >> >>>>> >>> > >> >>>> >>> > >> >>> >>> > >> >> >>> > >> >>> > > >>> > >>> >>