Sounds good.

Anyone who is interested, please take a look at
https://github.com/apache/arrow-go/pull/455 and we can discuss it there.

On Wed, Jul 30, 2025 at 2:15 PM Matt Topol <zotthewiz...@gmail.com> wrote:

> Thanks Aihua,
>
> I'm away for the rest of the week but I'll retest on Tuesday when I
> return.
>
> Given the discussion we're having on the PR it seems that we should
> propose some clarifying updates to the spec wording to clear up some
> ambiguities. Can you and anyone else take a look at that this week?
>
> --Matt
>
> On Wed, Jul 30, 2025, 5:07 PM Aihua Xu <aihu...@gmail.com> wrote:
>
>> Hi Matt and community,
>>
>> I have created the Parquet file test cases
>> <https://github.com/apache/parquet-testing/pull/91/files> with the
>> variant logical type on top of what Ryan created. I have verified the
>> variant implementation in Parquet-Java (see
>> https://github.com/apache/parquet-java/pull/3258/files). Matt, can you
>> test against the new data set in Go?
>>
>> Thanks,
>> Aihua
>>
>> On Mon, Jul 28, 2025 at 2:05 PM Ryan Blue <rdb...@gmail.com> wrote:
>>
>>> Yes, as Aihua mentioned, we know that the annotation is missing. Iceberg
>>> uses released versions of Parquet so we won't be able to produce files
>>> with the annotation until there's a parquet-java release. This is one
>>> reason why I've been pushing to get the annotation released regardless of
>>> the status of the spec.
>>>
>>> On Mon, Jul 28, 2025 at 1:29 PM Aihua Xu <aihu...@gmail.com> wrote:
>>>
>>> > Hi Matt,
>>> >
>>> > Yeah, that's expected. I mentioned it in the description of
>>> > https://github.com/apache/parquet-java/pull/3258. Because we haven't
>>> > released a Parquet-Java version that includes the variant logical type
>>> > (we can't have a release until we have done the validation and finalized
>>> > the spec), Iceberg cannot upgrade and write Parquet files with it.
>>> >
>>> > I have a workaround in the test, using a check like the one below, so
>>> > that they are recognized as a variant type.
>>> >
>>> > if (type.getName().equals("var") || type.getLogicalTypeAnnotation()
>>> >     instanceof LogicalTypeAnnotation.VariantLogicalTypeAnnotation)
>>> >
>>> > I'm wondering if you can do the same in the Go test?
>>> >
>>> > Thanks,
>>> > Aihua
>>> >
>>> > On Mon, Jul 28, 2025 at 11:25 AM Matt Topol <zotthewiz...@gmail.com>
>>> > wrote:
>>> >
>>> > > @Aihua Xu <aihu...@gmail.com> @RyanBlue it appears that the parquet
>>> > > files in the indicated PR (
>>> > > https://github.com/apache/parquet-testing/pull/90/files) do not seem to
>>> > > currently have the `Variant` logical type specified in their schema. Is
>>> > > this intentional? The Go implementation won't (and shouldn't) recognize
>>> > > the files as containing variant values without the logical type being
>>> > > used; instead they are simply read as structs. Should this be updated?
>>> > >
>>> > > Should I write something to have Go generate the files locally to test
>>> > > against?
>>> > >
>>> > > --Matt
>>> > >
>>> > > On Sun, Jul 27, 2025 at 5:45 PM Aihua Xu <aihu...@gmail.com> wrote:
>>> > >
>>> > >> Thanks Matt.
>>> > >>
>>> > >>
>>> > >> > On Jul 27, 2025, at 2:31 PM, Matt Topol <zotthewiz...@gmail.com> wrote:
>>> > >> >
>>> > >> > I'll work this week on getting the Go implementation to use the same
>>> > >> > testing files and ensure compatibility.
>>> > >> >
>>> > >> >> On Sun, Jul 27, 2025, 5:28 PM Aihua Xu <aihu...@gmail.com> wrote:
>>> > >> >>
>>> > >> >> Hi all,
>>> > >> >>
>>> > >> >> Following up on the test effort to validate the compatibility of the
>>> > >> >> Variant implementation:
>>> > >> >>
>>> > >> >> Ryan has contributed test cases
>>> > >> >> <https://github.com/apache/parquet-testing/pull/90/files> from Iceberg
>>> > >> >> (see PR #13654 <https://github.com/apache/iceberg/pull/13654>), which I
>>> > >> >> used to verify <https://github.com/apache/parquet-java/pull/3258/> the
>>> > >> >> Variant implementation in Parquet-Java. The validation surfaced a few
>>> > >> >> minor issues, but overall the results confirm compatibility between the
>>> > >> >> two implementations.
>>> > >> >>
>>> > >> >> Let me know if you have any questions or additional follow-up requests.
>>> > >> >>
>>> > >> >> Thanks,
>>> > >> >>
>>> > >> >> Aihua
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >> On Wed, Jul 23, 2025 at 2:24 AM Andrew Lamb <andrewlam...@gmail.com>
>>> > >> >> wrote:
>>> > >> >>
>>> > >> >>> I agree the parquet-testing repo should have example Parquet files
>>> > >> >>> storing variants.
>>> > >> >>>
>>> > >> >>> It was brought to my attention recently that the duckdb folks made some
>>> > >> >>> testing files[1] based on the Iceberg test suite.
>>> > >> >>>
>>> > >> >>> Perhaps we can add those files to parquet-testing as part of [2].
>>> > >> >>>
>>> > >> >>> I expect we'll get to testing the Rust shredding implementation in 2-3
>>> > >> >>> weeks at which time I will likely help try and push this forward. It
>>> > >> >>> would be great if someone else wanted to help do it beforehand.
>>> > >> >>>
>>> > >> >>> Andrew
>>> > >> >>>
>>> > >> >>> [1]: https://github.com/duckdb/duckdb/pull/18224
>>> > >> >>> [2]: https://github.com/apache/parquet-testing/issues/75
>>> > >> >>>
>>> > >> >>>> On Wed, Jul 23, 2025 at 1:14 AM Gang Wu <ust...@gmail.com> wrote:
>>> > >> >>>
>>> > >> >>>> I was under the impression that parquet-testing does not yet have
>>> > >> >>>> Parquet files with variant type annotations.
>>> > >> >>>>
>>> > >> >>>> Is this still the case? If so, should we add some (shredded and
>>> > >> >>>> unshredded) files produced by the Java and Go implementations?
>>> > >> >>>>
>>> > >> >>>> On Wed, Jul 23, 2025 at 3:18 AM Aihua Xu <aihu...@gmail.com> wrote:
>>> > >> >>>>
>>> > >> >>>>> Thanks Matt for the comment and working on the Go variant.
>>> > >> >>>>>
>>> > >> >>>>> Micah, that’s a good point. Let me check out the coverage completeness
>>> > >> >>>>> for these two implementations.
>>> > >> >>>>>
>>> > >> >>>>>
>>> > >> >>>>>
>>> > >> >>>>>> On Jul 22, 2025, at 10:01 AM, Matt Topol <zotthewiz...@gmail.com>
>>> > >> >>>>>> wrote:
>>> > >> >>>>>>
>>> > >> >>>>>> Assuming that the files with variants in
>>> > >> >>>>>> https://github.com/apache/parquet-testing are generated by
>>> > >> >>>>>> parquet-java, then we at least have confirmed that the Go
>>> > >> >>>>>> implementation is able to read variant files that are written by the
>>> > >> >>>>>> Java implementation. So there's at least some testing of the two
>>> > >> >>>>>> implementations against each other.
>>> > >> >>>>>>
>>> > >> >>>>>> --Matt
>>> > >> >>>>>>
>>> > >> >>>>>>> On Tue, Jul 22, 2025 at 12:29 AM Micah Kornfield
>>> > >> >>>>>>> <emkornfi...@gmail.com> wrote:
>>> > >> >>>>>>>
>>> > >> >>>>>>> Have we tested the two implementations against one another?
>>> > >> >>>>>>>
>>> > >> >>>>>>>> On Mon, Jul 21, 2025 at 9:14 PM Aihua Xu <aihu...@gmail.com> wrote:
>>> > >> >>>>>>>>
>>> > >> >>>>>>>> Hi community,
>>> > >> >>>>>>>>
>>> > >> >>>>>>>> Per the Parquet specification requirements, two reference
>>> > >> >>>>>>>> implementations are needed to finalize the Variant logical type. Both
>>> > >> >>>>>>>> Java and Go implementations now support variant encoding and shredding.
>>> > >> >>>>>>>>
>>> > >> >>>>>>>> Java already has the encoding and shredding implementations in place:
>>> > >> >>>>>>>> apache/parquet-java#3197
>>> > >> >>>>>>>> <https://github.com/apache/parquet-java/pull/3197>
>>> > >> >>>>>>>> apache/parquet-java#3202
>>> > >> >>>>>>>> <https://github.com/apache/parquet-java/pull/3202>
>>> > >> >>>>>>>> apache/parquet-java#3223
>>> > >> >>>>>>>> <https://github.com/apache/parquet-java/issues/3223>
>>> > >> >>>>>>>> apache/parquet-java#3211
>>> > >> >>>>>>>> <https://github.com/apache/parquet-java/issues/3211>
>>> > >> >>>>>>>>
>>> > >> >>>>>>>> Go also includes encoding and shredding support:
>>> > >> >>>>>>>> apache/arrow-go#344 <https://github.com/apache/arrow-go/pull/344>
>>> > >> >>>>>>>> apache/arrow-go#434 <https://github.com/apache/arrow-go/pull/434>
>>> > >> >>>>>>>>
>>> > >> >>>>>>>> I propose that we remove the "under development" notes from the
>>> > >> >>>>>>>> documentation and move forward with finalizing the specification (PR
>>> > >> >>>>>>>> #509 <https://github.com/apache/parquet-format/pull/509>).
>>> > >> >>>>>>>> This vote will be open for at least 72 hours.
>>> > >> >>>>>>>>
>>> > >> >>>>>>>> [ ] +1 Finalize Variant and Shredding Spec
>>> > >> >>>>>>>> [ ] +0
>>> > >> >>>>>>>> [ ] -1 Do not release this because...
>>> > >> >>>>>>>>
>>> > >> >>>>>>>
>>> > >> >>>>>
>>> > >> >>>>
>>> > >> >>>
>>> > >> >>
>>> > >>
>>> > >
>>> >
>>>
>>
