I see, thanks for the clarifications!

I will work on porting the Spark Java implementation to parquet-java.

Spark also has a (partial) Python implementation of the Variant binary
format, but it needs a bit more work to complete.

Thanks,
Gene

On Wed, Dec 4, 2024 at 6:11 AM Andrew Lamb <[email protected]> wrote:

> We also were discussing trying to implement variant in Rust[1], but it was
> hard due to a lack of other implementations or example data to test against.
>
> Maybe once there is a draft POC for Java, we could whip up something for
> Rust that did the same.
>
> [1]: https://github.com/apache/arrow-rs/issues/6736
>
> On Wed, Dec 4, 2024 at 4:57 AM Gang Wu <[email protected]> wrote:
>
> > > > With regards to Variant implementations, for Java, don't we need the
> > > > format released before the implementation can be provided (I thought
> > > > parquet-java consumed a released parquet-format jar in its build)?
> >
> > For parquet-java, usually the PoC PR is based on a locally built
> > parquet-format with an unreleased version while the spec change is under
> > review. Once the vote has passed and a new parquet-format is released,
> > the PoC PR gets rebased on the released format for a final review. Below
> > are some examples:
> >
> > float16: https://github.com/apache/parquet-java/pull/1142
> > size stats: https://github.com/apache/parquet-java/pull/1177
> > geometry: https://github.com/apache/parquet-java/pull/2971
> >
> > Best,
> > Gang
> >
> > On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <[email protected]>
> > wrote:
> >
> > > Hi Gene,
> > >
> > > Before release, I added a proposal to have a shredding version added to
> > > the annotation (https://github.com/apache/parquet-format/pull/474); it
> > > would be good to discuss if people think there is value in this.
> > >
> > >
> > >
> > > > However, there was a discussion [2] on the requirement of two PoC
> > > > reference implementations when promoting a new format change.
> > >
> > >
> > > With regards to Variant implementations, for Java, don't we need the
> > > format released before the implementation can be provided (I thought
> > > parquet-java consumed a released parquet-format jar in its build)?
> > >
> > >
> > > > However, there was a discussion [2] on the requirement of two PoC
> > > > reference implementations when promoting a new format change. There
> > > > are also concerns from the variant logical type PR [3] against
> > > > parquet-java. This is something to discuss in the community if we
> > > > want to make the variant type an exception.
> > >
> > >
> > > I thought the compromise we came to was that the documentation for
> > > Variant states that it is still experimental (maybe we should add this
> > > as a comment to parquet.thrift as well to make this very clear). I was
> > > under the impression that Variant would stay experimental until the 2
> > > implementations were complete. I think we should clarify the scope of
> > > what we think is acceptable for the implementations (but that should
> > > probably be a separate thread). I also have some concerns about the
> > > current variant spec after reviewing the initial spec and the proposed
> > > shredding simplification [1], which I'll raise on a separate thread.
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1] https://github.com/apache/parquet-format/pull/461
> > >
> > >
> > >
> > > On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <[email protected]> wrote:
> > >
> > > > Hi Gene,
> > > >
> > > > Thanks for your effort on adding the variant type to parquet-format!
> > > > For the next release, I'd like to include the geometry type [1] as
> > > > well, which is also targeted for the Iceberg V3 spec. I can volunteer
> > > > to be the release manager.
> > > >
> > > > However, there was a discussion [2] on the requirement of two PoC
> > > > reference implementations when promoting a new format change. There
> > > > are also concerns from the variant logical type PR [3] against
> > > > parquet-java. This is something to discuss in the community if we
> > > > want to make the variant type an exception.
> > > >
> > > > [1] https://github.com/apache/parquet-format/pull/240
> > > > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
> > > > [3] https://github.com/apache/parquet-java/pull/3072
> > > >
> > > > Best,
> > > > Gang
> > > >
> > > > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <[email protected]>
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We updated parquet-format <https://github.com/apache/parquet-format>
> > > > > to include the Variant logical type annotation. Would someone be able
> > > > > to release parquet-format (and create the necessary artifacts) so that
> > > > > parquet-java can be updated to depend on the new release? This would
> > > > > enable adding an implementation in parquet-java.
> > > > >
> > > > > Thanks!
> > > > > Gene
> > > > >
> > > >
> > >
> >
>
