If there is no objection, I will prepare the release candidate of
parquet-format 2.11.0 and send out the vote early next week.

On Mon, Feb 17, 2025 at 8:47 PM Gang Wu <[email protected]> wrote:

> Thanks Fokko for bringing this up! Yes, I can be the release manager
> if the community reaches a consensus.
>
> Best,
> Gang
>
> On Mon, Feb 17, 2025 at 6:58 PM Fokko Driesprong <[email protected]> wrote:
>
>> Hey everyone,
>>
>> I would love to bubble this back up to the top of our mailboxes.
>>
>>    - For Variant, various implementations are in flight: Java in
>>    Parquet-Java <https://github.com/apache/parquet-java/pull/3117> and
>>    Iceberg-Java <https://github.com/apache/iceberg/pull/12139>, C++
>>    <https://github.com/apache/arrow/pull/45375> in Arrow, Python
>>    <https://github.com/apache/spark/blob/master/python/pyspark/sql/variant_utils.py>
>>    in Spark, and the Arrow Rust community also expressed interest
>>    <https://github.com/apache/arrow-rs/issues/6736>.
>>    - For Geometry/Geography, there is a C++ PR
>>    <https://github.com/apache/arrow/pull/45459> in Arrow and a Java PR in
>>    Parquet <https://github.com/apache/parquet-java/pull/2971>, even though
>>    the vote only passed last week. Geo support has also been added to
>>    Iceberg <https://github.com/apache/iceberg/pull/10981>.
>>
>> Both Variant and Geo have been voted on and merged into the format spec. To
>> maintain momentum, I think it would be good to release the Thrift
>> definitions and the Java convenience JAR.
>>
>> Does anyone have any questions or concerns about getting this out? Gang,
>> you mentioned that you would like to volunteer as release manager; are
>> you still available? :)
>>
>> Kind regards,
>> Fokko
>>
>>
>> On Thu, Dec 5, 2024 at 5:33 AM Gene Pang <[email protected]> wrote:
>>
>> > I see, thanks for the clarifications!
>> >
>> > I will work on porting the Spark Java implementation to parquet-java.
>> >
>> > Spark also has a (partial) Python implementation of the Variant binary
>> > format, but it needs a bit more work to complete.
>> >
>> > Thanks,
>> > Gene
>> >
>> > On Wed, Dec 4, 2024 at 6:11 AM Andrew Lamb <[email protected]> wrote:
>> >
>> > > We were also discussing implementing Variant in Rust [1], but it was
>> > > hard due to a lack of other implementations or example data to test
>> > > against.
>> > >
>> > > Maybe once there is a draft PoC for Java, we could whip up something
>> > > for Rust that does the same.
>> > >
>> > > [1]: https://github.com/apache/arrow-rs/issues/6736
>> > >
>> > > On Wed, Dec 4, 2024 at 4:57 AM Gang Wu <[email protected]> wrote:
>> > >
>> > > > > With regards to Variant implementations, for Java, don't we need the
>> > > > > format released before the implementation can be provided (I thought
>> > > > > parquet-java consumed a released parquet-format jar in its build)?
>> > > >
>> > > > For parquet-java, the PoC PR is usually based on a locally built
>> > > > parquet-format with an unreleased version while the spec change is
>> > > > under review. Once the vote has passed and a new parquet-format is
>> > > > released, the PoC PR gets rebased on the released format for a final
>> > > > review. Below are some examples:
>> > > >
>> > > > float16: https://github.com/apache/parquet-java/pull/1142
>> > > > size stats: https://github.com/apache/parquet-java/pull/1177
>> > > > geometry: https://github.com/apache/parquet-java/pull/2971
>> > > >
>> > > > Best,
>> > > > Gang
>> > > >
>> > > > On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <[email protected]> wrote:
>> > > >
>> > > > > Hi Gene,
>> > > > >
>> > > > > Before the release, I added a proposal to add a shredding version to
>> > > > > the annotation (https://github.com/apache/parquet-format/pull/474).
>> > > > > It would be good to discuss whether people think there is value in
>> > > > > this.
>> > > > >
>> > > > >
>> > > > >
>> > > > > > However, there was a discussion [2] on the requirement of two PoC
>> > > > > > reference implementations when promoting a new format change.
>> > > > >
>> > > > >
>> > > > > With regards to Variant implementations, for Java, don't we need the
>> > > > > format released before the implementation can be provided (I thought
>> > > > > parquet-java consumed a released parquet-format jar in its build)?
>> > > > >
>> > > > >
>> > > > > > However, there was a discussion [2] on the requirement of two PoC
>> > > > > > reference implementations when promoting a new format change. There
>> > > > > > are also concerns from the variant logical type PR [3] against
>> > > > > > parquet-java. This is something to discuss in the community if we
>> > > > > > want to make the variant type an exception.
>> > > > >
>> > > > >
>> > > > > I thought the compromise we came to is that the documentation for
>> > > > > Variant states that it is still experimental (maybe we should add
>> > > > > this as a comment to parquet.thrift as well to make this very clear).
>> > > > > I was under the impression that Variant would stay experimental until
>> > > > > the two implementations were complete. I think we should clarify the
>> > > > > scope of what we consider acceptable for the implementations, but that
>> > > > > should probably be a separate thread. I also have some concerns about
>> > > > > parts of the current Variant spec after reviewing the initial spec and
>> > > > > the proposed shredding simplification [1], which I'll raise on a
>> > > > > separate thread.
>> > > > >
>> > > > > Thanks,
>> > > > > Micah
>> > > > >
>> > > > > [1] https://github.com/apache/parquet-format/pull/461
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <[email protected]> wrote:
>> > > > >
>> > > > > > Hi Gene,
>> > > > > >
>> > > > > > Thanks for your effort on adding the variant type to
>> > > > > > parquet-format! For the next release, I'd like to include the
>> > > > > > geometry type [1] as well, which is also targeted for the Iceberg V3
>> > > > > > spec. I can volunteer to be the release manager.
>> > > > > >
>> > > > > > However, there was a discussion [2] on the requirement of two PoC
>> > > > > > reference implementations when promoting a new format change. There
>> > > > > > are also concerns from the variant logical type PR [3] against
>> > > > > > parquet-java. This is something to discuss in the community if we
>> > > > > > want to make the variant type an exception.
>> > > > > >
>> > > > > > [1] https://github.com/apache/parquet-format/pull/240
>> > > > > > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
>> > > > > > [3] https://github.com/apache/parquet-java/pull/3072
>> > > > > >
>> > > > > > Best,
>> > > > > > Gang
>> > > > > >
>> > > > > > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <[email protected]> wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > We updated parquet-format <https://github.com/apache/parquet-format>
>> > > > > > > to include the Variant logical type annotation. Would someone be
>> > > > > > > able to release parquet-format (and create the necessary
>> > > > > > > artifacts) so that parquet-java can be updated to depend on the new
>> > > > > > > release? This would enable adding the implementation in
>> > > > > > > parquet-java.
>> > > > > > >
>> > > > > > > Thanks!
>> > > > > > > Gene
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
