If there is no objection, I will prepare the release candidate of parquet-format 2.11.0 and send out the vote early next week.

On Mon, Feb 17, 2025 at 8:47 PM Gang Wu <[email protected]> wrote:

> Thanks Fokko for bringing this up! Yes, I can be the release manager
> if the community reaches a consensus.
>
> Best,
> Gang
>
> On Mon, Feb 17, 2025 at 6:58 PM Fokko Driesprong <[email protected]> wrote:
>
>> Hey everyone,
>>
>> I would love to bubble this back up to the top of our mailboxes.
>>
>> - For Variant, various implementations are in flight: Java in
>>   Parquet-Java <https://github.com/apache/parquet-java/pull/3117> and
>>   Iceberg-Java <https://github.com/apache/iceberg/pull/12139>, C++
>>   <https://github.com/apache/arrow/pull/45375> in Arrow, Python
>>   <https://github.com/apache/spark/blob/master/python/pyspark/sql/variant_utils.py>
>>   in Spark, and the Arrow Rust community also expressed interest
>>   <https://github.com/apache/arrow-rs/issues/6736>.
>> - For Geometry/Geography, we see a C++ PR
>>   <https://github.com/apache/arrow/pull/45459> in Arrow, Java in Parquet
>>   <https://github.com/apache/parquet-java/pull/2971>, but the vote has
>>   just passed last week. We also see that geo support has been added to
>>   Iceberg <https://github.com/apache/iceberg/pull/10981>.
>>
>> Both Variant and Geo have been voted for and merged in the format spec. To
>> maintain momentum I think it would be good to get the thrift definitions
>> and the Java convenience JAR out.
>>
>> Does anyone have any questions or concerns about getting this out? Gang,
>> you mentioned that you would like to volunteer as release manager, are
>> you still available? :)
>>
>> Kind regards,
>> Fokko
>>
>> On Thu, Dec 5, 2024 at 05:33, Gene Pang <[email protected]> wrote:
>>
>> > I see, thanks for the clarifications!
>> >
>> > I will work on porting the Spark Java implementation to parquet-java.
>> >
>> > Spark also has a (partial) python implementation for the Variant binary
>> > format, but it needs a bit more work to complete.
>> >
>> > Thanks,
>> > Gene
>> >
>> > On Wed, Dec 4, 2024 at 6:11 AM Andrew Lamb <[email protected]> wrote:
>> >
>> > > We also were discussing trying to implement variant in Rust[1], but it was
>> > > hard due to a lack of other implementations or example data to test against
>> > >
>> > > Maybe once there is a draft POC for Java, we could whip up something for
>> > > Rust that did the same
>> > >
>> > > [1]: https://github.com/apache/arrow-rs/issues/6736
>> > >
>> > > On Wed, Dec 4, 2024 at 4:57 AM Gang Wu <[email protected]> wrote:
>> > >
>> > > > > With regards to Variant implementations, for Java, don't we need the format
>> > > > > released before the implementation can be provided (I thought parquet-java
>> > > > > consumed a released parquet-format jar in its build)?
>> > > >
>> > > > For parquet-java, usually the PoC PR is based on a locally built parquet-format
>> > > > with an unreleased version when the spec change is under review. Once the vote
>> > > > has been passed and a new parquet-format is released, the PoC PR gets rebased
>> > > > on the released format for a final review. Below are some examples:
>> > > >
>> > > > float16: https://github.com/apache/parquet-java/pull/1142
>> > > > size stats: https://github.com/apache/parquet-java/pull/1177
>> > > > geometry: https://github.com/apache/parquet-java/pull/2971
>> > > >
>> > > > Best,
>> > > > Gang
>> > > >
>> > > > On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <[email protected]> wrote:
>> > > >
>> > > > > Hi Gene,
>> > > > >
>> > > > > Before release, I added a proposal to have a shredding version added to the
>> > > > > annotation (https://github.com/apache/parquet-format/pull/474), it would be
>> > > > > good to discuss if people think there is value in this.
>> > > > >
>> > > > > > However, there was a discussion [2] on the requirement of two PoC reference
>> > > > > > implementations when promoting a new format change.
>> > > > >
>> > > > > With regards to Variant implementations, for Java, don't we need the format
>> > > > > released before the implementation can be provided (I thought parquet-java
>> > > > > consumed a released parquet-format jar in its build)?
>> > > > >
>> > > > > > However, there was a discussion [2] on the requirement of two PoC reference
>> > > > > > implementations when promoting a new format change. There are also concerns
>> > > > > > from the variant logical type PR [3] against parquet-java. This is
>> > > > > > something to discuss in the community if we want to make the variant type an
>> > > > > > exception.
>> > > > >
>> > > > > I thought the compromise we came to is that the documentation for Variant
>> > > > > states that it is still experimental (maybe we should add this as a comment
>> > > > > to parquet.thrift as well to make this very clear). I was under the
>> > > > > impression that Variant would stay experimental until the 2 implementations
>> > > > > were complete. (I think we should clarify the scope of what we think is
>> > > > > acceptable for the implementations but that should probably be a separate
>> > > > > thread). I also have some concerns about some current variant spec after
>> > > > > reviewing initial spec and the proposed shredding simplification [1], which
>> > > > > I'll raise on a separate thread.
>> > > > >
>> > > > > Thanks,
>> > > > > Micah
>> > > > >
>> > > > > [1] https://github.com/apache/parquet-format/pull/461
>> > > > >
>> > > > > On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <[email protected]> wrote:
>> > > > >
>> > > > > > Hi Gene,
>> > > > > >
>> > > > > > Thanks for your effort on adding variant type to the parquet-format! For
>> > > > > > the next release, I'd like to include the geometry type [1] as well which is also
>> > > > > > targeted for the Iceberg V3 spec. I can volunteer to be the release manager.
>> > > > > >
>> > > > > > However, there was a discussion [2] on the requirement of two PoC reference
>> > > > > > implementations when promoting a new format change. There are also concerns
>> > > > > > from the variant logical type PR [3] against parquet-java. This is
>> > > > > > something to discuss in the community if we want to make the variant type an
>> > > > > > exception.
>> > > > > >
>> > > > > > [1] https://github.com/apache/parquet-format/pull/240
>> > > > > > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
>> > > > > > [3] https://github.com/apache/parquet-java/pull/3072
>> > > > > >
>> > > > > > Best,
>> > > > > > Gang
>> > > > > >
>> > > > > > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <[email protected]> wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > We updated parquet-format <https://github.com/apache/parquet-format> to
>> > > > > > > include the Variant logical type annotation. Would someone be able to
>> > > > > > > release parquet-format (and create the necessary artifacts) so that
>> > > > > > > parquet-java can be updated to depend on the new release? This would enable
>> > > > > > > adding implementation in parquet-java.
>> > > > > > >
>> > > > > > > Thanks!
>> > > > > > > Gene
