I see, thanks for the clarifications! I will work on porting the Spark Java implementation to parquet-java.
Spark also has a (partial) python implementation for the Variant binary format, but it needs a bit more work to complete.

Thanks,
Gene

On Wed, Dec 4, 2024 at 6:11 AM Andrew Lamb <[email protected]> wrote:

> We also were discussing trying to implement variant in Rust[1], but it was hard due to a lack of other implementations or example data to test against
>
> Maybe once there is a draft POC for Java, we could whip up something for Rust that did the same
>
> [1]: https://github.com/apache/arrow-rs/issues/6736
>
> On Wed, Dec 4, 2024 at 4:57 AM Gang Wu <[email protected]> wrote:
>
> > > With regards to Variant implementations, for Java, don't we need the format released before the implementation can be provided (I thought parquet-java consumed a released parquet-format jar in its build)?
> >
> > For parquet-java, usually the PoC PR is based on a locally built parquet-format with an unreleased version when the spec change is under review. Once the vote has been passed and a new parquet-format is released, the PoC PR gets rebased on the released format for a final review. Below are some examples:
> >
> > float16: https://github.com/apache/parquet-java/pull/1142
> > size stats: https://github.com/apache/parquet-java/pull/1177
> > geometry: https://github.com/apache/parquet-java/pull/2971
> >
> > Best,
> > Gang
> >
> > On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <[email protected]> wrote:
> >
> > > Hi Gene,
> > >
> > > Before release, I added a proposal to have a shredding version added to the annotation (https://github.com/apache/parquet-format/pull/474), it would be good to discuss if people think there is value in this.
> > >
> > > > However, there was a discussion [2] on the requirement of two PoC reference implementations when promoting a new format change.
> > >
> > > With regards to Variant implementations, for Java, don't we need the format released before the implementation can be provided (I thought parquet-java consumed a released parquet-format jar in its build)?
> > >
> > > > However, there was a discussion [2] on the requirement of two PoC reference implementations when promoting a new format change. There are also concerns from the variant logical type PR [3] against parquet-java. This is something to discuss in the community if we want to make the variant type an exception.
> > >
> > > I thought the compromise we came to is that the documentation for Variant states that it is still experimental (maybe we should add this as a comment to parquet.thrift as well to make this very clear). I was under the impression that Variant would stay experimental until the 2 implementations were complete. I think we should clarify the scope of what we think is acceptable for the implementations but that should probably be a separate thread). I also have some concerns about some current variant spec after reviewing initial spec and the proposed shredding simplification [1], which I'll raise on a separate thread.
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1] https://github.com/apache/parquet-format/pull/461
> > >
> > > On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <[email protected]> wrote:
> > >
> > > > Hi Gene,
> > > >
> > > > Thanks for your effort on adding variant type to the parquet-format! For the next release, I'd like to include the geometry type [1] as well which is also targeted for the Iceberg V3 spec. I can volunteer to be the release manager.
> > > >
> > > > However, there was a discussion [2] on the requirement of two PoC reference implementations when promoting a new format change. There are also concerns from the variant logical type PR [3] against parquet-java. This is something to discuss in the community if we want to make the variant type an exception.
> > > >
> > > > [1] https://github.com/apache/parquet-format/pull/240
> > > > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
> > > > [3] https://github.com/apache/parquet-java/pull/3072
> > > >
> > > > Best,
> > > > Gang
> > > >
> > > > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We updated parquet-format <https://github.com/apache/parquet-format> to include the Variant logical type annotation. Would someone be able to release parquet-format (and create the necessary artifacts) so that parquet-java can be updated to depend on the new release? This would enable adding implementation in parquet-java.
> > > > >
> > > > > Thanks!
> > > > > Gene
