I think it would be cleaner to have a parquet-format release with the finalized spec first. Referencing it in the parquet-java release would state clearly that it is (supposed to be) working according to the finalized specification.
Gabor

On Mon, Aug 25, 2025 at 4:48 AM Gang Wu <[email protected]> wrote:

> The vote [1] for finalizing variant spec has passed so it's time to revive
> this discussion.
>
> I just checked all the commits [2] to parquet-format since the last release
> and found that there is no thrift definition change. All commits are about
> clarification or fixing typos.
> Should we skip the format release and directly jump to the parquet-java
> release?
>
> [1] https://lists.apache.org/thread/mr2voh7twz2hql4y59x5c7o32kntmbvm
> [2] https://github.com/apache/parquet-format/commits/master/?since=2025-03-24
>
> Best,
> Gang
>
> On Wed, Aug 20, 2025 at 9:58 AM Gang Wu <[email protected]> wrote:
>
>> Thanks for the heads up!
>>
>> Yes, I think a formal vote is required before merging the PR.
>>
>> Best,
>> Gang
>>
>> On Wed, Aug 20, 2025 at 12:36 AM Aihua Xu <[email protected]> wrote:
>>
>>> Hi community,
>>>
>>> Let me know if a vote process is needed or we can review in
>>> https://github.com/apache/parquet-format/pull/509 (which is to remove the
>>> under development lines).
>>>
>>> Thanks,
>>> Aihua
>>>
>>> On Mon, Aug 18, 2025 at 10:53 AM Aihua Xu <[email protected]> wrote:
>>>
>>>> Hi Micah and community,
>>>>
>>>> We’ve generated the test files from Go (PR #94
>>>> <https://github.com/apache/parquet-testing/pull/94>) and successfully
>>>> validated them in Parquet-Java (PR #3258
>>>> <https://github.com/apache/parquet-java/pull/3258>). During testing, we
>>>> identified two minor issues in the Go generation:
>>>>
>>>> 1. The spec version should be *1* instead of *0*.
>>>> 2. The Parquet TIME type should be TIME(isAdjustedToUTC=false, MICROS)
>>>>    instead of TIME(isAdjustedToUTC=true, MICROS).
>>>>
>>>> These issues have already been addressed by Matt.
>>>>
>>>> Looking ahead, here’s what I propose for closing out the Variant release:
>>>>
>>>> 1. Start a vote to finalize the Variant spec (removing the two lines
>>>>    under *active development*).
>>>> 2. Start a vote for the Parquet-Java 1.16.0 release.
>>>>
>>>> Please share your thoughts on these next steps, or let me know if you see
>>>> anything else we should address before proceeding.
>>>>
>>>> Thanks,
>>>> Aihua
>>>>
>>>> On Sun, Aug 17, 2025 at 9:28 PM Micah Kornfield <[email protected]> wrote:
>>>>
>>>>>> You want to see if the write path in GO is compatible? Let
>>>>>> me check with Matt on this.
>>>>>
>>>>> Yes, IIUC, I think there are now multiple OSS reader implementations, that
>>>>> have all been validated against parquet-java writing. So I think it is
>>>>> important we validate a second writer can produce files that can be read
>>>>> by parquet-java.
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> On Mon, Aug 11, 2025 at 9:17 AM Aihua Xu <[email protected]> wrote:
>>>>>
>>>>>> Hi Micah,
>>>>>>
>>>>>> What we have done is to generate a large set of the test cases from the
>>>>>> Iceberg project and validate in Java and GO. All of those implementations
>>>>>> are independent. You want to see if the write path in GO is compatible?
>>>>>> Let me check with Matt on this.
>>>>>>
>>>>>> Thanks,
>>>>>> Aihua
>>>>>>
>>>>>> On Sun, Aug 10, 2025 at 9:24 PM Micah Kornfield <[email protected]> wrote:
>>>>>>
>>>>>>>> We have completed cross-language validation for variant and the
>>>>>>>> implementation compatibility appears solid
>>>>>>>
>>>>>>> Great, apologies if I missed it but did we verify Java being able to
>>>>>>> read Go's output?
>>>>>>>
>>>>>>> On Fri, Aug 8, 2025 at 9:38 PM Aihua Xu <[email protected]> wrote:
>>>>>>>
>>>>>>>> We have completed cross-language validation for variant and the
>>>>>>>> implementation compatibility appears solid. Matt has raised some
>>>>>>>> comments regarding how to handle invalid cases. In fact, we had a long
>>>>>>>> discussion during the spec development about whether to explicitly
>>>>>>>> define the behavior for such cases. We should be able to clear that
>>>>>>>> out soon.
>>>>>>>>
>>>>>>>> On Aug 8, 2025, at 2:35 PM, Jia Yu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Gang,
>>>>>>>>>
>>>>>>>>> Thanks for letting me know.
>>>>>>>>>
>>>>>>>>> Would it make sense to create a new Parquet Java branch that includes
>>>>>>>>> all other commits except the Variant type implementation? That way,
>>>>>>>>> we could release a version without Variant entirely.
>>>>>>>>>
>>>>>>>>> We’re eager to get the Geo type released, but at the same time, we
>>>>>>>>> don’t want to rush the Variant work or ship something that’s not
>>>>>>>>> fully ready.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jia
>>>>>>>>>
>>>>>>>>> On Fri, Aug 8, 2025 at 1:25 AM Gang Wu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> parquet-cpp does not implement variant type yet, so it is safe to
>>>>>>>>>> release the geo types. IIUC, there is no easy way to block users
>>>>>>>>>> from producing files with variant types in parquet-java, so this is
>>>>>>>>>> the main concern.
>>>>>>>>>>
>>>>>>>>>> Perhaps Aihua can provide an update on the progress?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Gang
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 8, 2025 at 5:11 AM Jia Yu <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for all your hard work on Parquet.
>>>>>>>>>>>
>>>>>>>>>>> Sorry for my ignorance, but I’d like to better understand why the
>>>>>>>>>>> Parquet Java release for Geo types is currently tied to the Variant
>>>>>>>>>>> type work. Arrow C++ (Parquet C++) has already been released with
>>>>>>>>>>> Geo type support, and it doesn’t seem to have encountered similar
>>>>>>>>>>> issues.
>>>>>>>>>>>
>>>>>>>>>>> The Geo type support in Iceberg has been stalled for several months
>>>>>>>>>>> because the Iceberg PMC cannot review or merge the implementation
>>>>>>>>>>> until there’s a corresponding Parquet Java release.
>>>>>>>>>>>
>>>>>>>>>>> Would it be possible to proceed with a new Parquet Java release for
>>>>>>>>>>> Geo, and mark the Variant type as experimental or keep it behind a
>>>>>>>>>>> feature flag?
>>>>>>>>>>>
>>>>>>>>>>> I’d really appreciate your thoughts on this and am looking forward
>>>>>>>>>>> to your response.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jia
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 18, 2025 at 10:33 AM Aihua Xu <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Seems the concern from Gabor is that we should finalize the
>>>>>>>>>>>> Variant spec
>>>>>>>>>>>> (https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
>>>>>>>>>>>> and
>>>>>>>>>>>> https://github.com/apache/parquet-format/blob/master/VariantShredding.md),
>>>>>>>>>>>> have a parquet-format release, and then move forward with
>>>>>>>>>>>> parquet-java release. I totally agree.
>>>>>>>>>>>>
>>>>>>>>>>>> We should have met the requirement with two reference
>>>>>>>>>>>> implementations for Variant in open source and I will start a VOTE
>>>>>>>>>>>> thread separately to close out the Variant spec if no objections.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the discussions.
>>>>>>>>>>>> Aihua
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 17, 2025 at 3:41 AM Andrew Lamb <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>> At this point, I’d like to check if we have enough
>>>>>>>>>>>>>> implementation coverage to move forward with finalizing the
>>>>>>>>>>>>>> Variant spec. Would it make sense to start a vote thread at this
>>>>>>>>>>>>>> stage?
>>>>>>>>>>>>>
>>>>>>>>>>>>> In my opinion we have sufficient open source implementations (the
>>>>>>>>>>>>> Golang implementation on arrow-go) and a vote to finalize the
>>>>>>>>>>>>> spec would be appropriate (and welcome)
>>>>>>>>>>>>>
>>>>>>>>>>>>> From my experience working on the Rust implementation so far, I
>>>>>>>>>>>>> have found the spec clear and easy to understand, the design well
>>>>>>>>>>>>> thought out, and have not encountered anything that would require
>>>>>>>>>>>>> any changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kudos to the team who designed and wrote the spec for this
>>>>>>>>>>>>> feature,
>>>>>>>>>>>>> Andrew
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 17, 2025 at 2:08 AM Jia Yu <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Aihua!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The geo type implementation in Iceberg is currently blocked by
>>>>>>>>>>>>>> this release. Really looking forward to it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jia
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 10:47 PM Gábor Szádovszky <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My concern was related to the current stage of the Variant
>>>>>>>>>>>>>>> specification and the fact that we started talking about
>>>>>>>>>>>>>>> releasing parquet-java with Variant features.
>>>>>>>>>>>>>>> If we formally release parquet-format with the finalized
>>>>>>>>>>>>>>> Variant spec first, then I have no concerns about writing
>>>>>>>>>>>>>>> Variant values in the upcoming parquet-java release. Otherwise,
>>>>>>>>>>>>>>> we need to block it by default and mark it as an experimental
>>>>>>>>>>>>>>> feature.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Gabor
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 7:37 PM Aihua Xu <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Gabor and all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here’s my current understanding of the progress on the
>>>>>>>>>>>>>>>> *Variant* support in Parquet:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Per Parquet's requirements, we need at least two reference
>>>>>>>>>>>>>>>>   implementations to finalize the Variant logical type
>>>>>>>>>>>>>>>>   specification.
>>>>>>>>>>>>>>>> - The community is actively working on Java, Go, and Rust
>>>>>>>>>>>>>>>>   implementations:
>>>>>>>>>>>>>>>>   - Java already has the encoding and shredding
>>>>>>>>>>>>>>>>     implementations in place:
>>>>>>>>>>>>>>>>     - Variant Decoding
>>>>>>>>>>>>>>>>       <https://github.com/apache/parquet-java/pull/3197>
>>>>>>>>>>>>>>>>     - Variant Encoding
>>>>>>>>>>>>>>>>       <https://github.com/apache/parquet-java/pull/3202>
>>>>>>>>>>>>>>>>     - Variant Shredding Writer
>>>>>>>>>>>>>>>>       <https://github.com/apache/parquet-java/issues/3223>
>>>>>>>>>>>>>>>>     - Variant Shredding Reader
>>>>>>>>>>>>>>>>       <https://github.com/apache/parquet-java/issues/3211>
>>>>>>>>>>>>>>>>   - Go also includes encoding and shredding support:
>>>>>>>>>>>>>>>>     - Variant Encoding/Decoding
>>>>>>>>>>>>>>>>       <https://github.com/apache/arrow-go/pull/344>
>>>>>>>>>>>>>>>>     - Variant Shredding
>>>>>>>>>>>>>>>>       <https://github.com/apache/arrow-go/pull/434>
>>>>>>>>>>>>>>>>   - Rust is currently working on the shredding implementation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition to these, we already have a full Variant
>>>>>>>>>>>>>>>> implementation in Apache Iceberg, as well as in some
>>>>>>>>>>>>>>>> closed-source engines.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> At this point, I’d like to check if we have enough
>>>>>>>>>>>>>>>> implementation coverage to move forward with finalizing the
>>>>>>>>>>>>>>>> Variant spec. Would it make sense to start a vote thread at
>>>>>>>>>>>>>>>> this stage?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ultimately, our goal is to release a new version of
>>>>>>>>>>>>>>>> parquet-format and parquet-java that includes the Variant
>>>>>>>>>>>>>>>> logical type, so that Iceberg and other engines can officially
>>>>>>>>>>>>>>>> depend on it and proceed with further implementation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let me know your thoughts and how we should proceed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Aihua
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Jul 13, 2025 at 10:08 PM Gábor Szádovszky <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I was not able to open the recordings of the last meeting
>>>>>>>>>>>>>>>>> because of permission issues. (Shouldn't these be accessible
>>>>>>>>>>>>>>>>> for anyone?)
>>>>>>>>>>>>>>>>> So, I'm not sure if you have talked about this, but the
>>>>>>>>>>>>>>>>> Variant spec is still not final.
>>>>>>>>>>>>>>>>> Since parquet-java already has Variant support, how do we
>>>>>>>>>>>>>>>>> prevent writing potentially invalid Variant data with the
>>>>>>>>>>>>>>>>> proper logical types we will use for the finalized spec? Is
>>>>>>>>>>>>>>>>> it behind a feature flag?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Gabor
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jul 11, 2025 at 7:33 PM Aihua Xu <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi community,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As discussed in the last community sync-up meeting, I'd like
>>>>>>>>>>>>>>>>>> to proceed with releasing *Parquet-Java 1.16.0*, which will
>>>>>>>>>>>>>>>>>> include support for *geo-type* and *variant*.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please let me know if you have any objections or if you have
>>>>>>>>>>>>>>>>>> any upcoming changes you'd like to include in this release.
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Aihua
