> Being part of Apache org?

It is also a question of IP clearance and ownership.

On Thu, May 16, 2024, at 3:11 PM, Rok Mihevc wrote:
> What are the benefits of a parquet implementation being part of Apache
> Parquet vs another Apache project vs something else entirely?
> Being part of Apache org? Branding? Voting rights?
> If motivations are clear, solutions might be more readily apparent.
>
> Rok
>
> On Thu, May 16, 2024 at 2:36 PM Raphael Taylor-Davies
> <r.taylordav...@googlemail.com.invalid> wrote:
>
>> I'm curious where the other arrow parquet implementations fit into this,
>> if at all? For context, the original Rust implementation was largely the
>> work of Chao Sun, who I believe to be a parquet PMC member, but it was
>> then donated to the arrow project, and has primarily been developed and
>> maintained by individuals affiliated with the arrow project since then,
>> myself included. I'm not suggesting all parquet implementations
>> necessarily need to be governed by the parquet PMC, but perhaps
>> whatever compromise we devise for parquet-cpp might equally be applied to
>> the other parquet projects that fall under the arrow umbrella?
>>
>> Kind Regards,
>>
>> Raphael
>>
>> On 16/05/2024 13:26, Uwe L. Korn wrote:
>> > I would actually consider someone who contributes to both communities at
>> > the same time to be a worthwhile addition to both projects. In my active
>> > years, we have mostly voted people into both projects; the order was not
>> > clear, though.
>> >
>> > Being a committer/PMC means that you want to bring the community around
>> > a project forward in the Apache way (with parquet-cpp this is given as it
>> > is part of the parquet community and also still in a project that is
>> > residing within the Apache org).
>> >
>> >> he told me that the contribution to
>> >> parquet-cpp is no longer considered when promoting committers to
>> >> Apache Parquet PMC.
>> > As a Parquet PMC member, I would strongly object to that and would be
>> > supportive of also making them a Parquet committer/PMC.
>> >
>> > Best
>> > Uwe
>> >
>> > On Thu, May 16, 2024, at 2:19 PM, Gang Wu wrote:
>> >> Hi,
>> >>
>> >> I share the same feeling with Antoine that parquet-cpp seems to be fully
>> >> governed by Apache Arrow PMC, not the Apache Parquet PMC. I have
>> >> once discussed this with Xinli and he told me that the contribution to
>> >> parquet-cpp is no longer considered when promoting committers to
>> >> Apache Parquet PMC.
>> >>
>> >> Best,
>> >> Gang
>> >>
>> >> On Thu, May 16, 2024 at 4:29 PM Antoine Pitrou <anto...@python.org> wrote:
>> >>
>> >>> On Thu, 16 May 2024 10:08:42 +0200
>> >>> "Uwe L. Korn" <uw...@xhochy.com> wrote:
>> >>>> On Tue, May 14, 2024, at 6:30 PM, Antoine Pitrou wrote:
>> >>>>> AFAIK, the only Parquet implementation under the Apache Parquet project
>> >>>>> is parquet-mr :-)
>> >>>> This is not true. The parquet-cpp that resides in the arrow repository
>> >>>> is still controlled by the Apache Parquet PMC. Back then, we only merged
>> >>>> the codebases but kept control of it with the Apache Parquet project. I
>> >>>> know, it is hard to understand, but at least I have never seen a vote that
>> >>>> would move it out of the Apache Parquet's project "control".
>> >>>
>> >>> Ahah. Unfortunately, this doesn't match actual community practices. For
>> >>> example, when it is decided to give (Arrow) commit rights to a frequent
>> >>> Parquet C++ contributor, that decision is made among the Arrow PMC, not
>> >>> the Parquet PMC.
>> >>>
>> >>> Perhaps there would be value in aligning the legal situation on the
>> >>> _de facto_ situation?
>> >>>
>> >>> Regards
>> >>>
>> >>> Antoine.
>> >>>
>> >>>
>> >>>> Best
>> >>>> Uwe
>> >>>>>
>> >>>>> On Tue, 14 May 2024 10:58:58 +0200
>> >>>>> Rok Mihevc <rok.mih...@gmail.com> wrote:
>> >>>>>> Second Raphael's point.
>> >>>>>> Would it be reasonable to say specification change requires implementation
>> >>>>>> in two parquet implementations within Apache Parquet project?
>> >>>>>>
>> >>>>>> Rok
>> >>>>>>
>> >>>>>> On Tue, May 14, 2024 at 10:50 AM Gang Wu <ustcwg-re5jqeeqqe8avxtiumw...@public.gmane.org> wrote:
>> >>>>>>> IMHO, it looks more reasonable if a reference implementation is required
>> >>>>>>> to support most (not all) elements from the specification.
>> >>>>>>>
>> >>>>>>> Another question is: should we discuss (and vote for) each candidate
>> >>>>>>> one by one? We can start with parquet-mr, which is the most well-known
>> >>>>>>> implementation.
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Gang
>> >>>>>>>
>> >>>>>>> On Tue, May 14, 2024 at 4:41 PM Raphael Taylor-Davies
>> >>>>>>> <r.taylordav...@googlemail.com.invalid> wrote:
>> >>>>>>>
>> >>>>>>>> Potentially it would be helpful to flip the question around. As Andrew
>> >>>>>>>> articulates, a reference implementation is required to implement all
>> >>>>>>>> elements from the specification, and therefore the major consequence of
>> >>>>>>>> labeling parquet-mr thusly would be that any specification change would
>> >>>>>>>> have to be implemented within parquet-mr as part of the standardisation
>> >>>>>>>> process. It would be insufficient for it to be implemented in, for
>> >>>>>>>> example, two of the parquet implementations maintained by the arrow
>> >>>>>>>> project. I personally think that would be a shame and likely exclude
>> >>>>>>>> many people who would otherwise be interested in evolving the parquet
>> >>>>>>>> specification, but think that is at the core of this question.
>> >>>>>>>>
>> >>>>>>>> Kind Regards,
>> >>>>>>>>
>> >>>>>>>> Raphael
>> >>>>>>>>
>> >>>>>>>> On 13/05/2024 20:55, Andrew Lamb wrote:
>> >>>>>>>>> Question: Should we label parquet-mr or any other parquet implementations
>> >>>>>>>>> "reference implementations"?
>> >>>>>>>>>
>> >>>>>>>>> This came up as part of Vinoo's great PR to list different parquet
>> >>>>>>>>> reference implementations[1][2].
>> >>>>>>>>>
>> >>>>>>>>> The term "reference implementation" often has an official connotation. For
>> >>>>>>>>> example the wikipedia definition is "a program that implements all
>> >>>>>>>>> requirements from a corresponding specification. The reference
>> >>>>>>>>> implementation ... should be considered the "correct" behavior of any other
>> >>>>>>>>> implementation of it."[3]
>> >>>>>>>>>
>> >>>>>>>>> Given the close association of parquet-mr to the parquet standard, it is a
>> >>>>>>>>> natural candidate to label as "reference implementation." However, it is
>> >>>>>>>>> not clear to me if there is consensus that it should be thusly labeled.
>> >>>>>>>>> I have a strong opinion that a consensus on this question would be very
>> >>>>>>>>> helpful. I don't actually have a strong opinion about the answer
>> >>>>>>>>>
>> >>>>>>>>> Andrew
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> [1]: https://github.com/apache/parquet-site/pull/53#discussion_r1582882267
>> >>>>>>>>> [2]: https://github.com/apache/parquet-site/pull/53#discussion_r1598283465
>> >>>>>>>>> [3]: https://en.wikipedia.org/wiki/Reference_implementation
>> >>>>>>>>>
>> >>>
>> >>>
>> >>>
>>