> Being part of Apache org?

It is also a question of IP clearance and ownership. 

On Thu, May 16, 2024, at 3:11 PM, Rok Mihevc wrote:
> What are the benefits of a parquet implementation being part of Apache
> Parquet vs another Apache project vs something else entirely?
> Being part of Apache org? Branding? Voting rights?
> If motivations are clear, solutions might be more readily apparent.
>
> Rok
>
> On Thu, May 16, 2024 at 2:36 PM Raphael Taylor-Davies
> <r.taylordav...@googlemail.com.invalid> wrote:
>
>> I'm curious where the other arrow parquet implementations fit into this,
>> if at all? For context, the original Rust implementation was largely the
>> work of Chao Sun, who I believe to be a parquet PMC member, but it was
>> then donated to the arrow project, and has primarily been developed and
>> maintained by individuals affiliated with the arrow project since then,
>> myself included. I'm not suggesting all parquet implementations
>> necessarily need to be governed by the parquet PMC, but perhaps
>> whatever compromise we devise for parquet-cpp might equally be applied to
>> the other parquet projects that fall under the arrow umbrella?
>>
>> Kind Regards,
>>
>> Raphael
>>
>> On 16/05/2024 13:26, Uwe L. Korn wrote:
>> > I would actually consider someone who contributes to both communities at
>> the same time to be a worthwhile addition to both projects. In my active
>> years, we have mostly voted people into both projects; the order was not
>> clear, though.
>> >
>> > Being a committer/PMC means that you want to bring the community around
>> a project forward in the Apache way (with parquet-cpp this is given as it
>> is part of the parquet community and also still in a project that is
>> residing within the Apache org).
>> >
>> >> he told me that the contribution to
>> >> parquet-cpp is no longer considered when promoting committers to
>> >> Apache Parquet PMC.
>> > As a Parquet PMC member, I would strongly object to that and would be
>> supportive of also making them a Parquet committer/PMC.
>> >
>> > Best
>> > Uwe
>> >
>> > On Thu, May 16, 2024, at 2:19 PM, Gang Wu wrote:
>> >> Hi,
>> >>
>> >> I share the same feeling with Antoine that parquet-cpp seems to be fully
>> >> governed by Apache Arrow PMC, not the Apache Parquet PMC. I have
>> >> once discussed this with Xinli and he told me that the contribution to
>> >> parquet-cpp is no longer considered when promoting committers to
>> >> Apache Parquet PMC.
>> >>
>> >> Best,
>> >> Gang
>> >>
>> >> On Thu, May 16, 2024 at 4:29 PM Antoine Pitrou <anto...@python.org>
>> wrote:
>> >>
>> >>> On Thu, 16 May 2024 10:08:42 +0200
>> >>> "Uwe L. Korn" <uw...@xhochy.com> wrote:
>> >>>> On Tue, May 14, 2024, at 6:30 PM, Antoine Pitrou wrote:
>> >>>>> AFAIK, the only Parquet implementation under the Apache Parquet
>> project
>> >>>>> is parquet-mr :-)
>> >>>> This is not true. The parquet-cpp that resides in the arrow repository
>> >>> is still controlled by the Apache Parquet PMC. Back then, we only
>> merged
>> >>> the codebases but kept control of it with the Apache Parquet project. I
>> >>> know, it is hard to understand, but at least I have never seen a vote
>> that
>> >>> would move it out of the Apache Parquet project's "control".
>> >>>
>> >>> Ahah. Unfortunately, this doesn't match actual community practices. For
>> >>> example, when it is decided to give (Arrow) commit rights to a frequent
>> >>> Parquet C++ contributor, that decision is made among the Arrow PMC, not
>> >>> the Parquet PMC.
>> >>>
>> >>> Perhaps there would be value in aligning the legal situation on the
>> >>> _de facto_ situation?
>> >>>
>> >>> Regards
>> >>>
>> >>> Antoine.
>> >>>
>> >>>
>> >>>> Best
>> >>>> Uwe
>> >>>>>
>> >>>>> On Tue, 14 May 2024 10:58:58 +0200
>> >>>>> Rok Mihevc <rok.mih...@gmail.com> wrote:
>> >>>>>> Second Raphael's point.
>> >>>>>> Would it be reasonable to say specification change requires
>> >>> implementation
>> >>>>>> in two parquet implementations within Apache Parquet project?
>> >>>>>>
>> >>>>>> Rok
>> >>>>>>
>> >>>>>> On Tue, May 14, 2024 at 10:50 AM Gang Wu <
>> >>> ustcwg-re5jqeeqqe8avxtiumw...@public.gmane.org> wrote:
>> >>>>>>> IMHO, it looks more reasonable if a reference implementation is
>> >>> required
>> >>>>>>> to support most (not all) elements from the specification.
>> >>>>>>>
>> >>>>>>> Another question is: should we discuss (and vote for) each
>> candidate
>> >>>>>>> one by one? We can start with parquet-mr, which is the most well-known
>> >>>>>>> implementation.
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Gang
>> >>>>>>>
>> >>>>>>> On Tue, May 14, 2024 at 4:41 PM Raphael Taylor-Davies
>> >>>>>>> <r.taylordav...@googlemail.com.invalid> wrote:
>> >>>>>>>
>> >>>>>>>> Potentially it would be helpful to flip the question around. As
>> >>> Andrew
>> >>>>>>>> articulates, a reference implementation is required to implement
>> >>> all
>> >>>>>>>> elements from the specification, and therefore the major
>> >>> consequence of
>> >>>>>>>> labeling parquet-mr thusly would be that any specification change
>> >>> would
>> >>>>>>>> have to be implemented within parquet-mr as part of the
>> >>> standardisation
>> >>>>>>>> process. It would be insufficient for it to be implemented in, for
>> >>>>>>>> example, two of the parquet implementations maintained by the
>> >>> arrow
>> >>>>>>>> project. I personally think that would be a shame and likely
>> >>> exclude
>> >>>>>>>> many people who would otherwise be interested in evolving the
>> >>> parquet
>> >>>>>>>> specification, but think that is at the core of this question.
>> >>>>>>>>
>> >>>>>>>> Kind Regards,
>> >>>>>>>>
>> >>>>>>>> Raphael
>> >>>>>>>>
>> >>>>>>>> On 13/05/2024 20:55, Andrew Lamb wrote:
>> >>>>>>>>> Question: Should we label parquet-mr or any other parquet
>> >>>>>>> implementations
>> >>>>>>>>> "reference implementations"?
>> >>>>>>>>>
>> >>>>>>>>> This came up as part of Vinoo's great PR to list different
>> >>> parquet
>> >>>>>>>>> reference implementations[1][2].
>> >>>>>>>>>
>> >>>>>>>>> The term "reference implementation" often has an official
>> >>> connotation.
>> >>>>>>>> For
>> >>>>>>>>> example the wikipedia definition is "a program that implements
>> >>> all
>> >>>>>>>>> requirements from a corresponding specification. The reference
>> >>>>>>>>> implementation ... should be considered the "correct" behavior
>> >>> of any
>> >>>>>>>> other
>> >>>>>>>>> implementation of it."[3]
>> >>>>>>>>>
>> >>>>>>>>> Given the close association of parquet-mr to the parquet
>> >>> standard, it
>> >>>>>>> is
>> >>>>>>>> a
>> >>>>>>>>> natural candidate to label as "reference implementation."
>> >>> However, it
>> >>>>>>> is
>> >>>>>>>>> not clear to me if there is consensus that it should be thusly
>> >>> labeled.
>> >>>>>>>>> I have a strong opinion that a consensus on this question would
>> >>> be very
>> >>>>>>>>> helpful. I don't actually have a strong opinion about the answer
>> >>>>>>>>>
>> >>>>>>>>> Andrew
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> [1]:
>> >>> https://github.com/apache/parquet-site/pull/53#discussion_r1582882267
>> >>>>>>>>> [2]:
>> >>> https://github.com/apache/parquet-site/pull/53#discussion_r1598283465
>> >>>>>>>>> [3]:  https://en.wikipedia.org/wiki/Reference_implementation
>> >>>>>>>>>
>> >>>
>> >>>
>> >>>
>>
