Would it make sense to transfer all governance of the parquet-cpp
implementation to Apache Arrow? It seems like that's where we de facto are
already, so that would resolve these ambiguities and put it in line with
the Rust implementation.

Would the Parquet PMC be opposed to formalizing this change?

Neal

On Thu, Feb 2, 2023 at 6:48 AM Raphael Taylor-Davies
<r.taylordav...@googlemail.com.invalid> wrote:

> Hi,
>
> > Does the parquet rust implementation have a similar issue?
>
> Similar to the C++ implementation, the Rust implementation lives under
> the Apache Arrow umbrella and does not have any direct affiliation with
> the Apache Parquet project that I am aware of, beyond using the same
> format specification. However, as almost all of the users and
> contributions are with respect to the arrow interfaces, and not the
> parquet record APIs, there perhaps isn't the same ambiguity as
> encountered with the C++ implementation. I would expect all issues to be
> raised in the arrow-rs repository, and a PARQUET Jira only raised,
> likely by myself or whoever is triaging the issue, if there is some
> issue/ambiguity pertaining to the format itself.
>
> Kind Regards,
>
> Raphael
>
> On 02/02/2023 01:58, Gang Wu wrote:
> > Hi Will,
> >
> > AFAIK, the Apache Parquet community no longer considers contributions to
> > parquet-cpp when promoting new committers after the donation to Apache
> > Arrow.
> >
> > It would be a dilemma for the parquet-cpp contributors if neither the
> > Apache Arrow community nor the Apache Parquet community recognizes their work.
> >
> > Does the parquet rust implementation have a similar issue?
> >
> > Best,
> > Gang
> >
> > On Thu, Feb 2, 2023 at 3:27 AM Will Jones <will.jones...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> A while back, the Parquet C++ implementation was merged into the Apache
> >> Arrow monorepo [1]. As I understand it, this helped the development process
> >> immensely. However, I am noticing some governance issues because of it.
> >>
> >> First, it's not obvious where issues are supposed to be opened: in Parquet
> >> Jira or Arrow GitHub issues. Looking back at some of the original
> >> discussion, it looks like the intention was
> >>
> >>> * use PARQUET-XXX for issues relating to Parquet core
> >>> * use ARROW-XXX for issues relating to Arrow's consumption of Parquet
> >>> core (e.g. changes that are in parquet/arrow right now)
> >>>
> >> The README for the old parquet-cpp repo [3] states instead in its
> >> migration note:
> >>
> >>   JIRA issues should continue to be opened in the PARQUET JIRA project.
> >>
> >>
> >> Either way, it doesn't seem like this process is obvious to people. Perhaps
> >> we could clarify this and add notices to Arrow's GitHub issues template?
> >>
> >> Second, committer status is a little unclear. I am a committer on Arrow,
> >> but not on Parquet right now. Does that mean I should only merge Parquet
> >> C++ PRs for code changes in parquet/arrow? Or that I shouldn't merge
> >> Parquet changes at all?
> >>
> >> Also, are the contributions to Arrow C++ Parquet being actively reviewed
> >> for potential new committers?
> >>
> >> Best,
> >>
> >> Will Jones
> >>
> >> [1] https://lists.apache.org/thread/76wzx2lsbwjl363bg066g8kdsocd03rw
> >> [2] https://lists.apache.org/thread/dkh6vjomcfyjlvoy83qdk9j5jgxk7n4j
> >> [3] https://github.com/apache/parquet-cpp
> >>
>
