It is great to see some additional enthusiasm and momentum around the
Apache Parquet implementation (congratulations on the release of parquet-mr
1.14[1]!).

As activity picks up, if the desire is to build more community around
Parquet, perhaps the Parquet PMC wants to encourage moving code back to
repositories managed by Parquet (and out of Arrow, for example). I realize
this would be a technical burden, but it might clarify communities and
committers.

Andrew

[1]: https://lists.apache.org/thread/2gggm938z0x9fx3wtwctfm5htsxlf3z4



On Fri, May 10, 2024 at 11:45 PM Matt Topol <zotthewiz...@gmail.com> wrote:

> I just wanted to also poke the question of non-Java developers who have
> worked on the other parquet implementations potentially being recognized as
> committers or otherwise on the Parquet project (speaking as the primary
> developer of the Go parquet implementation which also lives in the Arrow
> repository). It would be great to see some active contributors to
> parquet-cpp, parquet-go, or otherwise not just being considered but
> actively becoming committers.
>
> That's just my two cents from a community perspective.
>
> --Matt
>
> On Fri, May 10, 2024, 10:35 PM Jacob Wujciak <assignu...@apache.org>
> wrote:
>
> > Thank you, that sounds great! At first glance some seem to be rather old
> > and probably don't apply anymore.
> >
> > > BTW, do we really need to make a full copy of them to have a mirror in
> > > the Arrow GitHub issues?
> >
> > I am not sure I understand what you mean? A full copy of the
> > open/closed/all issues? I'd say only the (remaining) open issues would be
> > fine.
> > For reference this is what an imported issue looks like:
> > https://github.com/apache/arrow/issues/30543
> >
> > On Sat, May 11, 2024 at 4:09 AM Gang Wu <ust...@gmail.com> wrote:
> >
> > > I can initiate the vote. But before the vote, I think we need to revisit
> > > the states of all unresolved tickets and close some as needed.
> > >
> > > BTW, do we really need to make a full copy of them to have a mirror
> > > in the Arrow GitHub issues?
> > >
> > > I'd like to seek a consensus here before sending the vote.
> > >
> > > Best,
> > > Gang
> > >
> > > On Sat, May 11, 2024 at 8:46 AM Jacob Wujciak <assignu...@apache.org>
> > > wrote:
> > >
> > > > Hello Everyone!
> > > >
> > > > It seems there is general agreement on this topic. It would be great if a
> > > > committer/PMC member could start a (lazy consensus) procedural vote.
> > > >
> > > > I will inquire how to handle the parquet-cpp component in Jira (ideally
> > > > disabling it, not removing it).
> > > > There are currently only ~70 open tickets for parquet-cpp. With the change
> > > > in repo it is probably easier to just move open tickets, but I'll leave that
> > > > to Rok, who managed the transition of Arrow's 20k+ tickets too :D
> > > >
> > > > Thanks,
> > > > Jacob
> > > >
> > > > Arrow committer
> > > >
> > > > On 2024/04/25 05:31:18 Gang Wu wrote:
> > > > > > I know we have some non-Java committers and PMC members. But after the
> > > > > > parquet-cpp donation, it seems that no one who worked on Parquet from
> > > > > > Arrow (cpp, rust, go, etc.) or other projects has been promoted to
> > > > > > Parquet committer. It would be inconvenient for non-Java Parquet
> > > > > > developers to work with the apache/parquet-format and
> > > > > > apache/parquet-testing repositories. Furthermore, votes from these
> > > > > > developers are not binding for a format change on the ML.
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > > > > On Wed, Apr 24, 2024 at 8:42 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> > > > >
> > > > > > > > Should we consider
> > > > > > > > Parquet developers from other projects than parquet-mr as Parquet
> > > > > > > > committers?
> > > > > >
> > > > > > > We are doing this (speaking as a Parquet PMC member who didn't work on
> > > > > > > parquet-mr, but on parquet-cpp).
> > > > > >
> > > > > > Best
> > > > > > Uwe
> > > > > >
> > > > > > On Wed, Apr 24, 2024, at 2:38 PM, Gang Wu wrote:
> > > > > > > > +1 for moving parquet-cpp issues from Apache Jira to Arrow's GitHub
> > > > > > > > issues.
> > > > > > >
> > > > > > > > Besides, I want to echo Will's question in the thread. Should we consider
> > > > > > > > Parquet developers from other projects than parquet-mr as Parquet committers?
> > > > > > > > Currently the apache/parquet-format and apache/parquet-testing repositories
> > > > > > > > are solely governed by the Apache Parquet PMC. It would be better for the
> > > > > > > > entire Parquet community if developers with sufficient contributions to open
> > > > > > > > source Parquet projects (including but not limited to parquet-cpp, arrow-rs,
> > > > > > > > cudf, etc.) could be considered for Parquet committer and PMC membership.
> > > > > > >
> > > > > > > Best,
> > > > > > > Gang
> > > > > > >
> > > > > > > > On Wed, Apr 24, 2024 at 7:04 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> > > > > > >
> > > > > > > >> I would be very supportive of this move. The Parquet C++ development has
> > > > > > > >> been under the umbrella of the Arrow repository for more than five (six?)
> > > > > > > >> years now. Thus, the issues should also be aligned with the Arrow project.
> > > > > > >>
> > > > > > >> Uwe
> > > > > > >>
> > > > > > >> On Tue, Apr 23, 2024, at 8:27 PM, Rok Mihevc wrote:
> > > > > > > >> > Bumping this thread again to see if there is will to call for a vote and
> > > > > > > >> > move parquet-cpp issues from Apache Jira to Arrow's GitHub issues, as was
> > > > > > > >> > done for Arrow.
> > > > > > > >> > I'm willing to do the move as I already did it for Arrow.
> > > > > > >> >
> > > > > > >> > Rok
> > > > > > >> >
> > > > > > > >> > On Sat, Apr 15, 2023 at 4:53 AM Micah Kornfield <emkornfi...@apache.org>
> > > > > > > >> > wrote:
> > > > > > >> >
> > > > > > > >> >> Bumping this thread again to see if any Parquet PMC members can chime
> > > > > > > >> >> in/maybe start a formal vote to move governance of Parquet-CPP under the
> > > > > > > >> >> umbrella.
> > > > > > >> >>
> > > > > > >> >> -Micah
> > > > > > >> >>
> > > > > > >> >> On 2023/02/02 10:34:25 Antoine Pitrou wrote:
> > > > > > >> >> >
> > > > > > >> >> >
> > > > > > >> >> > Hi Will,
> > > > > > >> >> >
> > > > > > > >> >> > On 2023/02/01 at 20:27, Will Jones wrote:
> > > > > > >> >> > >
> > > > > > > >> >> > > First, it's not obvious where issues are supposed to be opened: in Parquet
> > > > > > > >> >> > > Jira or Arrow GitHub issues. Looking back at some of the original
> > > > > > > >> >> > > discussion, it looks like the intention was
> > > > > > >> >> > >
> > > > > > > >> >> > >> * use PARQUET-XXX for issues relating to Parquet core
> > > > > > > >> >> > >> * use ARROW-XXX for issues relating to Arrow's consumption of Parquet
> > > > > > > >> >> > >> core (e.g. changes that are in parquet/arrow right now)
> > > > > > > >> >> > >>
> > > > > > > >> >> > > The README for the old parquet-cpp repo [3] states instead in its
> > > > > > > >> >> > > migration note:
> > > > > > >> >> > >
> > > > > > > >> >> > >   JIRA issues should continue to be opened in the PARQUET JIRA project.
> > > > > > >> >> > >
> > > > > > > >> >> > > Either way, it doesn't seem like this process is obvious to people. Perhaps
> > > > > > > >> >> > > we could clarify this and add notices to Arrow's GitHub issues template?
> > > > > > >> >> >
> > > > > > > >> >> > I agree we should clarify this. I have no personal preference, but I will
> > > > > > > >> >> > note that GitHub issues decrease friction as having a GH account is already
> > > > > > > >> >> > necessary for submitting PRs.
> > > > > > >> >> >
> > > > > > > >> >> > > Second, committer status is a little unclear. I am a committer on Arrow,
> > > > > > > >> >> > > but not on Parquet right now. Does that mean I should only merge Parquet
> > > > > > > >> >> > > C++ PRs for code changes in parquet/arrow? Or that I shouldn't merge
> > > > > > > >> >> > > Parquet changes at all?
> > > > > > >> >> >
> > > > > > > >> >> > Since Parquet C++ is part of Arrow C++, you are allowed to merge Parquet C++
> > > > > > > >> >> > changes. As always you should ensure you have sufficient understanding of
> > > > > > > >> >> > the contribution, and that it follows established practices:
> > > > > > > >> >> > https://arrow.apache.org/docs/dev/developers/reviewing.html
> > > > > > >> >> >
> > > > > > > >> >> > > Also, are the contributions to Arrow C++ Parquet being actively reviewed
> > > > > > > >> >> > > for potential new committers?
> > > > > > >> >> >
> > > > > > > >> >> > I would certainly do so.
> > > > > > >> >> >
> > > > > > >> >> > Regards
> > > > > > >> >> >
> > > > > > >> >> > Antoine.
> > > > > > >> >> >
> > > > > > >> >> >
> > > > > > >> >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
