It is great to see more momentum building.
I myself have a little more time to contribute to Parquet.

Personally, I think moving it back would make sense.
*However*, I have personally contributed a lot more to the Java code base
than to the C++ one.
The move was originally made because the people contributing to the Arrow
and Parquet C++ code bases were the same, and circular dependencies were
getting in the way (does Parquet depend on Arrow or the other way around?
At the time, it was both ways). So to make this happen, we need enough
Parquet C++ contributors who would be happy with the move, and we need to
clarify which way the dependency goes. My take is that Parquet depends on
Arrow, but I'd be curious to see what others think.
Julien

On Sat, May 11, 2024 at 2:51 AM Andrew Lamb <andrewlam...@gmail.com> wrote:

> It is great to see some additional enthusiasm and momentum around the
> Apache Parquet implementation (congratulations on the release of parquet-mr
> 1.14[1]!).
>
> As activity picks up, if the goal is to build more community around
> Parquet, perhaps the Parquet PMC wants to encourage moving code back to
> repositories managed by Parquet (and out of Arrow, for example). I realize
> this would be a technical burden, but it might clarify communities and
> committers.
>
> Andrew
>
> [1]: https://lists.apache.org/thread/2gggm938z0x9fx3wtwctfm5htsxlf3z4
>
>
>
> On Fri, May 10, 2024 at 11:45 PM Matt Topol <zotthewiz...@gmail.com>
> wrote:
>
> > I also wanted to raise the question of non-Java developers who have
> > worked on the other Parquet implementations potentially being recognized
> > as committers or otherwise on the Parquet project (speaking as the primary
> > developer of the Go Parquet implementation, which also lives in the Arrow
> > repository). It would be great to see some active contributors to
> > parquet-cpp, parquet-go, or other implementations not just being
> > considered but actively becoming committers.
> >
> > That's just my two cents from a community perspective.
> >
> > --Matt
> >
> > On Fri, May 10, 2024, 10:35 PM Jacob Wujciak <assignu...@apache.org>
> > wrote:
> >
> > > Thank you, that sounds great! At first glance, some seem to be rather
> > > old and probably don't apply anymore.
> > >
> > > > BTW, do we really need to make a full copy of them to have a mirror in
> > > > the Arrow GitHub issues?
> > >
> > > I am not sure I understand what you mean. A full copy of the
> > > open/closed/all issues? I'd say only the (remaining) open issues would
> > > be fine.
> > > For reference, this is what an imported issue looks like:
> > > https://github.com/apache/arrow/issues/30543
> > >
> > > On Sat, May 11, 2024 at 04:09, Gang Wu <ust...@gmail.com> wrote:
> > >
> > > > I can initiate the vote. But before the vote, I think we need to
> > > > revisit the state of all unresolved tickets and close some as needed.
> > > >
> > > > BTW, do we really need to make a full copy of them to have a mirror
> > > > in the Arrow GitHub issues?
> > > >
> > > > I'd like to seek a consensus here before sending the vote.
> > > >
> > > > Best,
> > > > Gang
> > > >
> > > > On Sat, May 11, 2024 at 8:46 AM Jacob Wujciak <assignu...@apache.org>
> > > > wrote:
> > > >
> > > > > Hello Everyone!
> > > > >
> > > > > It seems there is general agreement on this topic; it would be great
> > > > > if a committer/PMC member could start a (lazy consensus) procedural
> > > > > vote.
> > > > >
> > > > > I will inquire how to handle the parquet-cpp component in Jira
> > > > > (ideally disabling it, not removing it).
> > > > > There are currently only ~70 open tickets for parquet-cpp. With the
> > > > > change in repo, it is probably easier to just move the open tickets,
> > > > > but I'll leave that to Rok, who managed the transition of Arrow's 20k+
> > > > > tickets too :D
> > > > >
> > > > > Thanks,
> > > > > Jacob
> > > > >
> > > > > Arrow committer
> > > > >
> > > > > On 2024/04/25 05:31:18 Gang Wu wrote:
> > > > > > I know we have some non-Java committers and PMC members. But since the
> > > > > > parquet-cpp donation, it seems that no one who worked on Parquet in
> > > > > > Arrow (C++, Rust, Go, etc.) or other projects has been promoted to
> > > > > > Parquet committer. As a result, it is inconvenient for non-Java
> > > > > > Parquet developers to work with the apache/parquet-format and
> > > > > > apache/parquet-testing repositories. Furthermore, votes from these
> > > > > > developers are not binding for a format change on the ML.
> > > > > >
> > > > > > Best,
> > > > > > Gang
> > > > > >
> > > > > > On Wed, Apr 24, 2024 at 8:42 PM Uwe L. Korn <uw...@xhochy.com>
> > > > > > wrote:
> > > > > >
> > > > > > > > Should we consider Parquet developers from projects other than
> > > > > > > > parquet-mr as Parquet committers?
> > > > > > >
> > > > > > > We are doing this (speaking as a Parquet PMC member who didn't work
> > > > > > > on parquet-mr, but on parquet-cpp).
> > > > > > >
> > > > > > > Best
> > > > > > > Uwe
> > > > > > >
> > > > > > > On Wed, Apr 24, 2024, at 2:38 PM, Gang Wu wrote:
> > > > > > > > +1 for moving parquet-cpp issues from Apache Jira to Arrow's GitHub
> > > > > > > > issues.
> > > > > > > >
> > > > > > > > Additionally, I want to echo Will's question in the thread. Should we
> > > > > > > > consider Parquet developers from projects other than parquet-mr as
> > > > > > > > Parquet committers?
> > > > > > > > Currently, the apache/parquet-format and apache/parquet-testing
> > > > > > > > repositories are solely governed by the Apache Parquet PMC. It would
> > > > > > > > be better for the entire Parquet community if developers with
> > > > > > > > sufficient contributions to open source Parquet projects (including
> > > > > > > > but not limited to parquet-cpp, arrow-rs, cudf, etc.) could be
> > > > > > > > considered as Parquet committers and PMC members.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Gang
> > > > > > > >
> > > > > > > > On Wed, Apr 24, 2024 at 7:04 PM Uwe L. Korn <uw...@xhochy.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> I would be very supportive of this move. Parquet C++ development
> > > > > > > >> has been under the umbrella of the Arrow repository for more than
> > > > > > > >> five (six?) years now. Thus, the issues should also be aligned with
> > > > > > > >> the Arrow project.
> > > > > > > >>
> > > > > > > >> Uwe
> > > > > > > >>
> > > > > > > >> On Tue, Apr 23, 2024, at 8:27 PM, Rok Mihevc wrote:
> > > > > > > >> > Bumping this thread again to see if there is the will to call for
> > > > > > > >> > a vote and move parquet-cpp issues from Apache Jira to Arrow's
> > > > > > > >> > GitHub issues, as was done for Arrow.
> > > > > > > >> > I'm willing to do the move, as I already did it for Arrow.
> > > > > > > >> >
> > > > > > > >> > Rok
> > > > > > > >> >
> > > > > > > >> > On Sat, Apr 15, 2023 at 4:53 AM Micah Kornfield <emkornfi...@apache.org>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> >> Bumping this thread again to see if any Parquet PMC members can
> > > > > > > >> >> chime in or maybe start a formal vote to move governance of
> > > > > > > >> >> Parquet-CPP under the Arrow umbrella.
> > > > > > > >> >>
> > > > > > > >> >> -Micah
> > > > > > > >> >>
> > > > > > > >> >> On 2023/02/02 10:34:25 Antoine Pitrou wrote:
> > > > > > > >> >> >
> > > > > > > >> >> >
> > > > > > > >> >> > Hi Will,
> > > > > > > >> >> >
> > > > > > > >> >> > On 01/02/2023 at 20:27, Will Jones wrote:
> > > > > > > >> >> > >
> > > > > > > >> >> > > First, it's not obvious where issues are supposed to be
> > > > > > > >> >> > > opened: in Parquet Jira or in Arrow GitHub issues. Looking back
> > > > > > > >> >> > > at some of the original discussion, it looks like the intention
> > > > > > > >> >> > > was:
> > > > > > > >> >> > >
> > > > > > > >> >> > >> * use PARQUET-XXX for issues relating to Parquet core
> > > > > > > >> >> > >> * use ARROW-XXX for issues relating to Arrow's consumption of
> > > > > > > >> >> > >>   Parquet core (e.g. changes that are in parquet/arrow right now)
> > > > > > > >> >> > >>
> > > > > > > >> >> > > The README for the old parquet-cpp repo [3] instead states in
> > > > > > > >> >> > > its migration note:
> > > > > > > >> >> > >
> > > > > > > >> >> > >   JIRA issues should continue to be opened in the PARQUET JIRA
> > > > > > > >> >> > >   project.
> > > > > > > >> >> > >
> > > > > > > >> >> > > Either way, it doesn't seem like this process is obvious to
> > > > > > > >> >> > > people. Perhaps we could clarify this and add notices to Arrow's
> > > > > > > >> >> > > GitHub issues template?
> > > > > > > >> >> >
> > > > > > > >> >> > I agree we should clarify this. I have no personal preference,
> > > > > > > >> >> > but I will note that GitHub issues reduce friction, since having a
> > > > > > > >> >> > GH account is already necessary for submitting PRs.
> > > > > > > >> >> >
> > > > > > > >> >> > > Second, committer status is a little unclear. I am a committer
> > > > > > > >> >> > > on Arrow, but not on Parquet right now. Does that mean I should
> > > > > > > >> >> > > only merge Parquet C++ PRs for code changes in parquet/arrow? Or
> > > > > > > >> >> > > that I shouldn't merge Parquet changes at all?
> > > > > > > >> >> >
> > > > > > > >> >> > Since Parquet C++ is part of Arrow C++, you are allowed to merge
> > > > > > > >> >> > Parquet C++ changes. As always, you should ensure you have
> > > > > > > >> >> > sufficient understanding of the contribution, and that it follows
> > > > > > > >> >> > established practices:
> > > > > > > >> >> > https://arrow.apache.org/docs/dev/developers/reviewing.html
> > > > > > > >> >> >
> > > > > > > >> >> > > Also, are the contributions to Arrow C++ Parquet being actively
> > > > > > > >> >> > > reviewed for potential new committers?
> > > > > > > >> >> >
> > > > > > > >> >> > I certainly would.
> > > > > > > >> >> >
> > > > > > > >> >> > Regards
> > > > > > > >> >> >
> > > > > > > >> >> > Antoine.
> > > > > > > >> >> >
> > > > > > > >> >> >
> > > > > > > >> >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
