1. I think we should make it easy for people to contribute to the C++
codebase (which is why I voted for the move at the time).
2. Merging the repos may remove the need to deal with the circular
dependency between repos for the C++ codebases, but it does so at the
expense of making it easy to evolve the Parquet spec and the Java and C++
implementations together.
This setup was optimized for quick iteration on the C++ APIs. Now that
those APIs are more stable, it is less needed, IMO.

parquet-cpp depends only on arrow-core, which does not have to depend on
parquet-cpp; it really just needs the vectors. Other components, like
arrow-dataset and pyarrow, can depend on parquet-cpp externally, just as
they depend on ORC.

I realize it would take work to make this happen, but the current location
of the parquet-cpp codebase is a big trade-off that prioritizes quick
iteration on the C++ implementation over iteration on the format. As
interest in evolving the format grows, I think it warrants a re-evaluation.



On Tue, May 14, 2024 at 9:20 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Moving Parquet C++ out of Arrow C++ would basically recreate the
> problems that motivated the integration of Parquet C++ into Arrow C++ :-)
>
> Regards
>
> Antoine.
>
>
> On Tue, 14 May 2024 13:52:15 +0800
> Gang Wu <ust...@gmail.com> wrote:
> > IMO, moving parquet-cpp out of arrow is challenging as the dependency
> > chain looks like: arrow core <- parquet-cpp <- arrow dataset <- pyarrow
> >
> > Best,
> > Gang
> >
> > On Tue, May 14, 2024 at 12:38 PM Julien Le Dem <julien-1odqgaof3lkdnm+yrof...@public.gmane.org> wrote:
> >
> > > It is great to see more momentum building.
> > > I myself have a little more time to contribute to Parquet.
> > >
> > > Personally, I think moving it back would make sense.
> > > *However*, I have personally contributed a lot more to the Java than the
> > > C++ codebase.
> > > That move was done initially because the people contributing to the Arrow
> > > and Parquet C++ codebases were the same, and circular dependencies were
> > > getting in the way (does Parquet depend on Arrow or the other way around?
> > > At the time it was both ways). So to make this happen, we need enough
> > > Parquet C++ contributors who would be happy with the move, and we need to
> > > clarify which way the dependency goes. My take is that Parquet depends on
> > > Arrow, but I'd be curious to see what others think.
> > > Julien
> > >
> > > On Sat, May 11, 2024 at 2:51 AM Andrew Lamb <andrewlam...@gmail.com> wrote:
> > >
> > > > It is great to see some additional enthusiasm and momentum around the
> > > > Apache Parquet implementation (congratulations on the release of
> > > > parquet-mr 1.14[1]!).
> > > >
> > > > As activity picks up, if the desire is to build more community around
> > > > Parquet, perhaps the Parquet PMC wants to encourage moving code back to
> > > > repositories managed by Parquet (and out of Arrow, for example). I realize
> > > > this would be a technical burden, but it might clarify communities and
> > > > committers.
> > > >
> > > > Andrew
> > > >
> > > > [1]: https://lists.apache.org/thread/2gggm938z0x9fx3wtwctfm5htsxlf3z4
> > > >
> > > >
> > > >
> > > > On Fri, May 10, 2024 at 11:45 PM Matt Topol <zotthewiz...@gmail.com> wrote:
> > > >
> > > > > I just wanted to also raise the question of non-Java developers who have
> > > > > worked on the other Parquet implementations potentially being recognized
> > > > > as committers or otherwise on the Parquet project (speaking as the primary
> > > > > developer of the Go Parquet implementation, which also lives in the Arrow
> > > > > repository). It would be great to see some active contributors to
> > > > > parquet-cpp, parquet-go, or otherwise not just being considered but
> > > > > actively becoming committers.
> > > > >
> > > > > That's just my two cents from a community perspective.
> > > > >
> > > > > --Matt
> > > > >
> > > > > On Fri, May 10, 2024, 10:35 PM Jacob Wujciak <assignu...@apache.org> wrote:
> > > > >
> > > > > > Thank you, that sounds great! At first glance, some seem to be rather
> > > > > > old and probably don't apply anymore.
> > > > > >
> > > > > > > BTW, do we really need to make a full copy of them to have a mirror in
> > > > > > > the Arrow GitHub issues?
> > > > > >
> > > > > > I am not sure I understand what you mean. A full copy of the
> > > > > > open/closed/all issues? I'd say only the (remaining) open issues would be
> > > > > > fine.
> > > > > > For reference, this is what an imported issue looks like:
> > > > > > https://github.com/apache/arrow/issues/30543
> > > > > >
> > > > > > On Sat, May 11, 2024 at 04:09, Gang Wu <ustcwg-re5jqeeqqe8avxtiumw...@public.gmane.org> wrote:
> > > > > >
> > > > > > > I can initiate the vote. But before the vote, I think we need to revisit
> > > > > > > the state of all unresolved tickets and close some as needed.
> > > > > > >
> > > > > > > BTW, do we really need to make a full copy of them to have a mirror
> > > > > > > in the Arrow GitHub issues?
> > > > > > >
> > > > > > > I'd like to seek a consensus here before sending the vote.
> > > > > > >
> > > > > > > Best,
> > > > > > > Gang
> > > > > > >
> > > > > > > On Sat, May 11, 2024 at 8:46 AM Jacob Wujciak <assignu...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hello Everyone!
> > > > > > > >
> > > > > > > > It seems there is general agreement on this topic; it would be great
> > > > > > > > if a committer/PMC member could start a (lazy consensus) procedural vote.
> > > > > > > >
> > > > > > > > I will inquire how to handle the parquet-cpp component in Jira (ideally
> > > > > > > > disabling it, not removing it).
> > > > > > > > There are currently only ~70 open tickets for parquet-cpp. With the
> > > > > > > > change in repo, it is probably easier to just move the open tickets, but
> > > > > > > > I'll leave that to Rok, who also managed the transition of Arrow's 20k+
> > > > > > > > tickets :D
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jacob
> > > > > > > >
> > > > > > > > Arrow committer
> > > > > > > >
> > > > > > > > On 2024/04/25 05:31:18 Gang Wu wrote:
> > > > > > > > > I know we have some non-Java committers and PMC members. But since the
> > > > > > > > > parquet-cpp donation, it seems that no one who worked on Parquet in Arrow
> > > > > > > > > (cpp, rust, go, etc.) or other projects has been promoted to Parquet
> > > > > > > > > committer. This makes it inconvenient for non-Java Parquet developers to
> > > > > > > > > work with the apache/parquet-format and apache/parquet-testing
> > > > > > > > > repositories. Furthermore, votes from these developers are not binding
> > > > > > > > > for a format change on the mailing list.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Gang
> > > > > > > > >
> > > > > > > > > On Wed, Apr 24, 2024 at 8:42 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> > > > > > > > >
> > > > > > > > > > > Should we consider
> > > > > > > > > > > Parquet developers from projects other than parquet-mr as Parquet
> > > > > > > > > > > committers?
> > > > > > > > > >
> > > > > > > > > > We are doing this (speaking as a Parquet PMC member who didn't work on
> > > > > > > > > > parquet-mr, but on parquet-cpp).
> > > > > > > > > >
> > > > > > > > > > Best
> > > > > > > > > > Uwe
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 24, 2024, at 2:38 PM, Gang Wu wrote:
> > > > > > > > > > > +1 for moving parquet-cpp issues from Apache Jira to Arrow's GitHub
> > > > > > > > > > > issues.
> > > > > > > > > > >
> > > > > > > > > > > Besides, I want to echo Will's question in the thread. Should we consider
> > > > > > > > > > > Parquet developers from projects other than parquet-mr as Parquet
> > > > > > > > > > > committers? Currently the apache/parquet-format and apache/parquet-testing
> > > > > > > > > > > repositories are solely governed by the Apache Parquet PMC. It would be
> > > > > > > > > > > better for the entire Parquet community if developers with sufficient
> > > > > > > > > > > contributions to open source Parquet projects (including but not limited
> > > > > > > > > > > to parquet-cpp, arrow-rs, cudf, etc.) could be considered for Parquet
> > > > > > > > > > > committer and PMC membership.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Gang
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 24, 2024 at 7:04 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > >> I would be very supportive of this move. The Parquet C++ development has
> > > > > > > > > > >> been under the umbrella of the Arrow repository for more than five (six?)
> > > > > > > > > > >> years now. Thus, the issues should also be aligned with the Arrow project.
> > > > > > > > > > >>
> > > > > > > > > > >> Uwe
> > > > > > > > > > >>
> > > > > > > > > > >> On Tue, Apr 23, 2024, at 8:27 PM, Rok Mihevc wrote:
> > > > > > > > > > >> > Bumping this thread again to see if there is will to call for a vote
> > > > > > > > > > >> > and move parquet-cpp issues from Apache Jira to Arrow's GitHub issues,
> > > > > > > > > > >> > as was done for Arrow.
> > > > > > > > > > >> > I'm willing to do the move, as I already did it for Arrow.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Rok
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Sat, Apr 15, 2023 at 4:53 AM Micah Kornfield <emkornfi...@apache.org> wrote:
> > > > > > > > > > >> >
> > > > > > > > > > >> >> Bumping this thread again to see if any Parquet PMC members can chime
> > > > > > > > > > >> >> in and maybe start a formal vote to move governance of Parquet-CPP under
> > > > > > > > > > >> >> the umbrella.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> -Micah
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> On 2023/02/02 10:34:25 Antoine Pitrou wrote:
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > Hi Will,
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > On 01/02/2023 at 20:27, Will Jones wrote:
> > > > > > > > > > >> >> > >
> > > > > > > > > > >> >> > > First, it's not obvious where issues are supposed to be opened: in
> > > > > > > > > > >> >> > > Parquet Jira or in Arrow GitHub issues. Looking back at some of the
> > > > > > > > > > >> >> > > original discussion, it looks like the intention was to
> > > > > > > > > > >> >> > >
> > > > > > > > > > >> >> > >> * use PARQUET-XXX for issues relating to Parquet core
> > > > > > > > > > >> >> > >> * use ARROW-XXX for issues relating to Arrow's consumption of Parquet
> > > > > > > > > > >> >> > >> core (e.g. changes that are in parquet/arrow right now)
> > > > > > > > > > >> >> > >>
> > > > > > > > > > >> >> > > The README for the old parquet-cpp repo [3] instead states in its
> > > > > > > > > > >> >> > > migration note:
> > > > > > > > > > >> >> > >
> > > > > > > > > > >> >> > >   JIRA issues should continue to be opened in the PARQUET JIRA project.
> > > > > > > > > > >> >> > >
> > > > > > > > > > >> >> > > Either way, it doesn't seem like this process is obvious to people.
> > > > > > > > > > >> >> > > Perhaps we could clarify this and add notices to Arrow's GitHub issues
> > > > > > > > > > >> >> > > template?
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > I agree we should clarify this. I have no personal preference, but I will
> > > > > > > > > > >> >> > note that GitHub issues decrease friction, as having a GH account is
> > > > > > > > > > >> >> > already necessary for submitting PRs.
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > > Second, committer status is a little unclear. I am a committer on Arrow,
> > > > > > > > > > >> >> > > but not on Parquet right now. Does that mean I should only merge Parquet
> > > > > > > > > > >> >> > > C++ PRs for code changes in parquet/arrow? Or that I shouldn't merge
> > > > > > > > > > >> >> > > Parquet changes at all?
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > Since Parquet C++ is part of Arrow C++, you are allowed to merge Parquet
> > > > > > > > > > >> >> > C++ changes. As always, you should ensure you have sufficient understanding
> > > > > > > > > > >> >> > of the contribution, and that it follows established practices:
> > > > > > > > > > >> >> > https://arrow.apache.org/docs/dev/developers/reviewing.html
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > > Also, are the contributions to Arrow C++ Parquet being actively reviewed
> > > > > > > > > > >> >> > > for potential new committers?
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > I would certainly do so.
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > Regards
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > Antoine.
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> >
