1. I think we should make it easy for people to contribute to the C++ codebase (which is why I voted for the move at the time).
2. If merging the repos removes the need to deal with the circular dependency between the repos for the C++ codebases, it does so at the expense of making it easy to evolve the Parquet spec and the Java and C++ implementations together. This setup was optimized for quick iteration on the APIs on the C++ side. Now that those APIs are more stable, it is less needed, IMO.
parquet-cpp depends only on arrow-core, which does not have to depend on parquet-cpp. It really just needs the vectors. Other components, like arrow-dataset and pyarrow, can depend on parquet-cpp just as they depend on ORC externally. I realize it would take work to make this happen, but the current location of the parquet-cpp codebase is a big trade-off that prioritizes quick iteration on the C++ implementations over iteration on the format. As interest in evolving the format grows, I think it warrants a re-evaluation.

On Tue, May 14, 2024 at 9:20 AM Antoine Pitrou <anto...@python.org> wrote:
> Moving Parquet C++ out of Arrow C++ would basically recreate the problems that motivated the integration of Parquet C++ into Arrow C++ :-)
>
> Regards
>
> Antoine.
>
> On Tue, 14 May 2024 13:52:15 +0800, Gang Wu <ust...@gmail.com> wrote:
> > IMO, moving parquet-cpp out of Arrow is challenging, as the dependency chain looks like: arrow core <- parquet-cpp <- arrow dataset <- pyarrow
> >
> > Best,
> > Gang
> >
> > On Tue, May 14, 2024 at 12:38 PM Julien Le Dem <julien-1odqgaof3lkdnm+yrof...@public.gmane.org> wrote:
> > > It is great to see more momentum building.
> > > I myself have a little more time to contribute to Parquet.
> > >
> > > Personally, I think moving it back would make sense.
> > > *However*, I have personally contributed a lot more to the Java than to the C++ code base.
> > > That move was done initially because the people contributing to the Arrow and Parquet C++ code bases were the same ones, and circular dependencies were getting in the way (does Parquet depend on Arrow or the other way around? At the time, it was both ways.) So to make this happen, we need enough Parquet C++ contributors who would be happy with the move, and we need to clarify which way the dependency goes. My take is that Parquet depends on Arrow, but I'd be curious to see what others think.
> > > Julien
> > >
> > > On Sat, May 11, 2024 at 2:51 AM Andrew Lamb <andrewlam...@gmail.com> wrote:
> > > > It is great to see some additional enthusiasm and momentum around the Apache Parquet implementation (congratulations on the release of parquet-mr 1.14[1]!).
> > > >
> > > > As activity picks up, if the desire is to build more community around Parquet, perhaps the Parquet PMC wants to encourage moving code back to repositories managed by Parquet (and out of Arrow, for example). I realize this would be a technical burden, but it might clarify communities and committers.
> > > >
> > > > Andrew
> > > >
> > > > [1]: https://lists.apache.org/thread/2gggm938z0x9fx3wtwctfm5htsxlf3z4
> > > >
> > > > On Fri, May 10, 2024 at 11:45 PM Matt Topol <zotthewiz...@gmail.com> wrote:
> > > > > I just wanted to also raise the question of non-Java developers who have worked on the other Parquet implementations potentially being recognized as committers or otherwise on the Parquet project (speaking as the primary developer of the Go Parquet implementation, which also lives in the Arrow repository). It would be great to see some active contributors to parquet-cpp, parquet-go, or otherwise not just being considered but actively becoming committers.
> > > > >
> > > > > That's just my two cents from a community perspective.
> > > > >
> > > > > --Matt
> > > > >
> > > > > On Fri, May 10, 2024, 10:35 PM Jacob Wujciak <assignu...@apache.org> wrote:
> > > > > > Thank you, that sounds great! At first glance, some seem to be rather old and probably don't apply anymore.
> > > > > >
> > > > > > > BTW, do we really need to make a full copy of them to have a mirror in the Arrow GitHub issues?
> > > > > >
> > > > > > I am not sure I understand what you mean. A full copy of the open/closed/all issues? I'd say only the (remaining) open issues would be fine.
> > > > > > For reference, this is what an imported issue looks like: https://github.com/apache/arrow/issues/30543
> > > > > >
> > > > > > On Sat, May 11, 2024 at 04:09, Gang Wu <ustcwg-re5jqeeqqe8avxtiumw...@public.gmane.org> wrote:
> > > > > > > I can initiate the vote. But before the vote, I think we need to revisit the state of all unresolved tickets and close some as needed.
> > > > > > >
> > > > > > > BTW, do we really need to make a full copy of them to have a mirror in the Arrow GitHub issues?
> > > > > > >
> > > > > > > I'd like to seek a consensus here before sending the vote.
> > > > > > >
> > > > > > > Best,
> > > > > > > Gang
> > > > > > >
> > > > > > > On Sat, May 11, 2024 at 8:46 AM Jacob Wujciak <assignu...@apache.org> wrote:
> > > > > > > > Hello Everyone!
> > > > > > > >
> > > > > > > > It seems there is general agreement on this topic; it would be great if a committer/PMC member could start a (lazy consensus) procedural vote.
> > > > > > > >
> > > > > > > > I will inquire how to handle the parquet-cpp component in Jira (ideally disabling it, not removing it).
> > > > > > > >
> > > > > > > > There are currently only ~70 open tickets for parquet-cpp. With the change in repo, it is probably easier to just move the open tickets, but I'll leave that to Rok, who managed the transition of Arrow's 20k+ tickets too :D
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jacob
> > > > > > > >
> > > > > > > > Arrow committer
> > > > > > > >
> > > > > > > > On 2024/04/25 05:31:18 Gang Wu wrote:
> > > > > > > > > I know we have some non-Java committers and PMC members. But after the parquet-cpp donation, it seems that no one working on Parquet in Arrow (C++, Rust, Go, etc.) or other projects has been promoted to Parquet committer. It would be inconvenient for non-Java Parquet developers to work with the apache/parquet-format and apache/parquet-testing repositories. Furthermore, votes from these developers are not binding for a format change on the ML.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Gang
> > > > > > > > >
> > > > > > > > > On Wed, Apr 24, 2024 at 8:42 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> > > > > > > > > > > Should we consider Parquet developers from projects other than parquet-mr as Parquet committers?
> > > > > > > > > >
> > > > > > > > > > We are doing this (speaking as a Parquet PMC member who didn't work on parquet-mr, but on parquet-cpp).
> > > > > > > > > >
> > > > > > > > > > Best
> > > > > > > > > > Uwe
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 24, 2024, at 2:38 PM, Gang Wu wrote:
> > > > > > > > > > > +1 for moving parquet-cpp issues from Apache Jira to Arrow's GitHub issues.
> > > > > > > > > > >
> > > > > > > > > > > Besides, I want to echo Will's question in the thread. Should we consider Parquet developers from projects other than parquet-mr as Parquet committers? Currently, the apache/parquet-format and apache/parquet-testing repositories are solely governed by the Apache Parquet PMC. It would be better for the entire Parquet community if developers with sufficient contributions to open source Parquet projects (including but not limited to parquet-cpp, arrow-rs, cudf, etc.) could be considered as Parquet committers and PMC members.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Gang
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 24, 2024 at 7:04 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> > > > > > > > > > > > I would be very supportive of this move. Parquet C++ development has been under the umbrella of the Arrow repository for more than five (six?) years now. Thus, the issues should also be aligned with the Arrow project.
> > > > > > > > > > > >
> > > > > > > > > > > > Uwe
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Apr 23, 2024, at 8:27 PM, Rok Mihevc wrote:
> > > > > > > > > > > > > Bumping this thread again to see if there is will to call for a vote and move parquet-cpp issues from Apache Jira to Arrow's GitHub issues, as was done for Arrow.
> > > > > > > > > > > > > I'm willing to do the move, as I already did it for Arrow.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Rok
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Apr 15, 2023 at 4:53 AM Micah Kornfield <emkornfi...@apache.org> wrote:
> > > > > > > > > > > > > > Bumping this thread again to see if any Parquet PMC members can chime in or maybe start a formal vote to move governance of Parquet-CPP under the umbrella.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -Micah
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 2023/02/02 10:34:25 Antoine Pitrou wrote:
> > > > > > > > > > > > > > > Hi Will,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 01/02/2023 at 20:27, Will Jones wrote:
> > > > > > > > > > > > > > > > First, it's not obvious where issues are supposed to be opened: in Parquet Jira or Arrow GitHub issues.
> > > > > > > > > > > > > > > > Looking back at some of the original discussion, it looks like the intention was to:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > * use PARQUET-XXX for issues relating to Parquet core
> > > > > > > > > > > > > > > > * use ARROW-XXX for issues relating to Arrow's consumption of Parquet core (e.g. changes that are in parquet/arrow right now)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The README for the old parquet-cpp repo [3] instead states in its migration note:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >     JIRA issues should continue to be opened in the PARQUET JIRA project.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Either way, it doesn't seem like this process is obvious to people. Perhaps we could clarify this and add notices to Arrow's GitHub issues template?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I agree we should clarify this. I have no personal preference, but I will note that GitHub issues decrease friction, as having a GH account is already necessary for submitting PRs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Second, committer status is a little unclear. I am a committer on Arrow, but not on Parquet right now.
Does that mean > I > > > should > > > > > > only > > > > > > > > merge > > > > > > > > > > >> >> Parquet > > > > > > > > > > >> >> > > C++ PRs for code changes in parquet/arrow? Or > that > > > I > > > > > > > > shouldn't > > > > > > > > > > merge > > > > > > > > > > >> >> > > Parquet changes at all? > > > > > > > > > > >> >> > > > > > > > > > > > >> >> > Since Parquet C++ is part of Arrow C++, you are > > > allowed > > > > > to > > > > > > > > merge > > > > > > > > > > >> Parquet > > > > > > > > > > >> >> C++ > > > > > > > > > > >> >> > changes. As always you should ensure you have > > > > sufficient > > > > > > > > > > understanding > > > > > > > > > > >> >> of the > > > > > > > > > > >> >> > contribution, and that it follows established > > > > practices: > > > > > > > > > > >> >> > > > > > > > https://arrow.apache.org/docs/dev/developers/reviewing.html > > > > > > > > > > >> >> > > > > > > > > > > > >> >> > > Also, are the contributions to Arrow C++ > Parquet > > > > being > > > > > > > > actively > > > > > > > > > > >> >> reviewed > > > > > > > > > > >> >> > > for potential new committers? > > > > > > > > > > >> >> > > > > > > > > > > > >> >> > I would certainly do. > > > > > > > > > > >> >> > > > > > > > > > > > >> >> > Regards > > > > > > > > > > >> >> > > > > > > > > > > > >> >> > Antoine. > > > > > > > > > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >