Moving Parquet C++ out of Arrow C++ would basically recreate the
problems that motivated the integration of Parquet C++ into Arrow C++
:-)

Regards

Antoine.


On Tue, 14 May 2024 13:52:15 +0800
Gang Wu <[email protected]> wrote:
> IMO, moving parquet-cpp out of arrow is challenging as the dependency
> chain looks like: arrow core <- parquet-cpp <- arrow dataset <- pyarrow
> 
> Best,
> Gang
> 
> On Tue, May 14, 2024 at 12:38 PM Julien Le Dem 
> <[email protected]> wrote:
> 
> > It is great to see more momentum building.
> > I myself have a little more time to contribute to Parquet.
> >
> > Personally I think moving it back would make sense.
> > *However* I have personally contributed a lot more to the Java than the C++
> > code base.
> > That move was done initially because people contributing to the Arrow and
> > Parquet C++ code bases were the same ones and circular dependencies were
> > getting in the way (does Parquet depend on Arrow or the other way around?
> > At the time it was both ways.). So to make this happen, we need enough
> > Parquet C++ contributors that would be happy with the move and clarify
> > which way the dependency goes. My take is that Parquet depends on Arrow but
> > I'd be curious to see what others think.
> > Julien
> >
> > On Sat, May 11, 2024 at 2:51 AM Andrew Lamb <[email protected]>
> > wrote:
> >  
> > > It is great to see some additional enthusiasm and momentum around the
> > > Apache Parquet implementation (congratulations on the release of
> > > parquet-mr 1.14[1]!).
> > >
> > > As activity picks up, if the desire is to build more community around
> > > Parquet, perhaps the Parquet PMC wants to encourage moving code back to
> > > repositories managed by parquet (and out of arrow, for example). I
> > > realize this would be a technical burden, but it might clarify
> > > communities and committers.
> > >
> > > Andrew
> > >
> > > [1]: https://lists.apache.org/thread/2gggm938z0x9fx3wtwctfm5htsxlf3z4
> > >
> > >
> > >
> > > On Fri, May 10, 2024 at 11:45 PM Matt Topol <[email protected]>
> > > wrote:
> > >  
> > > > I just wanted to also poke the question of non-Java developers who have
> > > > worked on the other parquet implementations potentially being
> > > > recognized as committers or otherwise on the Parquet project (speaking
> > > > as the primary developer of the Go parquet implementation which also
> > > > lives in the Arrow repository). It would be great to see some active
> > > > contributors to parquet-cpp, parquet-go, or otherwise not just being
> > > > considered but actively becoming committers.
> > > >
> > > > That's just my two cents from a community perspective.
> > > >
> > > > --Matt
> > > >
> > > > On Fri, May 10, 2024, 10:35 PM Jacob Wujciak <[email protected]>
> > > > wrote:
> > > >  
> > > > > Thank you, that sounds great! At first glance some seem to be rather
> > > > > old and probably don't apply anymore.
> > > > >
> > > > > > BTW, do we really need to make a full copy of them to have a mirror
> > > > > > in the Arrow GitHub issues?
> > > > >
> > > > > I am not sure I understand what you mean. A full copy of the
> > > > > open/closed/all issues? I'd say only the (remaining) open issues
> > > > > would be fine.
> > > > > For reference this is what an imported issue looks like:
> > > > > https://github.com/apache/arrow/issues/30543
> > > > >
> > > > > On Sat, May 11, 2024 at 04:09, Gang Wu <[email protected]> wrote:
> > > > >  
> > > > > > I can initiate the vote. But before the vote, I think we need to
> > > > > > revisit the states of all unresolved tickets and close some as needed.
> > > > > >
> > > > > > BTW, do we really need to make a full copy of them to have a mirror
> > > > > > in the Arrow GitHub issues?
> > > > > >
> > > > > > I'd like to seek a consensus here before sending the vote.
> > > > > >
> > > > > > Best,
> > > > > > Gang
> > > > > >
> > > > > > On Sat, May 11, 2024 at 8:46 AM Jacob Wujciak <[email protected]>
> > > > > > wrote:
> > > > > >  
> > > > > > > Hello Everyone!
> > > > > > >
> > > > > > > It seems there is general agreement on this topic; it would be
> > > > > > > great if a committer/PMC member could start a (lazy consensus)
> > > > > > > procedural vote.
> > > > > > >
> > > > > > > I will inquire how to handle the parquet-cpp component in Jira
> > > > > > > (ideally disabling it, not removing).
> > > > > > > There are currently only ~70 open tickets for parquet-cpp; with the
> > > > > > > change in repo it is probably easier to just move open tickets, but
> > > > > > > I'll leave that to Rok, who managed the transition of Arrow's 20k+
> > > > > > > tickets too :D
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jacob
> > > > > > >
> > > > > > > Arrow committer
> > > > > > >
> > > > > > > On 2024/04/25 05:31:18 Gang Wu wrote:  
> > > > > > > > I know we have some non-Java committers and PMC members. But after
> > > > > > > > the parquet-cpp donation, it seems that no one who worked on Parquet
> > > > > > > > from Arrow (cpp, rust, go, etc.) or other projects has been promoted
> > > > > > > > to Parquet committer. It would be inconvenient for non-Java Parquet
> > > > > > > > developers to work with the apache/parquet-format and
> > > > > > > > apache/parquet-testing repositories. Furthermore, votes from these
> > > > > > > > developers are not binding for a format change on the ML.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Gang
> > > > > > > >
> > > > > > > > On Wed, Apr 24, 2024 at 8:42 PM Uwe L. Korn <[email protected]> wrote:
> > > > > > > >  
> > > > > > > > > > Should we consider
> > > > > > > > > > Parquet developers from other projects than parquet-mr as Parquet
> > > > > > > > > > committers?
> > > > > > > > >
> > > > > > > > > We are doing this (speaking as a Parquet PMC member who didn't work
> > > > > > > > > on parquet-mr, but on parquet-cpp).
> > > > > > > > >
> > > > > > > > > Best
> > > > > > > > > Uwe
> > > > > > > > >
> > > > > > > > > On Wed, Apr 24, 2024, at 2:38 PM, Gang Wu wrote:  
> > > > > > > > > > +1 for moving parquet-cpp issues from Apache Jira to Arrow's GitHub
> > > > > > > > > > issues.
> > > > > > > > > >
> > > > > > > > > > Besides, I want to echo Will's question in the thread. Should we
> > > > > > > > > > consider Parquet developers from other projects than parquet-mr as
> > > > > > > > > > Parquet committers?
> > > > > > > > > > Currently the apache/parquet-format and apache/parquet-testing
> > > > > > > > > > repositories are solely governed by the Apache Parquet PMC. It would
> > > > > > > > > > be better for the entire Parquet community if developers with
> > > > > > > > > > sufficient contributions to open source Parquet projects (including
> > > > > > > > > > but not limited to parquet-cpp, arrow-rs, cudf, etc.) could be
> > > > > > > > > > considered for Parquet committer and PMC membership.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Gang
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 24, 2024 at 7:04 PM Uwe L. Korn <[email protected]> wrote:
> > > > > > > > > >  
> > > > > > > > > >> I would be very supportive of this move. The Parquet C++
> > > > > > > > > >> development has been under the umbrella of the Arrow repository for
> > > > > > > > > >> more than five (six?) years now. Thus, the issues should also be
> > > > > > > > > >> aligned with the Arrow project.
> > > > > > > > > >>
> > > > > > > > > >> Uwe
> > > > > > > > > >>
> > > > > > > > > >> On Tue, Apr 23, 2024, at 8:27 PM, Rok Mihevc wrote:  
> > > > > > > > > >> > Bumping this thread again to see if there is will to call for a
> > > > > > > > > >> > vote and move parquet-cpp issues from Apache Jira to Arrow's
> > > > > > > > > >> > GitHub issues, as was done for Arrow.
> > > > > > > > > >> > I'm willing to do the move as I already did it for Arrow.
> > > > > > > > > >> >
> > > > > > > > > >> > Rok
> > > > > > > > > >> >
> > > > > > > > > >> > On Sat, Apr 15, 2023 at 4:53 AM Micah Kornfield <[email protected]>
> > > > > > > > > >> > wrote:
> > > > > > > > > >> >  
> > > > > > > > > >> >> Bumping this thread again to see if any Parquet PMC members can
> > > > > > > > > >> >> chime in/maybe start a formal vote to move governance of
> > > > > > > > > >> >> Parquet-CPP under the umbrella.
> > > > > > > > > >> >>
> > > > > > > > > >> >> -Micah
> > > > > > > > > >> >>
> > > > > > > > > >> >> On 2023/02/02 10:34:25 Antoine Pitrou wrote:  
> > > > > > > > > >> >> >
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Hi Will,
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > On 1 Feb 2023 at 20:27, Will Jones wrote:
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > First, it's not obvious where issues are supposed to be
> > > > > > > > > >> >> > > opened: in Parquet Jira or Arrow GitHub issues. Looking back
> > > > > > > > > >> >> > > at some of the original discussion, it looks like the
> > > > > > > > > >> >> > > intention was
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > >> * use PARQUET-XXX for issues relating to Parquet core
> > > > > > > > > >> >> > >> * use ARROW-XXX for issues relating to Arrow's consumption of
> > > > > > > > > >> >> > >>   Parquet core (e.g. changes that are in parquet/arrow right
> > > > > > > > > >> >> > >>   now)
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > The README for the old parquet-cpp repo [3] states instead in
> > > > > > > > > >> >> > > its migration note:
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > >   JIRA issues should continue to be opened in the PARQUET JIRA
> > > > > > > > > >> >> > >   project.
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > Either way, it doesn't seem like this process is obvious to
> > > > > > > > > >> >> > > people. Perhaps we could clarify this and add notices to
> > > > > > > > > >> >> > > Arrow's GitHub issues template?
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > I agree we should clarify this. I have no personal preference,
> > > > > > > > > >> >> > but I will note that GitHub issues decrease friction as having
> > > > > > > > > >> >> > a GH account is already necessary for submitting PRs.
> > > > > > > > > >> >> >  
> > > > > > > > > >> >> > > Second, committer status is a little unclear. I am a committer
> > > > > > > > > >> >> > > on Arrow, but not on Parquet right now. Does that mean I
> > > > > > > > > >> >> > > should only merge Parquet C++ PRs for code changes in
> > > > > > > > > >> >> > > parquet/arrow? Or that I shouldn't merge Parquet changes at
> > > > > > > > > >> >> > > all?
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Since Parquet C++ is part of Arrow C++, you are allowed to merge
> > > > > > > > > >> >> > Parquet C++ changes. As always you should ensure you have
> > > > > > > > > >> >> > sufficient understanding of the contribution, and that it
> > > > > > > > > >> >> > follows established practices:
> > > > > > > > > >> >> > https://arrow.apache.org/docs/dev/developers/reviewing.html
> > > > > > > > > >> >> >  
> > > > > > > > > >> >> > > Also, are the contributions to Arrow C++ Parquet being
> > > > > > > > > >> >> > > actively reviewed for potential new committers?
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > I would certainly do.
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Regards
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Antoine.
> > > > > > > > > >> >> >
> > > > > > > > > >> >> >  
> > > > > > > > > >> >>  
> > > > > > > > > >>  
> > > > > > > > >  
> > > > > > > >  
> > > > > > >  
> > > > > >  
> > > > >  
> > > >  
> > >  
> >  
> 


