+1 I think this sounds like a reasonable solution to the problem, and one that is supported by the people that will do the work.
I'd appreciate some clarification on this:

> The Parquet community can create scripts to "cut" Parquet C++ releases

I think this should be a requirement for considering the integration of the two code-bases "successful". The current wording makes it sound optional, something done only if someone in the Parquet community cares to do it, which I don't think was the intent.

rb

On Sun, Aug 19, 2018 at 10:37 AM Wes McKinney <wesmck...@gmail.com> wrote:
> OK. I'm a bit -0 on doing anything that results in Arrow having a
> nonlinear git history (and rebasing is not really an option) but we
> can discuss that more later
>
> On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> > +1 on this but also see my comments in the mail on the discussions.
> >
> > We should also keep the git history of parquet-cpp, that should not be
> > hard with git and there is probably a StackOverflow answer out there that
> > gives you the commands to do the merge.
> >
> > Uwe
> >
> > On Fri, Aug 17, 2018, at 12:57 AM, Wes McKinney wrote:
> >> In case any are interested: my estimate of the work involved in the
> >> migration to be about a full day of total work, possibly less. As soon
> >> as the migration plan is decided upon I intend to execute ASAP so that
> >> ongoing development efforts are not disrupted.
> >>
> >> Additionally, in-flight patches do not all need to be merged. Patches
> >> can be easily edited to apply against the modified repository
> >> structure.
> >>
> >> On Wed, Aug 15, 2018 at 6:04 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> >> > hi all,
> >> >
> >> > As discussed on the mailing list [1] I am proposing to undertake a
> >> > restructuring of the development process for parquet-cpp and its
> >> > consumption in the Arrow ecosystem to benefit the developers and users
> >> > of both communities.
> >> >
> >> > The specific actions we would take would be:
> >> >
> >> > 1) Move the source code currently located at src/ in the
> >> > apache/parquet-cpp repository [2] to the cpp/src/ directory located in
> >> > apache/arrow [3]
> >> >
> >> > 2) The parquet code tree would remain separate from the Arrow code
> >> > tree, though the two projects will continue to share code as they do
> >> > now
> >> >
> >> > 3) The build system in apache/parquet-cpp would be effectively
> >> > deprecated and can be mostly discarded, as it is largely redundant and
> >> > duplicated from the build system in apache/arrow
> >> >
> >> > 4) The Parquet and Arrow C++ communities will collaborate to provide
> >> > development workflows to enable contributors working exclusively on
> >> > the Parquet core functionality to be able to work unencumbered with
> >> > unnecessary build or test dependencies from the rest of the Arrow
> >> > codebase. Note that parquet-cpp already builds a significant portion
> >> > of Apache Arrow en route to creating its libraries
> >> >
> >> > 5) The Parquet community can create scripts to "cut" Parquet C++
> >> > releases by packaging up the appropriate components and ensuring that
> >> > they can be built and installed independently as now
> >> >
> >> > 6) The CI processes would be merged -- since we already build the
> >> > Parquet libraries in Arrow's CI workflow, this would amount to
> >> > building the Parquet unit tests and running them.
> >> >
> >> > 7) Patches contributed that do not involve Arrow-related functionality
> >> > could use the PARQUET-XXXX marking, though some ARROW-XXXX patches may
> >> > span both codebases
> >> >
> >> > 8) Parquet C++ committers can be given push rights on apache/arrow
> >> > subject to ongoing good citizenry (e.g. not merging patches that break
> >> > builds).
> >> > The Arrow PMC may need to vote on the procedure for offering
> >> > pass-through commit rights to anyone who has been invited to be a
> >> > committer for Apache Parquet
> >> >
> >> > 9) The contributors who work on both Arrow and Parquet will work in
> >> > good faith to ensure that the needs of Parquet-only developers (i.e.
> >> > those who consume Parquet files in some way unrelated to the Arrow
> >> > columnar standard) are accommodated
> >> >
> >> > There are a number of particular details we will need to discuss
> >> > further (such as the specific logistics of the codebase surgery; e.g.
> >> > how to manage the commit history in apache/parquet-cpp -- do we care
> >> > about git blame?)
> >> >
> >> > This vote is to determine if the Parquet PMC is in favor of working in
> >> > good faith to execute on the above plan. I will inquire with the Arrow
> >> > PMC to see if we need to have a corresponding vote there, and also how
> >> > to handle the management of commit rights.
> >> >
> >> > [ ] +1: In favor of implementing the proposed monorepo plan
> >> > [ ] +0: . . .
> >> > [ ] -1: Not in favor because . . .
> >> >
> >> > Here is my vote: +1.
> >> >
> >> > Thank you,
> >> > Wes
> >> >
> >> > [1]: https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
> >> > [2]: https://github.com/apache/parquet-cpp/tree/master/src/parquet
> >> > [3]: https://github.com/apache/arrow/tree/master/cpp/src

--
Ryan Blue
Software Engineer
Netflix