+1 on this, but also see my comments in the mail on the discussions. We should also keep the git history of parquet-cpp. That should not be hard with git, and there is probably a StackOverflow answer out there that gives the commands to do the merge.

Uwe

On Fri, Aug 17, 2018, at 12:57 AM, Wes McKinney wrote:
> In case any are interested: my estimate of the work involved in the
> migration is about a full day of total work, possibly less. As soon
> as the migration plan is decided upon I intend to execute ASAP so that
> ongoing development efforts are not disrupted.
>
> Additionally, in-flight patches do not all need to be merged. Patches
> can be easily edited to apply against the modified repository
> structure.
>
> On Wed, Aug 15, 2018 at 6:04 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> > hi all,
> >
> > As discussed on the mailing list [1] I am proposing to undertake a
> > restructuring of the development process for parquet-cpp and its
> > consumption in the Arrow ecosystem to benefit the developers and users
> > of both communities.
> >
> > The specific actions we would take would be:
> >
> > 1) Move the source code currently located at src/ in the
> > apache/parquet-cpp repository [2] to the cpp/src/ directory located in
> > apache/arrow [3]
> >
> > 2) The parquet code tree would remain separate from the Arrow code
> > tree, though the two projects will continue to share code as they do
> > now
> >
> > 3) The build system in apache/parquet-cpp would be effectively
> > deprecated and can be mostly discarded, as it is largely redundant and
> > duplicated from the build system in apache/arrow
> >
> > 4) The Parquet and Arrow C++ communities will collaborate to provide
> > development workflows to enable contributors working exclusively on
> > the Parquet core functionality to be able to work unencumbered with
> > unnecessary build or test dependencies from the rest of the Arrow
> > codebase. Note that parquet-cpp already builds a significant portion
> > of Apache Arrow en route to creating its libraries
> >
> > 5) The Parquet community can create scripts to "cut" Parquet C++
> > releases by packaging up the appropriate components and ensuring that
> > they can be built and installed independently as now
> >
> > 6) The CI processes would be merged -- since we already build the
> > Parquet libraries in Arrow's CI workflow, this would amount to
> > building the Parquet unit tests and running them.
> >
> > 7) Patches contributed that do not involve Arrow-related functionality
> > could use the PARQUET-XXXX marking, though some ARROW-XXXX patches may
> > span both codebases
> >
> > 8) Parquet C++ committers can be given push rights on apache/arrow
> > subject to ongoing good citizenry (e.g. not merging patches that break
> > builds). The Arrow PMC may need to vote on the procedure for offering
> > pass-through commit rights to anyone who has been invited to be a
> > committer for Apache Parquet
> >
> > 9) The contributors who work on both Arrow and Parquet will work in
> > good faith to ensure that the needs of Parquet-only developers (i.e.
> > who consume Parquet files in some way unrelated to the Arrow columnar
> > standard) are accommodated
> >
> > There are a number of particular details we will need to discuss
> > further (such as the specific logistics of the codebase surgery; e.g.
> > how to manage the commit history in apache/parquet-cpp -- do we care
> > about git blame?)
> >
> > This vote is to determine if the Parquet PMC is in favor of working in
> > good faith to execute on the above plan. I will inquire with the Arrow
> > PMC to see if we need to have a corresponding vote there, and also how
> > to handle the management of commit rights.
> >
> > [ ] +1: In favor of implementing the proposed monorepo plan
> > [ ] +0: . . .
> > [ ] -1: Not in favor because . . .
> >
> > Here is my vote: +1.
> >
> > Thank you,
> > Wes
> >
> > [1]:
> > https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
> > [2]: https://github.com/apache/parquet-cpp/tree/master/src/parquet
> > [3]: https://github.com/apache/arrow/tree/master/cpp/src
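
On the git history point at the top of this mail: below is a minimal sketch of one way such a history-preserving merge could look, not the agreed migration procedure. It runs against two throwaway repositories so it can be tried safely. The names parquet-cpp-demo and arrow-demo, the file contents, the commit messages, and the helper g are all hypothetical stand-ins. It assumes git 2.9 or newer, which is needed for --allow-unrelated-histories.

```shell
#!/bin/sh
# Sketch only: history-preserving merge demonstrated on two throwaway
# repositories. Repository names, files, and commit messages are
# hypothetical stand-ins, not the real apache/parquet-cpp or apache/arrow.
set -eu

work=$(mktemp -d)

# Hypothetical helper: run git with a fixed demo identity so commits
# succeed even where no global user.name/user.email is configured.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# Stand-in for parquet-cpp: two commits under src/parquet.
g init -q "$work/parquet-cpp-demo"
cd "$work/parquet-cpp-demo"
mkdir -p src/parquet
echo "reader" > src/parquet/reader.cc
g add . && g commit -q -m "PARQUET-1: add reader"
echo "writer" > src/parquet/writer.cc
g add . && g commit -q -m "PARQUET-2: add writer"

# Move the tree into the layout it will have in the target repository,
# as a normal commit on top of the existing history.
mkdir -p cpp/src
g mv src/parquet cpp/src/parquet
g commit -q -m "Move src/parquet to cpp/src/parquet"
parquet_branch=$(git rev-parse --abbrev-ref HEAD)

# Stand-in for arrow: one commit under cpp/src/arrow.
g init -q "$work/arrow-demo"
cd "$work/arrow-demo"
mkdir -p cpp/src/arrow
echo "array" > cpp/src/arrow/array.cc
g add . && g commit -q -m "ARROW-1: add array"

# Fetch the other repository and merge its unrelated history.
g remote add parquet "$work/parquet-cpp-demo"
g fetch -q parquet
g merge -q --allow-unrelated-histories \
  -m "Merge parquet-cpp history" "parquet/$parquet_branch"

# The parquet commits are now reachable from the merged branch, and
# --follow traces a file back through the directory move.
git --no-pager log --oneline
git --no-pager log --oneline --follow -- cpp/src/parquet/reader.cc
```

With this approach the original commit hashes are kept and the move shows up as one extra commit, so git blame and git log --follow still reach the old commits. Rewriting the history so that every old commit already lives under cpp/src/ is a different technique and is not shown here.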