Hi all,

As discussed on the mailing list [1], I am proposing to undertake a restructuring of the development process for parquet-cpp and its consumption in the Arrow ecosystem to benefit the developers and users of both communities.

The specific actions we would take would be:

1) Move the source code currently located at src/ in the apache/parquet-cpp repository [2] to the cpp/src/ directory located in apache/arrow [3]

2) The parquet code tree would remain separate from the Arrow code tree, though the two projects will continue to share code as they do now

3) The build system in apache/parquet-cpp would be effectively deprecated and can be mostly discarded, as it is largely redundant and duplicated from the build system in apache/arrow

4) The Parquet and Arrow C++ communities will collaborate to provide development workflows that let contributors working exclusively on the Parquet core functionality work unencumbered by unnecessary build or test dependencies from the rest of the Arrow codebase. Note that parquet-cpp already builds a significant portion of Apache Arrow en route to creating its libraries

5) The Parquet community can create scripts to "cut" Parquet C++ releases by packaging up the appropriate components and ensuring that they can be built and installed independently, as they are now

6) The CI processes would be merged -- since we already build the Parquet libraries in Arrow's CI workflow, this would amount to building the Parquet unit tests and running them

7) Patches contributed that do not involve Arrow-related functionality could use the PARQUET-XXXX marking, though some ARROW-XXXX patches may span both codebases

8) Parquet C++ committers can be given push rights on apache/arrow subject to ongoing good citizenry (e.g. not merging patches that break builds). The Arrow PMC may need to vote on the procedure for offering pass-through commit rights to anyone who has been invited to be a committer for Apache Parquet

9) The contributors who work on both Arrow and Parquet will work in good faith to ensure that the needs of Parquet-only developers (i.e. those who consume Parquet files in some way unrelated to the Arrow columnar standard) are accommodated

There are a number of particular details we will need to discuss further (such as the specific logistics of the codebase surgery; e.g. how to manage the commit history in apache/parquet-cpp -- do we care about git blame?).

This vote is to determine if the Parquet PMC is in favor of working in good faith to execute on the above plan. I will inquire with the Arrow PMC to see if we need to have a corresponding vote there, and also how to handle the management of commit rights.

[ ] +1: In favor of implementing the proposed monorepo plan
[ ] +0: . . .
[ ] -1: Not in favor because . . .

Here is my vote: +1.

Thank you,
Wes

[1]: https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
[2]: https://github.com/apache/parquet-cpp/tree/master/src/parquet
[3]: https://github.com/apache/arrow/tree/master/cpp/src