Hi all,

As discussed on the mailing list [1], I am proposing to undertake a
restructuring of the development process for parquet-cpp and its
consumption in the Arrow ecosystem to benefit the developers and users
of both communities.

The specific actions we would take are:

1) Move the source code currently located at src/ in the
apache/parquet-cpp repository [2] to the cpp/src/ directory located in
apache/arrow [3]

2) The parquet code tree would remain separate from the Arrow code
tree, though the two projects will continue to share code as they do
now

3) The build system in apache/parquet-cpp would be effectively
deprecated and can be mostly discarded, as it is largely redundant and
duplicated from the build system in apache/arrow

4) The Parquet and Arrow C++ communities will collaborate to provide
development workflows so that contributors working exclusively on the
Parquet core functionality can work unencumbered by unnecessary build
or test dependencies from the rest of the Arrow codebase. Note that
parquet-cpp already builds a significant portion of Apache Arrow en
route to creating its libraries

5) The Parquet community can create scripts to "cut" Parquet C++
releases by packaging up the appropriate components and ensuring that
they can be built and installed independently, as they are now

6) The CI processes would be merged -- since we already build the
Parquet libraries in Arrow's CI workflow, this would amount to
building the Parquet unit tests and running them.

7) Patches contributed that do not involve Arrow-related functionality
could use the PARQUET-XXXX marking, though some ARROW-XXXX patches may
span both codebases

8) Parquet C++ committers can be given push rights on apache/arrow
subject to ongoing good citizenry (e.g. not merging patches that break
builds). The Arrow PMC may need to vote on the procedure for offering
pass-through commit rights to anyone who has been invited to be a
committer for Apache Parquet

9) The contributors who work on both Arrow and Parquet will work in
good faith to ensure that the needs of Parquet-only developers (i.e.
those who consume Parquet files in some way unrelated to the Arrow
columnar standard) are accommodated

There are a number of details we will need to discuss further, such as
the specific logistics of the codebase surgery (e.g. how to manage the
commit history in apache/parquet-cpp: do we care about git blame?).

This vote is to determine if the Parquet PMC is in favor of working in
good faith to execute on the above plan. I will inquire with the Arrow
PMC to see if we need to have a corresponding vote there, and also how
to handle the management of commit rights.

[ ] +1: In favor of implementing the proposed monorepo plan
[ ] +0: . . .
[ ] -1: Not in favor because . . .

Here is my vote: +1.

Thank you,
Wes

[1]: https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
[2]: https://github.com/apache/parquet-cpp/tree/master/src/parquet
[3]: https://github.com/apache/arrow/tree/master/cpp/src
