Hi Julien,

On Thu, 16 May 2024 18:23:33 -0700
Julien Le Dem <jul...@apache.org> wrote:
>
> As discussed, that code was moved in the arrow repo for convenience:
> https://lists.apache.org/thread/gkvbm6yyly1r4cg3f6xtnqkjz6ogn6o2
>
> To take an excerpt of that original decision:
>
> 4) The Parquet and Arrow C++ communities will collaborate to provide
> development workflows to enable contributors working exclusively on the
> Parquet core functionality to be able to work unencumbered with unnecessary
> build or test dependencies from the rest of the Arrow codebase. Note that
> parquet-cpp already builds a significant portion of Apache Arrow en route
> to creating its libraries
> 5) The Parquet community can create scripts to
> "cut" Parquet C++ releases by packaging up the appropriate components and
> ensuring that they can be built and installed independently as now

Unfortunately, these two points haven't happened at all. On the
contrary, the Arrow C++ dependency has infused much deeper in Parquet
C++ (I was not there at the beginning of Parquet C++, but I get the
impression there was originally an effort to have an Arrow-independent
Parquet C++ core; that "core" doesn't exist anymore).

Note that this doesn't mean that Parquet C++ forces you to read Parquet
files as Arrow-formatted data (*). It's just that Parquet C++ uses a
large number of assorted utilities that live in the Arrow C++ codebase.

(*) though I would argue that it's better to do so, as it's probably
more efficient, especially for BYTE_ARRAY data

> The alternative is to live up to the part where we agreed that the two
> communities collaborate on making it easy for the Parquet community to
> govern its code base in the arrow repo.
> Would you agree?

Yep. I don't think there has been any problem in that regard, TBH. It's
just that the situation is difficult to understand for people.

Regards

Antoine.