Re: [C++] Parquet and Arrow overlap

Antoine Pitrou Fri, 17 May 2024 01:36:31 -0700


Hi Julien,

On Thu, 16 May 2024 18:23:33 -0700
Julien Le Dem <jul...@apache.org> wrote:
> 
> As discussed, that code was moved in the arrow repo for convenience:
> https://lists.apache.org/thread/gkvbm6yyly1r4cg3f6xtnqkjz6ogn6o2
> 
> To take an excerpt of that original decision:
> 
> 4) The Parquet and Arrow C++ communities will collaborate to provide
> development workflows to enable contributors working exclusively on the
> Parquet core functionality to be able to work unencumbered with unnecessary
> build or test dependencies from the rest of the Arrow codebase. Note that
> parquet-cpp already builds a significant portion of Apache Arrow en route
> to creating its libraries
>
> 5) The Parquet community can create scripts to "cut" Parquet C++ releases
> by packaging up the appropriate components and ensuring that they can be
> built and installed independently as now

Unfortunately, neither of these two points has happened. On the
contrary, the Arrow C++ dependency has become much more deeply embedded
in Parquet C++ (I was not there at the beginning of Parquet C++, but I
get the impression there was originally an effort to have an
Arrow-independent Parquet C++ core; that "core" doesn't exist anymore).

Note that this doesn't mean that Parquet C++ forces you to read Parquet
files as Arrow-formatted data (*). It's just that Parquet C++ uses a
large number of assorted utilities that live in the Arrow C++ codebase.

(*) though I would argue that it's better to do so, as it's probably
more efficient, especially for BYTE_ARRAY data

> The alternative is to live up to the part where we agreed that the two
> communities collaborate on making it easy for the Parquet community to
> govern its code base in the arrow repo.
> Would you agree?

Yep. I don't think there has been any problem in that regard, TBH. It's
just that the situation is difficult for people to understand.

Regards

Antoine.
