Re: [C++] Parquet and Arrow overlap

Julien Le Dem Fri, 17 May 2024 08:33:56 -0700

If we deem that it would be too hard to move it back for the moment, we
need at a minimum to clarify and reduce the confusion.
If practice doesn't match what the PMC voted on, we need to improve the
practice.
Do we have suggestions for improving that?
Perhaps an OWNERS file in the parquet folder in the arrow repo? (Just an idea.)


On Fri, May 17, 2024 at 2:49 AM Uwe L. Korn <uw...@xhochy.com> wrote:

>
>
> On Fri, May 17, 2024, at 10:36 AM, Antoine Pitrou wrote:
> > Hi Julien,
> >
> > On Thu, 16 May 2024 18:23:33 -0700
> > Julien Le Dem <jul...@apache.org> wrote:
> >>
> >> As discussed, that code was moved in the arrow repo for convenience:
> >> https://lists.apache.org/thread/gkvbm6yyly1r4cg3f6xtnqkjz6ogn6o2
> >>
> >> To take an excerpt of that original decision:
> >>
> >> 4) The Parquet and Arrow C++ communities will collaborate to provide
> >> development workflows to enable contributors working exclusively on the
> >> Parquet core functionality to be able to work unencumbered with
> >> unnecessary build or test dependencies from the rest of the Arrow
> >> codebase. Note that parquet-cpp already builds a significant portion of
> >> Apache Arrow en route to creating its libraries
> >> 5) The Parquet community can create scripts to "cut" Parquet C++
> >> releases by packaging up the appropriate components and ensuring that
> >> they can be built and installed independently as now
> >
> > Unfortunately, these two points haven't happened at all. On the
> > contrary, the Arrow C++ dependency has infused much deeper in Parquet
> > C++ (I was not there at the beginning of Parquet C++, but I get the
> > impression there was originally an effort to have an Arrow-independent
> > Parquet C++ core; that "core" doesn't exist anymore).
>
> As an example, we had in the beginning separate I/O primitives in Arrow
> and Parquet. But during the further development, we realised that we were
> implementing exactly the same code paths only in different namespaces.
>
> There are some core "utilities" hidden in Arrow that are required to build
> any modern C++-based data processing library. Separating them into their
> own repository would enable parquet-cpp to be separated more easily. But
> given that the development around this is still very active in Arrow, it
> would bring a massive slowdown to the overall project.
>
> Best
> Uwe
>
