I support this. In the past I had to effectively do this manually to build 
Arrow/PyArrow in a monorepo (to build for multiple Python versions 
simultaneously without having conflicting copies of Arrow for each Python 
version). From what I remember, there are some uses of Arrow-internal headers 
that need to be replaced, but fortunately they were all very simple to replace.

Though in my personal experience, it wasn't often that I needed to touch 
src/arrow/python.

-David

On Mon, Aug 16, 2021, at 11:08, Alessandro Molina wrote:
> PyArrow is currently a full Cython codebase, but in reality it relies on
> some classes and functions that are implemented in C++ within the
> src/arrow/python directory (
> https://github.com/apache/arrow/tree/master/cpp/src/arrow/python ). This is
> especially true of the NumPy/pandas conversion code, which has to interface
> with NumPy array data at a low level.
> 
> When working on PyArrow, it's not uncommon to end up jumping back and
> forth between the Arrow C++ codebase for Python and PyArrow. You can also
> end up with integration issues that are sometimes hard to catch if you
> forget to recompile libarrow, even when you are working on a Python-only
> change.
> 
> I'm wondering if it wouldn't make life easier for contributors if the
> src/arrow/python directory was moved into pyarrow and we made PyArrow able
> to build it.
> 
> That would probably reduce the risk of integration issues, as rebuilding
> pyarrow alone would probably be enough for most Python-specific changes
> (as it would also rebuild the Python-specific C++).
> 
> I think that moving src/arrow/python into pyarrow would also make the
> codebase more cohesive, which would lower the barrier for new contributors
> looking for how to fix a pyarrow-specific issue.
> 
> Unless there is some major side effect that I'm missing (other than having
> to write slightly more complex build scripts for pyarrow, but it's already
> CMake based, so building some C++ shouldn't be a big deal), it seems that
> the benefits of having all Python-related code in a single place would
> outweigh the side effects.
> 
> Also, I'm not sure how widespread the requirement of Python from C++ is,
> but it seems to me that if we moved all Python-specific code into pyarrow
> we could decouple libarrow from Python. That might make it easier to
> deal with virtualenvs or debug versions of Python, as you wouldn't have to
> deal with Python3_EXECUTABLE etc. when building libarrow.
> 
> Any thoughts?
> 
