I support this. In the past I had to effectively do this manually to build Arrow/PyArrow in a monorepo (to build for multiple Python versions simultaneously without having conflicting copies of Arrow for each Python version). From what I remember, there are a few uses of Arrow-internal headers that would need to be replaced, but fortunately they were all very simple to replace.
Though in my personal experience, it wasn't often that I needed to touch src/arrow/python.

-David

On Mon, Aug 16, 2021, at 11:08, Alessandro Molina wrote:
> PyArrow is currently full Cython codebase, but in reality it relies on some
> classes and functions that are implemented in C++ within the src/python
> directory ( https://github.com/apache/arrow/tree/master/cpp/src/arrow/python
> ). Especially for numpy/pandas conversion code that has to interface with
> Numpy arrays data at low level.
>
> When working in the area of PyArrow it's not uncommon that you end up
> jumping back and forth between the Arrow C++ codebase for Python and
> PyArrow and you can also end up with, sometimes hard to catch, integration
> issues if you forgot to recompile libarrow even if you are working on a
> Python only change.
>
> I'm wondering if it wouldn't make life easier for contributors if the
> src/arrow/python directory was moved into pyarrow and we made PyArrow able
> to build it.
>
> That would probably reduce risk of integration issues as rebuilding pyarrow
> alone would probably be enough for most python specific changes (as it
> would also rebuild the Python specific C++).
>
> I think that moving src/arrow/python into pyarrow would also make the
> codebase more cohesive which would lower the barrier for new contributors
> looking for how to fix a pyarrow specific issue.
>
> Unless there is any major side effect (outside of having to build a bit
> more complex build scripts for pyarrow, but it's already CMake based, so
> building some C++ shouldn't be a big deal) that I'm missing, it seems that
> the benefits of having all Python related code into a single place would
> surpass the side effects.
>
> Also I'm not sure how widespread it is the requirement of Python from C++,
> but it seems to me that if we moved all Python specific code into pyarrow
> we could make libarrow decoupled from Python. Which might make it easier to
> deal with Virtualenvs or debug versions of python as you wouldn't have to
> deal with Python3_EXECUTABLE etc when building libarrow.
>
> Any thoughts?
>