I don't agree with this approach right now. Here are my reasons:

1. The Parquet Python integration will need to depend on both PyArrow
and the Arrow C++ libraries, so these libraries would generally need
to be developed together.

2. PyArrow would need to define and maintain a C++ or Cython API so
that the equivalent of the current pyarrow.parquet library can access
C-level data. For example:

https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.pyx#L31

Cython does permit cross-project C API access (we are already doing
cross-module Cython API access within pyarrow), but it adds
complexity that I think we should avoid for now.

3. Maintaining a separate C++ build toolchain for a Python package
adds maintenance and packaging burden for us.

My inclination is to keep the code where it is and make the Parquet
extension optional.
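To make "optional" a bit more concrete, here is a rough sketch of the
runtime side only. It assumes the Parquet bindings live in a separately
compiled extension module (the name "pyarrow._parquet" is used purely
for illustration), and the helper names are hypothetical. If that
module was not built, importing pyarrow still works and only
Parquet-specific calls fail:

```python
import importlib


def load_optional_extension(module_name):
    """Hypothetical helper: return the module, or None if it is absent.

    If the compiled extension was skipped at build time, importing it
    raises ImportError (or its subclass ModuleNotFoundError).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Assumed module name for the compiled Parquet extension (illustrative only).
_parquet = load_optional_extension("pyarrow._parquet")
HAVE_PARQUET = _parquet is not None


def require_parquet():
    """Hypothetical guard: raise only when Parquet support is requested."""
    if not HAVE_PARQUET:
        raise ImportError(
            "this build was compiled without Parquet support; "
            "rebuild with the Parquet extension enabled"
        )
    return _parquet
```

The point of this shape is that the package itself never hard-fails on
import; the build-side switch that decides whether to compile the
extension is a separate question.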

- Wes

On Wed, Sep 21, 2016 at 10:16 AM, Uwe Korn <uw...@xhochy.com> wrote:
> Hello,
>
> as we have moved the Arrow<->Parquet C++ integration into parquet-cpp, we
> still have to decide on how we are going to proceed with the Arrow<->Parquet
> Python integration. For the moment, it seems that the best way to go ahead
> is to pull the pyarrow.parquet module out into a separate Python package.
> From an organisational point of view, I'm unclear how to proceed here. Should
> we put this in a separate repo? If so, as part of the Apache organisation?
>
> Uwe
