Hi all,

I want to propose an interface to allow custom array objects in Python to
define how they should be converted to Arrow arrays (e.g. in
pyarrow.array(..)). I opened
https://issues.apache.org/jira/browse/ARROW-5271 for this.
This would be similar to the numpy __array__ protocol (so we could, e.g.,
call it __arrow_array__).
Feedback / discussion very welcome!
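
To make the idea a bit more concrete, here is a rough sketch of how such a
dispatch could work. It is pure Python so that it runs without pyarrow:
ArrowArrayStandIn and convert_to_arrow are hypothetical stand-ins for an
Arrow array and for pyarrow.array(..), NullableIntegers is a made-up custom
array, and the `type=None` argument is only an assumption about what the
signature might look like, not a settled design.

```python
# Sketch only: ArrowArrayStandIn and convert_to_arrow are hypothetical
# stand-ins for an Arrow array and pyarrow.array(..).


class ArrowArrayStandIn:
    """Placeholder for an Arrow array: a list of values plus a type name."""

    def __init__(self, values, type_name):
        self.values = list(values)
        self.type_name = type_name


def convert_to_arrow(obj, type=None):
    """Mimic how a conversion function could dispatch on the protocol."""
    hook = getattr(obj, "__arrow_array__", None)
    if hook is not None:
        # The object knows how to convert itself; just check the result.
        result = hook(type=type)
        if not isinstance(result, ArrowArrayStandIn):
            raise TypeError("__arrow_array__ must return an Arrow array")
        return result
    # Fallback for objects without the hook: treat them as plain sequences.
    return ArrowArrayStandIn(obj, type or "inferred")


class NullableIntegers:
    """Hypothetical custom array: integer data plus a mask of missing values."""

    def __init__(self, data, mask):
        self.data = data
        self.mask = mask

    def __arrow_array__(self, type=None):
        # Masked positions become nulls (None) in the converted array.
        values = [None if m else v for v, m in zip(self.data, self.mask)]
        return ArrowArrayStandIn(values, type or "int64")


arr = convert_to_arrow(NullableIntegers([1, 2, 3], [False, True, False]))
print(arr.values, arr.type_name)  # [1, None, 3] int64
```

The point is that the conversion function only needs to look for the
method, so it does not have to know anything about the custom array class
itself.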

I am coming to this discussion specifically from the point of view of
pandas ExtensionArrays (github issue for this:
https://github.com/pandas-dev/pandas/issues/20612/#issuecomment-489649556).
Such a protocol would, for example, allow pandas users to save DataFrames
with ExtensionArrays (e.g. the nullable integers) to parquet, without
pyarrow needing to know about all those different extension arrays. This
would also be useful for projects extending pandas
such as GeoPandas <https://github.com/geopandas/geopandas> and Fletcher
<https://github.com/xhochy/fletcher>.
But I suppose it could also be of interest more generally to other
array-like / pandas-like projects that want to interface with Arrow.

Sidenote: for the pandas case, I want to look at the full roundtrip, so also
the conversion back from an Arrow Table to a DataFrame. For that aspect
there is https://issues.apache.org/jira/browse/ARROW-2428, but this is much
more specific to pandas and its ExtensionArrays.

Regards,
Joris
