Hi all,

I want to propose an interface that lets custom array objects in Python define how they should be converted to Arrow arrays (e.g. in pyarrow.array(..)). I opened https://issues.apache.org/jira/browse/ARROW-5271 for this. It would be similar to the numpy __array__ protocol, so we could e.g. call it __arrow_array__. Feedback and discussion are very welcome!
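To make the idea concrete, here is a rough sketch of how the dispatch could work. It is only an illustration under assumptions: ArrowArrayStandIn, NullableIntArray and to_arrow are hypothetical names invented for this example (to_arrow plays the role of pyarrow.array(..), and the stand-in class replaces pyarrow.Array so the snippet runs without pyarrow installed). The optional type argument on __arrow_array__ is also an assumption, not a settled signature.

```python
# Hypothetical sketch of the proposed protocol; none of these names are
# an existing pyarrow API.


class ArrowArrayStandIn:
    """Stand-in for pyarrow.Array so the sketch runs without pyarrow."""

    def __init__(self, values, type):
        self.values = list(values)
        self.type = type

    def __repr__(self):
        return f"ArrowArrayStandIn({self.values!r}, type={self.type!r})"


class NullableIntArray:
    """Toy extension array: integer values plus a missing-value mask."""

    def __init__(self, values, mask):
        self.values = list(values)
        self.mask = list(mask)  # True marks a missing value

    def __arrow_array__(self, type=None):
        # The object itself decides how it maps onto an Arrow array.
        data = [None if m else v for v, m in zip(self.values, self.mask)]
        return ArrowArrayStandIn(data, type or "int64")


def to_arrow(obj, type=None):
    """Hypothetical converter, playing the role of pyarrow.array(..)."""
    hook = getattr(obj, "__arrow_array__", None)
    if hook is not None:
        # Delegate to the object, as numpy does with __array__.
        return hook(type=type)
    # Fallback for objects without the hook: treat obj as a plain sequence.
    return ArrowArrayStandIn(obj, type or "inferred")


arr = to_arrow(NullableIntArray([1, 2, 3], [False, True, False]))
print(arr)
```

The point of the design is that the converter only needs to know about the hook, not about each individual array class.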
I am coming to this discussion specifically from the point of view of pandas ExtensionArrays (github issue for this: https://github.com/pandas-dev/pandas/issues/20612/#issuecomment-489649556). Such a protocol would, for example, allow pandas users to save DataFrames with ExtensionArrays (e.g. the nullable integers) to parquet, without pyarrow needing to know about every possible extension array. This would also be useful for projects extending pandas, such as GeoPandas <https://github.com/geopandas/geopandas> and Fletcher <https://github.com/xhochy/fletcher>. But I suppose it could also be of more general interest to other array-like / pandas-like projects that want to interface with arrow.

Sidenote: for the pandas case, I want to look at the full roundtrip, so also the conversion back from an arrow Table to a DataFrame. For that aspect there is https://issues.apache.org/jira/browse/ARROW-2428, but this is much more specific to pandas and its ExtensionArrays.

Regards,
Joris