Joris Van den Bossche created ARROW-7066:
--------------------------------------------
Summary: [Python] support returning ChunkedArray from __arrow_array__ ?
Key: ARROW-7066
URL: https://issues.apache.org/jira/browse/ARROW-7066
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Joris Van den Bossche
Fix For: 1.0.0
The {{\_\_arrow_array\_\_}} protocol was added so that custom objects can
define how they should be converted to a pyarrow Array (similar to numpy's
{{\_\_array\_\_}}). This is then also used to support converting pandas
DataFrames whose columns use pandas' ExtensionArrays to a pyarrow Table (if
the pandas ExtensionArray, such as the nullable integer type, implements this
{{\_\_arrow_array\_\_}} method).
This last use case could also be useful for fletcher
(https://github.com/xhochy/fletcher/, a package that implements pandas
ExtensionArrays that wrap pyarrow arrays, so they can be stored as is in a
pandas DataFrame).
However, fletcher stores ChunkedArrays in its ExtensionArrays (i.e. in the columns
of a pandas DataFrame), to map better onto a Table, whose columns also consist of
chunked arrays. We currently require that the return value of
{{\_\_arrow_array\_\_}} is a pyarrow.Array.
So I was wondering: could we relax this constraint and also allow ChunkedArray
as return value?
However, this protocol is currently called in the {{pa.array(..)}} function,
which should probably keep always returning an Array (rather than returning a
ChunkedArray in some cases).
cc [~uwe]