[ 
https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473253#comment-16473253
 ] 

Alex Hagerman commented on ARROW-2428:
--------------------------------------

[~xhochy] I was reading through the meta issue and trying to understand what we 
have to make sure to pass. Do you think this has settled enough to begin work? 
It appears pandas will expect a class defining the type. I'm guessing the 
objects in the Arrow column will then be instances of that user type? Do we 
expect Arrow columns to meet all the requirements of ExtensionArray?

I was looking specifically at the following diff to understand what options 
have to be passed and what ExtensionArray requires:

https://github.com/pandas-dev/pandas/pull/19174/files#diff-e448fe09dbe8aed468d89a4c90e65cff

> [Python] Support ExtensionArrays in to_pandas conversion
> --------------------------------------------------------
>
>                 Key: ARROW-2428
>                 URL: https://issues.apache.org/jira/browse/ARROW-2428
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>              Labels: beginner
>             Fix For: 1.0.0
>
>
> With the next release of Pandas, it will be possible to define custom column 
> types that back a {{pandas.Series}}. Thus we will not be able to cover all 
> possible column types in the {{to_pandas}} conversion by default as we won't 
> be aware of all extension arrays.
> To enable users to create {{ExtensionArray}} instances from Arrow columns in 
> the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} 
> call where they can override the default conversion routines with ones that 
> produce their {{ExtensionArray}} instances.
> This should avoid the additional copies we incur today, where we first 
> convert the Arrow column into a default Pandas column (probably of object 
> type) and the user afterwards converts it to a more efficient 
> {{ExtensionArray}}. This hook will be especially useful when you build 
> {{ExtensionArray}} types whose storage is backed by Arrow.
> The meta-issue that tracks the implementation inside of Pandas is: 
> https://github.com/pandas-dev/pandas/issues/19696
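
To make the hook idea in the description above concrete, here is a rough
sketch of the dispatch it seems to imply. It is pure Python and uses neither
pyarrow nor pandas: Column, IPArray, default_convert, to_pandas_sketch and the
converters parameter are all hypothetical names invented for illustration, not
an existing or proposed API.

```python
from typing import Any, Callable, Dict, List, Optional


class Column:
    """Hypothetical stand-in for an Arrow column: name, type tag, values."""

    def __init__(self, name: str, type_name: str, values: List[Any]) -> None:
        self.name = name
        self.type_name = type_name
        self.values = values


class IPArray:
    """Hypothetical user container, standing in for an ExtensionArray."""

    def __init__(self, packed: List[int]) -> None:
        # Keeps a reference to the given storage instead of copying it.
        self.packed = packed

    def __len__(self) -> int:
        return len(self.packed)


def default_convert(column: Column) -> List[Any]:
    # Stand-in for the default path (e.g. an object-dtype column): copies.
    return list(column.values)


def to_pandas_sketch(
    columns: List[Column],
    converters: Optional[Dict[str, Callable[[Column], Any]]] = None,
) -> Dict[str, Any]:
    """Convert each column, preferring a user-supplied converter by type."""
    converters = converters or {}
    result: Dict[str, Any] = {}
    for column in columns:
        convert = converters.get(column.type_name, default_convert)
        result[column.name] = convert(column)
    return result


# Usage: only the "ipv4" column goes through the user hook.
columns = [
    Column("id", "int64", [1, 2, 3]),
    Column("addr", "ipv4", [167772161, 167772162]),
]
frame = to_pandas_sketch(
    columns, converters={"ipv4": lambda column: IPArray(column.values)}
)
```

In this sketch the user converter receives the column directly, so the
intermediate default conversion (and its copy) is skipped for that column.
Whether the real hook should be keyed by Arrow type, by column name, or by
something else is left open here.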



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
