[ 
https://issues.apache.org/jira/browse/ARROW-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055833#comment-17055833
 ] 

Joris Van den Bossche commented on ARROW-8010:
----------------------------------------------

[~balancap] Thanks for the report! 

I think for pandas, the natural conversion is to an object Series of arrays (as 
we already support for the variable-sized ListArray). 
For numpy, although the data could be represented as a 2D array, that would be 
surprising since pyarrow arrays are 1D, so here too we should probably 
convert to the less efficient object array of numpy arrays.
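
To make the two candidate numpy representations concrete, here is a minimal
sketch using plain numpy only. The `flat_values` array is a hypothetical
stand-in for the flat child values of the fixed-size list array from the
report; it is not taken from pyarrow itself, and the variable names are
illustrative.

```python
import numpy as np

# Hypothetical stand-in for the flat child values of a fixed-size list array
# with list_size=2, i.e. the logical values [[1, 2], [3, 4], [5, 6]].
list_size = 2
flat_values = np.array([1, 2, 3, 4, 5, 6], dtype=np.float32)

# Option A: a 2D row-major view over the same memory (no copy is made).
as_2d = flat_values.reshape(-1, list_size)

# Option B: a 1D object array whose elements are per-row numpy arrays,
# keeping one element per list like a 1D Arrow array.
as_object = np.empty(len(as_2d), dtype=object)
for i, row in enumerate(as_2d):
    as_object[i] = row  # element-wise assignment avoids broadcasting to 2D

print(as_2d.shape)      # (3, 2)
print(as_object.shape)  # (3,)
```

Option A keeps the coordinates contiguous, while option B trades that
efficiency for a result that stays one-dimensional.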

Closing as a duplicate of ARROW-7365, so let's continue discussion there.

> [Python] Fixed size list not convertible to Numpy Array / pandas Series
> -----------------------------------------------------------------------
>
>                 Key: ARROW-8010
>                 URL: https://issues.apache.org/jira/browse/ARROW-8010
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.16.0
>         Environment: Ubuntu 19.10 + python 3.7
>            Reporter: Paul Balanca
>            Priority: Major
>
> Fixed-size lists of base types (e.g. int, float, ...) are not convertible to 
> a Numpy array.
> The following code:
> {code:python}
> import pyarrow as pa
> t = pa.list_(pa.float32(), 2)
> arr = pa.array([[1, 2], [3, 4], [5, 6]], type=t)
> arr.to_numpy(){code}
> raises an Arrow "not implemented" error, as there is no Pandas block equivalent.
> It sounds reasonable that the conversion to Pandas fails, but I would expect 
> a natural conversion to a Numpy array since, according to the Fixed Size List 
> Layout ([https://arrow.apache.org/docs/format/Columnar.html#]), a fixed-size 
> list array could be mapped to a 2-dimensional row-major matrix (e.g. 3x2 in 
> the previous example).
> Note that the expected result can be obtained with a workaround using flatten:
> {code:python}
> arr.flatten().to_numpy().reshape((-1, t.list_size)){code}
> This form of memory representation is quite natural if one wants to use 
> Apache Arrow for in-memory collections of 2D/3D points, where we wish to have 
> coordinates contiguous in memory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
