[ 
https://issues.apache.org/jira/browse/ARROW-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-15767:
------------------------------------------
    Summary: [Python] Arrow Table with Nullable DenseUnion fails to convert to 
Python Pandas DataFrame  (was: Arrow Table with Nullable DenseUnion fails to 
convert to Python Pandas DataFrame)

> [Python] Arrow Table with Nullable DenseUnion fails to convert to Python 
> Pandas DataFrame
> -----------------------------------------------------------------------------------------
>
>                 Key: ARROW-15767
>                 URL: https://issues.apache.org/jira/browse/ARROW-15767
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 6.0.1
>            Reporter: Ben Baumgold
>            Priority: Major
>         Attachments: nothing.arrow
>
>
> A feather file containing a column of nullable values raises an error when 
> converted to a Pandas DataFrame. It can be read into a pyarrow.Table as follows:
> {code:python}
> In [1]: import pyarrow.feather as feather
> In [2]: t = feather.read_table("nothing.arrow")
> In [3]: t
> Out[3]:
> pyarrow.Table
> col: dense_union<: null=0, : int32 not null=1>
>   child 0, : null
>   child 1, : int32 not null
> ----
> col: [
>   -- is_valid: all not null
>   -- type_ids:     [
>       1,
>       1,
>       1,
>       0
>     ]
>   -- value_offsets:     [
>       0,
>       1,
>       2,
>       0
>     ]
>   -- child 0 type: null
> 1 nulls
>   -- child 1 type: int32
>     [
>       1,
>       2,
>       3
>     ]]
> {code}
> But when trying to convert the pyarrow.Table into a Pandas DataFrame, I get 
> the following error:
> {code:python}
> In [4]: t.to_pandas()
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-25-8ba84762c39a> in <module>
> ----> 1 t.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
>     787     _check_data_column_metadata_consistency(all_columns)
>     788     columns = _deserialize_column_index(table, all_columns, column_indexes)
> --> 789     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
>     790
>     791     axes = [columns, index]
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
>    1126     # Convert an arrow table to Block from the internal pandas API
>    1127     columns = block_table.column_names
> -> 1128     result = pa.lib.table_to_blocks(options, block_table, categories,
>    1129                                     list(extension_columns.keys()))
>    1130     return [_reconstruct_block(item, columns, extension_columns)
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type dense_union<: null=0, : int32 not null=1> is known.
> {code}
> Note that the Arrow file is valid and can be read successfully by 
> [Arrow.jl|https://github.com/apache/arrow-julia]. A related issue is 
> [arrow-julia#285|https://github.com/apache/arrow-julia/issues/285]. The 
> [^nothing.arrow] file used in this example is attached for convenience.
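
For reference, the dense union layout printed in the report can be decoded by hand: each slot's type id selects a child, and the matching value offset indexes into that child. The sketch below is plain Python and does not use pyarrow. The function name decode_dense_union and the dict-of-lists representation of the children are hypothetical, chosen only for illustration, and the sketch assumes type codes 0 and 1 map to the null and int32 children as shown in the schema.

```python
from typing import Any, Dict, List, Sequence


def decode_dense_union(
    type_ids: Sequence[int],
    value_offsets: Sequence[int],
    children: Dict[int, Sequence[Any]],
) -> List[Any]:
    """Resolve each slot of a dense union to a plain Python value.

    Hypothetical helper for illustration; it is not part of pyarrow.
    """
    if len(type_ids) != len(value_offsets):
        raise ValueError("type_ids and value_offsets must have equal length")
    decoded = []
    for type_id, offset in zip(type_ids, value_offsets):
        # The type id picks the child; the offset picks the value in it.
        decoded.append(children[type_id][offset])
    return decoded


# Buffers copied from the table printed in the report.
type_ids = [1, 1, 1, 0]
value_offsets = [0, 1, 2, 0]
children = {
    0: [None],      # child 0: null type, one null entry
    1: [1, 2, 3],   # child 1: int32 values
}

decoded = decode_dense_union(type_ids, value_offsets, children)
print(decoded)  # [1, 2, 3, None]
```

The decoded values show that the column is logically a nullable int32 column, which suggests the data itself is well-formed and that the failure lies in the missing Pandas conversion for the dense_union type.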



