[ https://issues.apache.org/jira/browse/ARROW-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-15767:
------------------------------------------
    Summary: [Python] Arrow Table with DenseUnion fails to convert to Python Pandas DataFrame  (was: [Python] Arrow Table with Nullable DenseUnion fails to convert to Python Pandas DataFrame)

> [Python] Arrow Table with DenseUnion fails to convert to Python Pandas DataFrame
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-15767
>                 URL: https://issues.apache.org/jira/browse/ARROW-15767
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 6.0.1
>            Reporter: Ben Baumgold
>            Priority: Major
>         Attachments: nothing.arrow
>
>
> A Feather file containing a column of nullable values fails to convert to a pandas DataFrame. It can be read into a pyarrow.Table as follows:
> {code:python}
> In [1]: import pyarrow.feather as feather
> In [2]: t = feather.read_table("nothing.arrow")
> In [3]: t
> Out[3]:
> pyarrow.Table
> col: dense_union<: null=0, : int32 not null=1>
>   child 0, : null
>   child 1, : int32 not null
> ----
> col: [  -- is_valid: all not null
>   -- type_ids: [
>       1,
>       1,
>       1,
>       0
>     ]
>   -- value_offsets: [
>       0,
>       1,
>       2,
>       0
>     ]
>   -- child 0 type: null
> 1 nulls
>   -- child 1 type: int32
> [
>       1,
>       2,
>       3
>     ]]
> {code}
> But when I try to convert the pyarrow.Table into a pandas DataFrame, I get the following error:
> {code:python}
> In [4]: t.to_pandas()
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-25-8ba84762c39a> in <module>
> ----> 1 t.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
>     787     _check_data_column_metadata_consistency(all_columns)
>     788     columns = _deserialize_column_index(table, all_columns, column_indexes)
> --> 789     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
>     790
>     791     axes = [columns, index]
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
>    1126     # Convert an arrow table to Block from the internal pandas API
>    1127     columns = block_table.column_names
> -> 1128     result = pa.lib.table_to_blocks(options, block_table, categories,
>    1129                                     list(extension_columns.keys()))
>    1130     return [_reconstruct_block(item, columns, extension_columns)
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type dense_union<: null=0, : int32 not null=1> is known.
> {code}
> Note that the Arrow file is valid and can be read successfully by [Arrow.jl|https://github.com/apache/arrow-julia]. A related issue is [arrow-julia#285|https://github.com/apache/arrow-julia/issues/285]. The [^nothing.arrow] file used in this example is attached for convenience.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
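A minimal sketch for reproducing this without the attachment, assuming only what the repr in the report shows: type_ids [1, 1, 1, 0], value_offsets [0, 1, 2, 0], a null child holding one slot and an int32 child holding [1, 2, 3]. It rebuilds the column with pyarrow.UnionArray.from_dense (which creates nullable children, so the schema differs slightly from the "int32 not null" child in nothing.arrow), makes the failing to_pandas() call, and then applies one possible workaround. The helper to_pandas_with_unions is hypothetical and not part of pyarrow: it converts union columns through Python objects into an object-dtype Series and leaves every other column to Arrow's own converter.

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.types as pat


def make_nothing_column() -> pa.UnionArray:
    """Rebuild the dense union column from the report's repr (assumed layout)."""
    # type_ids choose the child for each slot; value_offsets index into that child.
    type_ids = pa.array([1, 1, 1, 0], type=pa.int8())
    value_offsets = pa.array([0, 1, 2, 0], type=pa.int32())
    children = [
        pa.nulls(1),                           # child 0: null type, one slot
        pa.array([1, 2, 3], type=pa.int32()),  # child 1: int32 values
    ]
    return pa.UnionArray.from_dense(type_ids, value_offsets, children)


def to_pandas_with_unions(table: pa.Table) -> pd.DataFrame:
    """Hypothetical helper: convert union columns via Python objects."""
    data = {}
    for name, column in zip(table.column_names, table.columns):
        if pat.is_union(column.type):
            # Each slot becomes the Python value of its selected child;
            # slots pointing at the null child become None.
            data[name] = pd.Series(column.to_pylist(), dtype=object)
        else:
            # Non-union columns go through Arrow's normal conversion.
            data[name] = column.to_pandas()
    return pd.DataFrame(data)


table = pa.table({"col": make_nothing_column()})

if __name__ == "__main__":
    try:
        table.to_pandas()
    except pa.ArrowNotImplementedError as exc:
        # Affected versions such as 6.0.1 fail here with
        # "No known equivalent Pandas block ...".
        print("to_pandas() failed:", exc)
    print(to_pandas_with_unions(table))
{code}

Going through Python objects is slow for large columns, and the dict keyed by column name assumes unique column names. For a union that only ever mixes null with a single value type, as in this file, pa.array(column.to_pylist(), type=pa.int32()) could instead collapse the column into an ordinary nullable int32 column, which to_pandas() already handles.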