Razvan Chitu created ARROW-6899:
-----------------------------------

Summary: to_pandas() not implemented on list<dictionary<values=string, indices=int32>
Key: ARROW-6899
URL: https://issues.apache.org/jira/browse/ARROW-6899
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.0, 0.13.0
Reporter: Razvan Chitu
Attachments: encoded.arrow
Hi,

{{pyarrow.Table.to_pandas()}} fails on an Arrow List Vector whose data (child) vector is a dictionary-encoded string vector. Here is the table schema as printed by pyarrow:
{code:java}
pyarrow.Table
encodedList: list<$data$: dictionary<values=string, indices=int32, ordered=0> not null> not null
  child 0, $data$: dictionary<values=string, indices=int32, ordered=0> not null
metadata
--------
OrderedDict()
{code}
the data (also attached to this ticket as a file):
{code:java}
<pyarrow.lib.ChunkedArray object at 0x7f7ea6a748b8>
[
  [
    -- dictionary:
      [
        "a",
        "b",
        "c",
        "d"
      ]
    -- indices:
      [
        0,
        1,
        2
      ],
    -- dictionary:
      [
        "a",
        "b",
        "c",
        "d"
      ]
    -- indices:
      [
        0,
        3
      ]
  ]
]
{code}
and the exception I get:
{code:java}
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-10-5f865bc01df1> in <module>
----> 1 df.to_pandas()

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata)
    700 
    701     _check_data_column_metadata_consistency(all_columns)
--> 702     blocks = _table_to_blocks(options, table, categories)
    703     columns = _deserialize_column_index(table, all_columns, column_indexes)
    704 

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories)
    972 
    973     # Convert an arrow table to Block from the internal pandas API
--> 974     result = pa.lib.table_to_blocks(options, block_table, categories)
    975 
    976     # Defined above

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: Not implemented type for list in DataFrameBlock: dictionary<values=string, indices=int32, ordered=0>
{code}
Note that the data vector on its own (the dictionary-encoded string child) converts successfully with {{to_pandas()}}; only the list column wrapping it fails.

It would be great if this could be addressed in the next pyarrow release. In the meantime, is there anything I can do on my end to work around this unimplemented conversion?

Thanks,
Razvan
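One possible workaround is to decode the column by hand before building the DataFrame. Below is a minimal pure-Python sketch of that decoding, assuming you have already pulled out three pieces of the list<dictionary<values=string, indices=int32>> column: the list offsets, the int32 dictionary indices of the child vector, and the dictionary values. In recent pyarrow releases these are exposed as `ListArray.offsets`, `ListArray.values`, and that child's `DictionaryArray.indices` / `DictionaryArray.dictionary`; whether all of them are available in 0.13/0.15 is an assumption to check. The function name `decode_list_of_dictionary` is hypothetical, and nulls are not handled (the column in this ticket is declared not null).

```python
from typing import List, Sequence


def decode_list_of_dictionary(
    offsets: Sequence[int],
    indices: Sequence[int],
    dictionary: Sequence[str],
) -> List[List[str]]:
    """Hypothetical helper: rebuild plain lists of strings from the parts of a
    list<dictionary<values=string, indices=int32>> column.

    - offsets: list boundaries; list i covers indices[offsets[i]:offsets[i + 1]]
    - indices: int32 codes stored in the dictionary-encoded child vector
    - dictionary: the distinct string values those codes point into

    Assumes no null lists and no null entries.
    """
    result: List[List[str]] = []
    # Consecutive offset pairs delimit each list in the child vector.
    for start, end in zip(offsets[:-1], offsets[1:]):
        # Replace each code with the dictionary value it refers to.
        result.append([dictionary[code] for code in indices[start:end]])
    return result


# The data from this ticket: two lists, [0, 1, 2] and [0, 3], sharing the
# dictionary ["a", "b", "c", "d"].
decoded = decode_list_of_dictionary(
    offsets=[0, 3, 5],
    indices=[0, 1, 2, 0, 3],
    dictionary=["a", "b", "c", "d"],
)
print(decoded)  # [['a', 'b', 'c'], ['a', 'd']]
```

The decoded lists could then be wrapped in a pandas object column (e.g. `pandas.Series(decoded)`), sidestepping the list<dictionary> conversion path entirely at the cost of materializing every string.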