[ https://issues.apache.org/jira/browse/ARROW-6899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson reassigned ARROW-6899:
--------------------------------------

    Assignee: Wes McKinney  (was: Neal Richardson)

> [Python] to_pandas() not implemented on list<dictionary<values=string,
> indices=int32>
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-6899
>                 URL: https://issues.apache.org/jira/browse/ARROW-6899
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0, 0.15.0
>            Reporter: Razvan Chitu
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.16.0
>
>         Attachments: encoded.arrow
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi,
> {{pyarrow.Table.to_pandas()}} fails on an Arrow List Vector where the data
> vector is of type "dictionary encoded string". Here is the table schema as
> printed by pyarrow:
> {code:java}
> pyarrow.Table
> encodedList: list<$data$: dictionary<values=string, indices=int32, ordered=0>
> not null> not null
>   child 0, $data$: dictionary<values=string, indices=int32, ordered=0> not
> null
> metadata
> --------
> OrderedDict() {code}
> and the data (also attached in a file to this ticket)
> {code:java}
> <pyarrow.lib.ChunkedArray object at 0x7f7ea6a748b8>
> [
>   [
>     -- dictionary:
>       [
>         "a",
>         "b",
>         "c",
>         "d"
>       ]
>     -- indices:
>       [
>         0,
>         1,
>         2
>       ],
>     -- dictionary:
>       [
>         "a",
>         "b",
>         "c",
>         "d"
>       ]
>     -- indices:
>       [
>         0,
>         3
>       ]
>   ]
> ] {code}
> and the exception I got
> {code:java}
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-10-5f865bc01df1> in <module>
> ----> 1 df.to_pandas()
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/array.pxi
> in pyarrow.lib._PandasConvertible.to_pandas()
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi
> in pyarrow.lib.Table._to_pandas()
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py
> in table_to_blockmanager(options, table, categories, ignore_metadata)
>     700
>     701     _check_data_column_metadata_consistency(all_columns)
> --> 702     blocks = _table_to_blocks(options, table, categories)
>     703     columns = _deserialize_column_index(table, all_columns,
> column_indexes)
>     704
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py
> in _table_to_blocks(options, block_table, categories)
>     972
>     973     # Convert an arrow table to Block from the internal pandas API
> --> 974     result = pa.lib.table_to_blocks(options, block_table, categories)
>     975
>     976     # Defined above
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi
> in pyarrow.lib.table_to_blocks()
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/error.pxi
> in pyarrow.lib.check_status()
> ArrowNotImplementedError: Not implemented type for list in DataFrameBlock:
> dictionary<values=string, indices=int32, ordered=0> {code}
> Note that the data vector itself can be loaded successfully by to_pandas.
> It'd be great if this would be addressed in the next version of pyarrow. For
> now, is there anything I can do on my end to bypass this unimplemented
> conversion?
> Thanks,
> Razvan

--
This message was sent by Atlassian Jira
(v8.3.4#803005)