[ 
https://issues.apache.org/jira/browse/ARROW-15370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-15370:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Regression in empty table to_pandas conversion
> -------------------------------------------------------
>
>                 Key: ARROW-15370
>                 URL: https://issues.apache.org/jira/browse/ARROW-15370
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 7.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Nightly integration tests with kartothek are failing, see e.g.
> https://github.com/ursacomputing/crossbow/runs/4863725914?check_suite_focus=true
> This seems to be something on our side, and a recent failure (the builds only
> started failing today, and I don't see any other differences from the last
> working build yesterday).
> Update, a reproducer:
> {code}
> In [4]: df = pd.DataFrame({'a': [1, 2], 'b': [0.1, 0.2]})
> In [5]: table = pa.table(df)
> In [6]: table.schema.empty_table().to_pandas()
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-6-a03ecffc0af8> in <module>
> ----> 1 table.schema.empty_table().to_pandas()
> ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
> ~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
>     790 
>     791     axes = [columns, index]
> --> 792     return BlockManager(blocks, axes)
>     793 
>     794 
> ~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in __init__(self, blocks, axes, verify_integrity)
>     912                         pass
>     913 
> --> 914             self._verify_integrity()
>     915 
>     916     def _verify_integrity(self) -> None:
> ~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in _verify_integrity(self)
>     919         for block in self.blocks:
>     920             if block.shape[1:] != mgr_shape[1:]:
> --> 921                 raise construction_error(tot_items, block.shape[1:], self.axes)
>     922         if len(self.items) != tot_items:
>     923             raise AssertionError(
> ValueError: Empty data passed with indices specified.
> {code}
> It happens specifically if the schema still has pandas metadata that indicates
> a range for the index (which we try to recreate, but whose length doesn't match
> the actual length of the table).
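
The mismatch described above can be illustrated without pyarrow or pandas. The following is only a rough sketch, not the actual pyarrow code or the fix: the metadata dict is a hand-written stand-in for the JSON stored in the schema's pandas metadata (a "range" index descriptor with start/stop/step), and the helper names `range_index_from_metadata` and `index_matches_table` are hypothetical.

```python
import json

# Hand-written stand-in (an assumption) for the pandas metadata JSON that the
# reproducer's two-row DataFrame with a default RangeIndex would carry.
pandas_metadata = json.dumps({
    "index_columns": [
        {"kind": "range", "name": None, "start": 0, "stop": 2, "step": 1}
    ],
    "columns": [
        {"name": "a", "pandas_type": "int64"},
        {"name": "b", "pandas_type": "float64"},
    ],
})


def range_index_from_metadata(raw_metadata):
    """Return the range described by the first 'range' index descriptor, or None."""
    for descr in json.loads(raw_metadata)["index_columns"]:
        if isinstance(descr, dict) and descr.get("kind") == "range":
            return range(descr["start"], descr["stop"], descr["step"])
    return None


def index_matches_table(raw_metadata, num_rows):
    """Check whether the index recorded in the metadata fits a table of num_rows rows."""
    index = range_index_from_metadata(raw_metadata)
    return index is None or len(index) == num_rows


# The original two-row table is consistent with its metadata.
print(index_matches_table(pandas_metadata, num_rows=2))  # True
# An empty table built from the same schema is not: the metadata still
# describes a two-element index, while the table has zero rows.
print(index_matches_table(pandas_metadata, num_rows=0))  # False
```

Recreating the index blindly from the metadata in the second case gives an index of length 2 alongside empty data blocks, which matches the shape check that fails in the traceback.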


