[ https://issues.apache.org/jira/browse/ARROW-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rok Mihevc updated ARROW-5353:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21812

> 0-row table can be written but not read
> ---------------------------------------
>
>                 Key: ARROW-5353
>                 URL: https://issues.apache.org/jira/browse/ARROW-5353
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0
>            Reporter: Thomas Buhrmann
>            Priority: Major
>
> I can serialize a table with 0 rows, but not read it. The following code
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'x': [0,1,2]})[:0]
> fnm = "tbl.arr"
> tbl = pa.Table.from_pandas(df)
> print(tbl.schema)
> writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
> writer.write_table(tbl)
> reader = pa.RecordBatchStreamReader(fnm)
> tbl2 = reader.read_all()
> {code}
> ...results in the following output:
> {code}
> x: int64
> metadata
> --------
> OrderedDict([(b'pandas',
>               b'{"index_columns": [{"kind": "range", "name": null, "start": '
>               b'0, "stop": 0, "step": 1}], "column_indexes": [{"name": null,'
>               b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
>               b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
>               b'{"name": "x", "field_name": "x", "pandas_type": "int64", "nu'
>               b'mpy_type": "int64", "metadata": null}], "creator": {"library'
>               b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <ipython-input-3-8869ad9b37c6> in <module>
>      11 writer.write_table(tbl)
>      12
> ---> 13 reader = pa.RecordBatchStreamReader(fnm)
>      14 tbl2 = reader.read_all()
> ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.py in __init__(self, source)
>      56         """
>      57     def __init__(self, source):
> ---> 58         self._open(source)
>      59
>      60
> ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.pxi in pyarrow.lib._RecordBatchStreamReader._open()
> ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Expected schema message in stream, was null or length 0
> {code}
> Since the schema should be sufficient to build a table, even though it may
> not have any actual data, I wouldn't expect this to fail but return the same
> 0-row input table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)