Thomas Buhrmann created ARROW-5353:
--------------------------------------
Summary: 0-row table can be written but not read
Key: ARROW-5353
URL: https://issues.apache.org/jira/browse/ARROW-5353
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 0.13.0, 0.12.0, 0.11.0
Reporter: Thomas Buhrmann
I can serialize a table with 0 rows, but I cannot read it back. The following code
{code}
import pandas as pd
import pyarrow as pa
df = pd.DataFrame({'x': [0,1,2]})[:0]
fnm = "tbl.arr"
tbl = pa.Table.from_pandas(df)
print(tbl.schema)
writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
writer.write_table(tbl)
reader = pa.RecordBatchStreamReader(fnm)
tbl2 = reader.read_all()
{code}
...results in the following output:
{code}
x: int64
metadata
--------
OrderedDict([(b'pandas',
b'{"index_columns": [{"kind": "range", "name": null, "start": '
b'0, "stop": 0, "step": 1}], "column_indexes": [{"name": null,'
b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
b'{"name": "x", "field_name": "x", "pandas_type": "int64", "nu'
b'mpy_type": "int64", "metadata": null}], "creator": {"library'
b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
<ipython-input-3-8869ad9b37c6> in <module>
11 writer.write_table(tbl)
12
---> 13 reader = pa.RecordBatchStreamReader(fnm)
14 tbl2 = reader.read_all()
~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.py in __init__(self, source)
56 """
57 def __init__(self, source):
---> 58 self._open(source)
59
60
~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.pxi in pyarrow.lib._RecordBatchStreamReader._open()
~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: Expected schema message in stream, was null or length 0
{code}
Since the schema alone should be sufficient to build a table, even one that
holds no data, I would expect the read to return the same 0-row table that was
written rather than fail.