[ 
https://issues.apache.org/jira/browse/ARROW-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-5353:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21812

> 0-row table can be written but not read
> ---------------------------------------
>
>                 Key: ARROW-5353
>                 URL: https://issues.apache.org/jira/browse/ARROW-5353
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0
>            Reporter: Thomas Buhrmann
>            Priority: Major
>
> I can serialize a table with 0 rows, but not read it. The following code
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'x': [0,1,2]})[:0]
> fnm = "tbl.arr"
> tbl = pa.Table.from_pandas(df)
> print(tbl.schema)
> writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
> writer.write_table(tbl)
> reader = pa.RecordBatchStreamReader(fnm)
> tbl2 = reader.read_all()
> {code}
> ...results in the following output:
> {code}
> x: int64
> metadata
> --------
> OrderedDict([(b'pandas',
>               b'{"index_columns": [{"kind": "range", "name": null, "start": '
>               b'0, "stop": 0, "step": 1}], "column_indexes": [{"name": null,'
>               b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
>               b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
>               b'{"name": "x", "field_name": "x", "pandas_type": "int64", "nu'
>               b'mpy_type": "int64", "metadata": null}], "creator": {"library'
>               b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <ipython-input-3-8869ad9b37c6> in <module>
>      11 writer.write_table(tbl)
>      12 
> ---> 13 reader = pa.RecordBatchStreamReader(fnm)
>      14 tbl2 = reader.read_all()
> ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.py in __init__(self, source)
>      56     """
>      57     def __init__(self, source):
> ---> 58         self._open(source)
>      59 
>      60 
> ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.pxi in pyarrow.lib._RecordBatchStreamReader._open()
> ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Expected schema message in stream, was null or length 0
> {code}
> Since the schema alone should be sufficient to build a table, even one 
> without any actual data, I would expect the read to return the same 0-row 
> input table rather than fail.
>  


