Thomas Buhrmann created ARROW-5353:
--------------------------------------

             Summary: 0-row table can be written but not read
                 Key: ARROW-5353
                 URL: https://issues.apache.org/jira/browse/ARROW-5353
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.13.0, 0.12.0, 0.11.0
            Reporter: Thomas Buhrmann


I can serialize a table with 0 rows, but not read it. The following code
{code}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'x': [0,1,2]})[:0]
fnm = "tbl.arr"

tbl = pa.Table.from_pandas(df)
print(tbl.schema)

writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
writer.write_table(tbl)

reader = pa.RecordBatchStreamReader(fnm)
tbl2 = reader.read_all()
{code}
...results in the following output:
{code}
x: int64
metadata
--------
OrderedDict([(b'pandas',
              b'{"index_columns": [{"kind": "range", "name": null, "start": '
              b'0, "stop": 0, "step": 1}], "column_indexes": [{"name": null,'
              b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
              b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
              b'{"name": "x", "field_name": "x", "pandas_type": "int64", "nu'
              b'mpy_type": "int64", "metadata": null}], "creator": {"library'
              b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-3-8869ad9b37c6> in <module>
     11 writer.write_table(tbl)
     12 
---> 13 reader = pa.RecordBatchStreamReader(fnm)
     14 tbl2 = reader.read_all()

~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.py in __init__(self, source)
     56     """
     57     def __init__(self, source):
---> 58         self._open(source)
     59 
     60 

~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.pxi in pyarrow.lib._RecordBatchStreamReader._open()

~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Expected schema message in stream, was null or length 0
{code}
Since the schema should be sufficient to build a table, even if that table has
no data, I would expect this not to fail but to return the same 0-row input
table.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)