[ https://issues.apache.org/jira/browse/ARROW-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-2307:
----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Unable to read arrow stream containing 0 record batches
> ----------------------------------------------------------------
>
>                 Key: ARROW-2307
>                 URL: https://issues.apache.org/jira/browse/ARROW-2307
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Benjamin Duffield
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>
> Using java arrow I'm creating an arrow stream, using the stream writer.
>
> Sometimes I don't have anything to serialize, and so I don't write any record
> batches. My arrow stream thus consists of just a schema message.
> {code:java}
> <SCHEMA>
> <EOS [optional]: int32>
> {code}
> I am able to deserialize this arrow stream correctly using the java stream
> reader, but when reading it with python I instead hit an error
> {code}
> import pyarrow as pa
> # ...
> reader = pa.open_stream(stream)
> df = reader.read_all().to_pandas()
> {code}
> produces
> {code}
>   File "ipc.pxi", line 307, in pyarrow.lib._RecordBatchReader.read_all
>   File "error.pxi", line 77, in pyarrow.lib.check_status
> ArrowInvalid: Must pass at least one record batch
> {code}
> i.e. we're hitting the check in
> https://github.com/apache/arrow/blob/apache-arrow-0.8.0/cpp/src/arrow/table.cc#L284
> The workaround we're currently using is to always ensure we serialize at
> least one record batch, even if it's empty. However, I think it would be nice
> to either support a stream without record batches or explicitly disallow this
> and then match behaviour in java.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)