[jira] [Updated] (ARROW-2307) [Python] Unable to read arrow stream containing 0 record batches
[ https://issues.apache.org/jira/browse/ARROW-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2307:
    Labels: pull-request-available  (was: )

> [Python] Unable to read arrow stream containing 0 record batches
>
> Key: ARROW-2307
> URL: https://issues.apache.org/jira/browse/ARROW-2307
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Reporter: Benjamin Duffield
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
>
> Using Java Arrow, I'm creating an Arrow stream with the stream writer.
>
> Sometimes I don't have anything to serialize, so I don't write any record
> batches. My Arrow stream thus consists of just a schema message.
> {code:java}
>
> {code}
> I am able to deserialize this Arrow stream correctly using the Java stream
> reader, but when reading it with Python I instead hit an error:
> {code}
> import pyarrow as pa
> # ...
> reader = pa.open_stream(stream)
> df = reader.read_all().to_pandas()
> {code}
> produces
> {code}
> File "ipc.pxi", line 307, in pyarrow.lib._RecordBatchReader.read_all
> File "error.pxi", line 77, in pyarrow.lib.check_status
> ArrowInvalid: Must pass at least one record batch
> {code}
> i.e. we're hitting the check in
> https://github.com/apache/arrow/blob/apache-arrow-0.8.0/cpp/src/arrow/table.cc#L284
> The workaround we're currently using is to always serialize at least one
> record batch, even if it's empty. However, I think it would be nice to either
> support a stream without record batches or explicitly disallow this, and then
> match the behaviour in Java.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (ARROW-2307) [Python] Unable to read arrow stream containing 0 record batches
[ https://issues.apache.org/jira/browse/ARROW-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2307:
    Summary: [Python] Unable to read arrow stream containing 0 record batches  (was: Unable to read arrow stream containing 0 record batches using pyarrow)