[ https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779569#comment-16779569 ]
Wes McKinney commented on ARROW-2119:
-------------------------------------

This can be resolved by adding a zero-record-batch stream to the integration tests.

> [C++][Java] Handle Arrow stream with zero record batch
> ------------------------------------------------------
>
>                 Key: ARROW-2119
>                 URL: https://issues.apache.org/jira/browse/ARROW-2119
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Java
>            Reporter: Jingyuan Wang
>            Priority: Major
>             Fix For: 0.13.0
>
>
> It looks like currently many places of the code assume that there needs to be
> at least one record batch for streaming format. Is zero-recordbatch not
> supported by design?
> e.g.
> [https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45]
> {code:none}
>   public static void convert(InputStream in, OutputStream out) throws IOException {
>     BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
>     try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
>       VectorSchemaRoot root = reader.getVectorSchemaRoot();
>       // load the first batch before instantiating the writer so that we have any dictionaries
>       if (!reader.loadNextBatch()) {
>         throw new IOException("Unable to read first record batch");
>       }
>       ...
> {code}
> Pyarrow-0.8.0 does not load 0-recordbatch stream either. It would throw an
> exception originated from
> [https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309:]
> {code:none}
> Status Table::FromRecordBatches(const std::vector<std::shared_ptr<RecordBatch>>& batches,
>                                 std::shared_ptr<Table>* table) {
>   if (batches.size() == 0) {
>     return Status::Invalid("Must pass at least one record batch");
>   }
>   ...{code}
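
For illustration only, here is a rough sketch of the behaviour the issue asks for: a converter that treats a stream with zero record batches as valid and emits a schema-only output instead of throwing. Everything in it (ZeroBatchSketch, BatchReader, RecordingWriter, readerOf) is a hypothetical stand-in written for this note, not the Arrow Java API, and the schema is reduced to a plain string. It also assumes the schema is readable before any batch is loaded and ignores dictionaries, which the real StreamToFile code loads the first batch to obtain.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT Arrow classes.
final class ZeroBatchSketch {

    /** Minimal reader shape: the schema is known up front, batches may be absent. */
    interface BatchReader {
        String schema();
        boolean loadNextBatch();   // returns false once the stream is exhausted
        int[] currentBatch();
    }

    /** Minimal writer shape: records the schema once, then any number of batches. */
    static final class RecordingWriter {
        String schema;
        final List<int[]> batches = new ArrayList<>();

        void start(String schemaText) { this.schema = schemaText; }
        void writeBatch(int[] batch) { batches.add(batch); }
    }

    /** Builds an in-memory reader over a (possibly empty) list of batches. */
    static BatchReader readerOf(final String schemaText, List<int[]> batches) {
        final Iterator<int[]> it = batches.iterator();
        return new BatchReader() {
            private int[] current;

            public String schema() { return schemaText; }

            public boolean loadNextBatch() {
                if (!it.hasNext()) {
                    return false;
                }
                current = it.next();
                return true;
            }

            public int[] currentBatch() { return current; }
        };
    }

    /**
     * Copies every batch from reader to writer and returns how many were copied.
     * The schema is written before the loop, so an empty stream is not an error.
     */
    static int convert(BatchReader reader, RecordingWriter writer) {
        writer.start(reader.schema());
        int count = 0;
        while (reader.loadNextBatch()) {
            writer.writeBatch(reader.currentBatch());
            count++;
        }
        return count;
    }
}
```

The point of the sketch is only the control flow: the "first batch" check is replaced by a loop that may run zero times. A zero-record-batch case in the integration tests would exercise exactly that path.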