Jingyuan Wang created ARROW-2119:
------------------------------------

             Summary: Handle Arrow stream with zero record batch
                 Key: ARROW-2119
                 URL: https://issues.apache.org/jira/browse/ARROW-2119
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Jingyuan Wang

It looks like many places in the code currently assume that a stream in the streaming format contains at least one record batch. Is a stream with zero record batches unsupported by design?

For example, [https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45]:

{code:none}
public static void convert(InputStream in, OutputStream out) throws IOException {
  BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
  try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
    VectorSchemaRoot root = reader.getVectorSchemaRoot();
    // load the first batch before instantiating the writer so that we have any dictionaries
    if (!reader.loadNextBatch()) {
      throw new IOException("Unable to read first record batch");
    }
    ...
{code}

pyarrow 0.8.0 cannot load a stream with zero record batches either. It throws an exception that originates from [https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309]:

{code:none}
Status Table::FromRecordBatches(const std::vector<std::shared_ptr<RecordBatch>>& batches,
                                std::shared_ptr<Table>* table) {
  if (batches.size() == 0) {
    return Status::Invalid("Must pass at least one record batch");
  }
  ...
{code}
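
One possible direction for the C++ check, sketched with toy Python types rather than the real Arrow API: a stream reader already knows the schema once it has read the schema message, so an empty batch list could produce a zero-row table instead of an error, provided the schema is passed in. The names `Table` and `table_from_batches` and the optional `schema` argument are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence

# Toy stand-ins, NOT the Arrow API: a schema is a list of column names and
# a record batch is a dict mapping each column name to a list of values.
Schema = List[str]
Batch = Dict[str, list]


@dataclass
class Table:
    schema: Schema
    columns: Dict[str, list] = field(default_factory=dict)

    @property
    def num_rows(self) -> int:
        # A table with no columns has no rows.
        if not self.schema:
            return 0
        return len(self.columns[self.schema[0]])


def table_from_batches(batches: Sequence[Batch],
                       schema: Optional[Schema] = None) -> Table:
    """Concatenate batches; zero batches are accepted when a schema is given."""
    if schema is None:
        if not batches:
            # With neither a schema nor a batch, the columns are unknown.
            raise ValueError("Must pass a schema or at least one record batch")
        schema = list(batches[0].keys())

    columns: Dict[str, list] = {name: [] for name in schema}
    for batch in batches:  # zero iterations simply leaves every column empty
        # Column order matters in this toy comparison.
        if list(batch.keys()) != schema:
            raise ValueError("Record batch does not match the schema")
        for name in schema:
            columns[name].extend(batch[name])
    return Table(schema=list(schema), columns=columns)
```

With this shape, `table_from_batches([], schema=["a", "b"])` yields a table with two empty columns, while the error is kept only for the case where neither a schema nor a batch is available.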