westonpace commented on a change in pull request #10794: URL: https://github.com/apache/arrow/pull/10794#discussion_r677115339
########## File path: cpp/src/arrow/csv/column_decoder.cc ########## @@ -198,6 +198,11 @@ Result<std::shared_ptr<Array>> InferringColumnDecoder::RunInference( Future<std::shared_ptr<Array>> InferringColumnDecoder::Decode( const std::shared_ptr<BlockParser>& parser) { + if (parser->num_rows() == 0) { Review comment: ```suggestion // Note: Care should be taken to discard these batches as the data type will be null // and won't match future decoded arrays. if (parser->num_rows() == 0) { ``` Nit: Not necessary but might help explain to future readers. Optionally you could also return `nullptr` here so it is clear it is invalid but then that will cause some confusion below between EOF and invalidly parsed batch. I'll defer to whatever you think is best. ########## File path: cpp/src/arrow/csv/reader.cc ########## @@ -901,13 +901,31 @@ class StreamingReaderImpl : public ReaderMixin, auto self = shared_from_this(); return rb_gen().Then([self, rb_gen, max_readahead](const DecodedBlock& first_block) { - return self->InitAfterFirstBatch(first_block, std::move(rb_gen), max_readahead); + return self->InitFromBlock(first_block, std::move(rb_gen), max_readahead, 0); }); } - Status InitAfterFirstBatch(const DecodedBlock& first_block, - AsyncGenerator<DecodedBlock> batch_gen, int max_readahead) { - schema_ = first_block.record_batch->schema(); + Future<> InitFromBlock(const DecodedBlock& block, + AsyncGenerator<DecodedBlock> batch_gen, int max_readahead, + int64_t prev_bytes_processed) { + if (!block.record_batch) { + // End of file just return null batches + record_batch_gen_ = []() { return std::shared_ptr<RecordBatch>(); }; Review comment: ```suggestion record_batch_gen_ = MakeEmptyGenerator<std::shared_ptr<RecordBatch>>(); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org