westonpace commented on a change in pull request #10794:
URL: https://github.com/apache/arrow/pull/10794#discussion_r677115339



##########
File path: cpp/src/arrow/csv/column_decoder.cc
##########
@@ -198,6 +198,11 @@ Result<std::shared_ptr<Array>> InferringColumnDecoder::RunInference(
 
 Future<std::shared_ptr<Array>> InferringColumnDecoder::Decode(
     const std::shared_ptr<BlockParser>& parser) {
+  if (parser->num_rows() == 0) {

Review comment:
       ```suggestion
     // Note: Care should be taken to discard these batches as the data type will be null
     // and won't match future decoded arrays.
     if (parser->num_rows() == 0) {
   ```
   
   Nit: Not necessary, but a comment like this might help future readers.
   Alternatively, you could return `nullptr` here to make it clear that the
   result is invalid, but that would make it hard to tell EOF apart from an
   invalidly parsed batch further down. I'll defer to whatever you think is best.
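
   To illustrate the trade-off, here is a minimal sketch using stand-in types rather than Arrow's actual classes. `FakeArray`, `DecodeBlock`, and `CountRows` are hypothetical names invented for this example. It assumes the consumer treats `nullptr` as the end-of-stream marker, so an empty block has to come back as a valid zero-length result that the consumer can discard.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a decoded array; not Arrow's real Array type.
struct FakeArray {
  int length;
};

// Decode a block with `num_rows` rows. An empty block still yields a valid
// (zero-length) array instead of nullptr, so it cannot be mistaken for EOF.
std::shared_ptr<FakeArray> DecodeBlock(int num_rows) {
  return std::make_shared<FakeArray>(FakeArray{num_rows});
}

// Consumer side: nullptr means end of stream, zero-length arrays are skipped.
int CountRows(const std::vector<std::shared_ptr<FakeArray>>& stream) {
  int total = 0;
  for (const auto& arr : stream) {
    if (!arr) break;                 // end-of-stream marker
    if (arr->length == 0) continue;  // empty block: discard it
    total += arr->length;
  }
  return total;
}
```

   If `DecodeBlock(0)` returned `nullptr` instead, the consumer above would stop at the first empty block and silently drop everything after it.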

##########
File path: cpp/src/arrow/csv/reader.cc
##########
@@ -901,13 +901,31 @@ class StreamingReaderImpl : public ReaderMixin,
 
     auto self = shared_from_this();
     return rb_gen().Then([self, rb_gen, max_readahead](const DecodedBlock& first_block) {
-      return self->InitAfterFirstBatch(first_block, std::move(rb_gen), max_readahead);
+      return self->InitFromBlock(first_block, std::move(rb_gen), max_readahead, 0);
     });
   }
 
-  Status InitAfterFirstBatch(const DecodedBlock& first_block,
-                             AsyncGenerator<DecodedBlock> batch_gen, int max_readahead) {
-    schema_ = first_block.record_batch->schema();
+  Future<> InitFromBlock(const DecodedBlock& block,
+                         AsyncGenerator<DecodedBlock> batch_gen, int max_readahead,
+                         int64_t prev_bytes_processed) {
+    if (!block.record_batch) {
+      // End of file just return null batches
+      record_batch_gen_ = []() { return std::shared_ptr<RecordBatch>(); };

Review comment:
       ```suggestion
         record_batch_gen_ = MakeEmptyGenerator<std::shared_ptr<RecordBatch>>();
   ```
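
   For readers unfamiliar with the idea, an "empty generator" is one that is already exhausted and only ever yields the end-of-stream marker. The sketch below is a simplified, synchronous model of that idea and not Arrow's implementation. It assumes a null pointer marks the end of the stream and leaves futures out entirely. `Generator`, `MakeEmptyGeneratorSketch`, and `CountItems` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Simplified, synchronous stand-in for an async generator: each call yields
// the next item, and a null pointer marks the end of the stream.
template <typename T>
using Generator = std::function<std::shared_ptr<T>()>;

// Hypothetical helper modelling an "empty generator": it is exhausted from
// the start and returns the end marker on every call.
template <typename T>
Generator<T> MakeEmptyGeneratorSketch() {
  return []() { return std::shared_ptr<T>(); };
}

// Drain a generator and count the items seen before the end marker.
template <typename T>
int CountItems(Generator<T> gen) {
  int count = 0;
  while (gen()) ++count;
  return count;
}
```

   A named helper behaves the same as the hand-written lambda but states the intent at the call site.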
   



