lidavidm commented on a change in pull request #9607:
URL: https://github.com/apache/arrow/pull/9607#discussion_r585583131



##########
File path: cpp/src/arrow/dataset/file_parquet.cc
##########
@@ -364,7 +367,8 @@ Result<ScanTaskIterator> 
ParquetFileFormat::ScanFile(std::shared_ptr<ScanOptions
         reader_options.io_context, reader_options.cache_options, options, 
context);
   }
 
-  return MakeVectorIterator(std::move(tasks));

Review comment:
       Shouldn't basically the entire method body be wrapped in a Future, since 
we're doing I/O here? Right now this synchronously reads the Parquet footer and 
then returns an already-completed future.

##########
File path: cpp/src/arrow/dataset/file_parquet.cc
##########
@@ -85,13 +85,16 @@ class ParquetScanTask : public ScanTask {
       // since it must outlive the wrapped RecordBatchReader
       std::shared_ptr<parquet::arrow::FileReader> file_reader;
       std::unique_ptr<RecordBatchReader> record_batch_reader;
-    } NextBatch;
+    };
+    auto next_batch = std::make_shared<GetNextBatch>();

Review comment:
       Minor nit/not really related, but the fact that we have to do this 
wrapping here seems like a shortcoming of the Parquet APIs; the record batch 
reader should reference the file reader and we should be able to use 
GeneratorFromReader here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to