zanmato1984 commented on code in PR #43698:
URL: https://github.com/apache/arrow/pull/43698#discussion_r2313098982


##########
cpp/src/arrow/dataset/scanner.h:
##########
@@ -459,6 +465,11 @@ class ARROW_DS_EXPORT Scanner {
       TaggedRecordBatchIterator scan);
 
   const std::shared_ptr<ScanOptions> scan_options_;
+
+  ::arrow::internal::Executor* async_cpu_executor() const {
+    return scan_options_->exec_context.executor() ? 
scan_options_->exec_context.executor()
+                                                  : 
::arrow::internal::GetCpuThreadPool();
+  }

Review Comment:
   ```suggestion
   ```
   This seems unnecessary. We can pass the `scan_options_->cpu_executor` (as 
noted in the other comment, avoid introducing `ExecContext`), even null, all 
the way down to `IterateSynchronously` where a null executor implies defaulted 
CPU thread pool.



##########
cpp/src/arrow/dataset/file_parquet.cc:
##########
@@ -642,10 +643,14 @@ Result<RecordBatchGenerator> 
ParquetFileFormat::ScanBatchesAsync(
             kParquetTypeName, options.get(), default_fragment_scan_options));
     int batch_readahead = options->batch_readahead;
     int64_t rows_to_readahead = batch_readahead * options->batch_size;
-    ARROW_ASSIGN_OR_RAISE(auto generator,
-                          reader->GetRecordBatchGenerator(
-                              reader, row_groups, column_projection,
-                              ::arrow::internal::GetCpuThreadPool(), 
rows_to_readahead));
+    // Use the executor in scan_options instead of always using the
+    // default CPU thread pool.

Review Comment:
   This looks like explaining this particular "change" rather than explaining 
the code itself. Maybe we can revise it to like "Respect the executor in the 
scan options".



##########
cpp/src/arrow/dataset/scanner.h:
##########
@@ -107,6 +107,11 @@ struct ARROW_DS_EXPORT ScanOptions {
   /// Note: The IOContext executor will be ignored if use_threads is set to 
false
   io::IOContext io_context;
 
+  /// ExecContext for any CPU tasks
+  ///
+  /// Note: The ExecContext executor will be ignored if use_threads is set to 
false
+  compute::ExecContext exec_context;

Review Comment:
   I think this should be an `Executor *`. The `ExecContext` is more of a 
compute internal concept that doesn't seem applicable here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to