zanmato1984 commented on code in PR #43698: URL: https://github.com/apache/arrow/pull/43698#discussion_r2313098982
########## cpp/src/arrow/dataset/scanner.h: ########## @@ -459,6 +465,11 @@ class ARROW_DS_EXPORT Scanner { TaggedRecordBatchIterator scan); const std::shared_ptr<ScanOptions> scan_options_; + + ::arrow::internal::Executor* async_cpu_executor() const { + return scan_options_->exec_context.executor() ? scan_options_->exec_context.executor() + : ::arrow::internal::GetCpuThreadPool(); + } Review Comment: ```suggestion ``` This seems unnecessary. We can pass the `scan_options_->cpu_executor` (as noted in the other comment, avoid introducing `ExecContext`), even null, all the way down to `IterateSynchronously` where a null executor implies defaulted CPU thread pool. ########## cpp/src/arrow/dataset/file_parquet.cc: ########## @@ -642,10 +643,14 @@ Result<RecordBatchGenerator> ParquetFileFormat::ScanBatchesAsync( kParquetTypeName, options.get(), default_fragment_scan_options)); int batch_readahead = options->batch_readahead; int64_t rows_to_readahead = batch_readahead * options->batch_size; - ARROW_ASSIGN_OR_RAISE(auto generator, - reader->GetRecordBatchGenerator( - reader, row_groups, column_projection, - ::arrow::internal::GetCpuThreadPool(), rows_to_readahead)); + // Use the executor in scan_options instead of always using the + // default CPU thread pool. Review Comment: This looks like explaining this particular "change" rather than explaining the code itself. Maybe we can revise it to like "Respect the executor in the scan options". ########## cpp/src/arrow/dataset/scanner.h: ########## @@ -107,6 +107,11 @@ struct ARROW_DS_EXPORT ScanOptions { /// Note: The IOContext executor will be ignored if use_threads is set to false io::IOContext io_context; + /// ExecContext for any CPU tasks + /// + /// Note: The ExecContext executor will be ignored if use_threads is set to false + compute::ExecContext exec_context; Review Comment: I think this should be an `Executor *`. The `ExecContext` is more of a compute internal concept that doesn't seem applicable here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org