Tushar7012 opened a new pull request, #20023: URL: https://github.com/apache/datafusion/pull/20023
## Which issue does this PR close?

- Part of improving DataFusion's file listing performance for large-scale table scans.

## Rationale for this change

When a `ListingTable` has multiple table paths, the current implementation lists them with `future::try_join_all`, which drives every path's listing future from a single task. The futures make progress concurrently, but they are all polled from that one task rather than spread across the runtime's worker threads. This can be a bottleneck when listing files across many directories.

By spawning a separate task for each table path with `tokio::task::JoinSet`, the listings can run in parallel, which should improve performance for tables with multiple paths.

## What changes are included in this PR?

1. **Parallel file listing with `JoinSet`** - Modified `list_files_for_scan` to spawn one task per table path using `tokio::task::JoinSet`.
2. **Function signature refactoring** - Updated `pruned_partition_list` and `list_all_files` to accept `&ConfigOptions` and `&Arc<RuntimeEnv>` instead of `&dyn Session`, so that the values can be cloned and moved into the spawned tasks.
3. **WASM compatibility** - Added conditional compilation (`#[cfg(not(target_arch = "wasm32"))]`) so that native targets use the parallel path while WASM targets keep using `try_join_all`, since WASM has limited multi-threading support.

### Files changed:

- `datafusion/catalog-listing/src/table.rs` - Main parallelization logic
- `datafusion/catalog-listing/src/helpers.rs` - Updated function signatures
- `datafusion/catalog-listing/src/options.rs` - Updated function signatures
- `datafusion/datasource/src/url.rs` - Updated `list_all_files` signature
- `datafusion/core/src/datasource/listing/table.rs` - Updated call sites
- `datafusion/core/tests/catalog_listing/pruned_partition_list.rs` - Updated test calls

## Are these changes tested?
Yes, the existing tests cover the functionality:

- `pruned_partition_list` tests validate the file listing behavior
- WASM tests ensure compatibility with the WebAssembly target
- CI runs include both native and WASM build tests

## Are there any user-facing changes?

No user-facing API changes. This is an internal performance optimization that maintains the same external behavior while improving file listing performance for tables with multiple paths.
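
### Illustrative sketch

The pattern described in this PR (one spawned task per table path on native targets, a sequential fallback on `wasm32`) can be sketched roughly as follows. This is not the code from the PR: to stay dependency-free it uses `std::thread::spawn` as a stand-in for `tokio::task::JoinSet`, and `ListingEnv`, `list_path`, and `list_all` are hypothetical names invented for illustration. It does show why the signatures move away from a borrowed `&dyn Session`: a spawned task must own its inputs, so each task receives its own `Arc` clone.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the shared state a listing task needs
/// (the PR passes `ConfigOptions` and `Arc<RuntimeEnv>` for this purpose).
struct ListingEnv {
    file_extension: String,
}

/// Hypothetical per-path listing: pretends every path contains two files.
fn list_path(env: &ListingEnv, path: &str) -> Result<Vec<String>, String> {
    if path.is_empty() {
        return Err("empty table path".to_string());
    }
    Ok((0..2)
        .map(|i| format!("{}/part-{}{}", path, i, env.file_extension))
        .collect())
}

/// Native targets: spawn one task per table path.
#[cfg(not(target_arch = "wasm32"))]
fn list_all(env: &Arc<ListingEnv>, paths: &[String]) -> Result<Vec<String>, String> {
    let handles: Vec<_> = paths
        .iter()
        .cloned()
        .map(|path| {
            // A spawned closure must be 'static, so it owns an Arc clone
            // and an owned copy of the path instead of borrowing them.
            let env = Arc::clone(env);
            thread::spawn(move || list_path(&env, &path))
        })
        .collect();

    let mut files = Vec::new();
    for handle in handles {
        // Joining in spawn order keeps the combined output deterministic.
        let listed = handle
            .join()
            .map_err(|_| "listing task panicked".to_string())??;
        files.extend(listed);
    }
    Ok(files)
}

/// wasm32: no extra threads are assumed, so list one path after another.
#[cfg(target_arch = "wasm32")]
fn list_all(env: &Arc<ListingEnv>, paths: &[String]) -> Result<Vec<String>, String> {
    let mut files = Vec::new();
    for path in paths {
        files.extend(list_path(env, path)?);
    }
    Ok(files)
}
```

Calling `list_all(&env, &paths)` gives the same result on either target; only the scheduling differs. An error from any path (or a panicked task) is surfaced as an `Err` to the caller.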
