sushiljacksparrow opened a new pull request, #597: URL: https://github.com/apache/hudi-rs/pull/597
## Change Logs

Implements the re-scoped work from #205 (xushiyan's 2026-04-28 comment): parallelize `get_leaf_dirs` and add an optional per-level predicate so partition pruning short-circuits subtrees during descent.

1. `get_leaf_dirs` (`crates/core/src/storage/mod.rs`) gains an optional predicate and a parallelism bound. Subtrees the predicate rejects are skipped without listing; sibling subdirs at each level are walked concurrently via `buffer_unordered`. Parallelism is bounded by `hoodie.plan.listing.parallelism` (default 10).
2. `PartitionPruner::should_include_prefix` evaluates filters whose field is already present in a partial path. Filters for deeper fields are deferred. The single `_hoodie_partition_path` field used by timestamp-based key generators is conservatively admitted at intermediate levels; `should_include` still runs at the leaf as before.
3. `FileLister::list_relevant_partition_paths` plumbs both through: the predicate filters out lake-format metadata dirs (`.hoodie`, `_delta_log`, `metadata`) and calls `should_include_prefix` for partition pruning.

Scope is the no-MDT listing path only. The metadata-table FILES path is unchanged.

Closes #205.

## Impact

Performance. On remote object stores with selective partition filters, every `LIST` for a pruned subtree is eliminated. Even unfiltered listings benefit from parallel descent. No behavior change to the listed set on either path.

## Risk level

low

## Documentation Update

None required.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
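
For readers following the pruning rule in the change log (filters on fields already present in a partial path are evaluated, filters on deeper fields are deferred), here is a minimal sketch of the idea. It is not the code in this PR: `EqFilter`, `prefix_may_match`, `leaf_dirs`, and `demo_tree` are hypothetical names, it assumes hive-style `field=value` path segments and equality filters only, and it walks a toy in-memory tree sequentially rather than concurrently via `buffer_unordered`.

```rust
use std::collections::BTreeMap;

/// Hypothetical equality filter on one partition field (illustration only).
struct EqFilter {
    field: &'static str,
    value: &'static str,
}

/// Returns false only when a filter's field already appears in `prefix`
/// with a different value. Filters on deeper fields are deferred (admitted).
fn prefix_may_match(prefix: &str, filters: &[EqFilter]) -> bool {
    filters.iter().all(|f| {
        prefix
            .split('/')
            .filter_map(|seg| seg.split_once('='))
            .find(|(name, _)| *name == f.field)
            .map_or(true, |(_, value)| value == f.value)
    })
}

/// Toy in-memory "storage": maps a directory path to its child dir names.
type Tree = BTreeMap<&'static str, Vec<&'static str>>;

/// Sequential descent that skips rejected subtrees without listing them.
/// `lists` counts simulated LIST calls (one per visited directory).
fn leaf_dirs(tree: &Tree, dir: &str, filters: &[EqFilter], lists: &mut usize) -> Vec<String> {
    *lists += 1;
    let children = match tree.get(dir) {
        Some(children) => children,
        None => return vec![dir.to_string()], // no subdirs: `dir` is a leaf
    };
    let mut leaves = Vec::new();
    for child in children {
        let path = if dir.is_empty() {
            child.to_string()
        } else {
            format!("{dir}/{child}")
        };
        // Prune before descending, so a rejected subtree costs no LIST call.
        if prefix_may_match(&path, filters) {
            leaves.extend(leaf_dirs(tree, &path, filters, lists));
        }
    }
    leaves
}

/// Two-level example layout: region=<r>/date=<d>.
fn demo_tree() -> Tree {
    let mut tree = Tree::new();
    tree.insert("", vec!["region=eu", "region=us"]);
    tree.insert("region=eu", vec!["date=2024-01-01"]);
    tree.insert("region=us", vec!["date=2024-01-01", "date=2024-01-02"]);
    tree
}

fn main() {
    let tree = demo_tree();
    let filters = [
        EqFilter { field: "region", value: "us" },
        EqFilter { field: "date", value: "2024-01-02" },
    ];
    let (mut pruned, mut full) = (0, 0);
    let leaves = leaf_dirs(&tree, "", &filters, &mut pruned);
    let all = leaf_dirs(&tree, "", &[], &mut full);
    println!("filtered: {leaves:?} in {pruned} listings");
    println!("unfiltered: {} leaves in {full} listings", all.len());
}
```

In this toy layout the filtered walk never lists `region=eu`, because the `region` filter can already be decided at the first level, while the `date` filter is deferred until a `date=` segment appears in the path.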
