[PR] feat: use ScanOperation for Spark 3.3 and 3.4 partition pruning [hudi]

via GitHub Sat, 17 Jan 2026 20:09:27 -0800


suryaprasanna opened a new pull request, #17936:
URL: https://github.com/apache/hudi/pull/17936


   ### Describe the issue this Pull Request addresses
   
   In Spark 3.3 and 3.4, the partition pruning logic should use `ScanOperation` instead of `PhysicalOperation` for better compatibility with Spark's internal query optimization. Previously, all Spark 3.x versions shared a single partition pruning implementation, which may not take advantage of version-specific optimizations.
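   Some background on these extractors: `PhysicalOperation` and `ScanOperation` are Scala extractor objects (they define `unapply`). Each one matches a stack of `Project` and `Filter` nodes above a leaf relation, so a pruning rule can get the projection, the filter conditions, and the relation from a single pattern. The toy model below is only an illustration. `Plan`, `Relation`, `Project`, `Filter`, and `ToyScanExtractor` are hypothetical stand-ins, not Spark classes. The real extractors also handle details this sketch ignores, such as alias substitution and nondeterministic expressions.

```scala
// Hypothetical toy plan nodes; these are NOT Spark's LogicalPlan classes.
sealed trait Plan
final case class Relation(name: String, columns: Seq[String]) extends Plan
final case class Project(columns: Seq[String], child: Plan) extends Plan
final case class Filter(condition: String, child: Plan) extends Plan

// Toy extractor in the spirit of PhysicalOperation/ScanOperation: it walks down
// through Project and Filter nodes, keeps the outermost projection, collects
// every filter condition (top-down order), and returns the leaf relation.
object ToyScanExtractor {
  def unapply(plan: Plan): Option[(Seq[String], Seq[String], Relation)] = {
    @annotation.tailrec
    def loop(p: Plan, top: Option[Seq[String]], filters: Seq[String]): Option[(Seq[String], Seq[String], Relation)] =
      p match {
        case Project(cols, child) => loop(child, top.orElse(Some(cols)), filters)
        case Filter(cond, child)  => loop(child, top, filters :+ cond)
        // With no Project on top, the output is simply the relation's columns.
        case r: Relation          => Some((top.getOrElse(r.columns), filters, r))
      }
    loop(plan, None, Seq.empty)
  }
}
```

   A rule written as `case ToyScanExtractor(projects, filters, relation) => ...` can then check `filters` for predicates on partition columns before rewriting `relation`. Which extractor the pattern uses decides which plan shapes the rule recognizes.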
   
   ### Summary and Changelog
   
   This PR adds version-specific partition pruning implementations for Spark 3.3, 3.4, and 3.5 so that each version uses the appropriate plan pattern-matching strategy.
   
   **Changes:**
   - Added `Spark33HoodiePruneFileSourcePartitions` using `ScanOperation` for Spark 3.3
   - Added `Spark34HoodiePruneFileSourcePartitions` using `PhysicalOperation` for Spark 3.4
   - Added `Spark35HoodiePruneFileSourcePartitions` using `PhysicalOperation` for Spark 3.5
   - Updated `HoodieAnalysis` to route to the correct implementation based on the Spark version
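   Here is a minimal sketch of the version-based routing described above. It assumes the decision is keyed on the Spark major.minor version string, for example the value of `org.apache.spark.SPARK_VERSION`. `PruningRuleSelector` and its methods are hypothetical, and only the three rule names come from this PR. The sketch returns a class name instead of building the rule, because this description does not show how `HoodieAnalysis` instantiates it.

```scala
// Hypothetical selector; only the rule names below are taken from this PR.
object PruningRuleSelector {
  // Captures major and minor from strings like "3.4.1" or "3.5.0-SNAPSHOT".
  private val MajorMinor = """^(\d+)\.(\d+).*""".r

  def majorMinor(version: String): Option[(Int, Int)] = version match {
    case MajorMinor(major, minor) => Some((major.toInt, minor.toInt))
    case _                        => None
  }

  // Maps a Spark version to the simple name of its version-specific rule.
  // None means no version-specific rule; the fallback is left to the caller.
  def ruleFor(version: String): Option[String] =
    majorMinor(version).collect {
      case (3, 3) => "Spark33HoodiePruneFileSourcePartitions"
      case (3, 4) => "Spark34HoodiePruneFileSourcePartitions"
      case (3, 5) => "Spark35HoodiePruneFileSourcePartitions"
    }
}
```

   Matching on major.minor instead of the full version keeps patch and snapshot releases (e.g. `3.4.1`, `3.5.0-SNAPSHOT`) on the same rule. Returning `None` for other versions leaves the fallback to the caller, which this sketch does not specify.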
   
   ### Impact
   
   Improves partition pruning compatibility and optimization for Spark 3.3+ by using version-appropriate pattern matching. No breaking changes to public APIs.
   
   ### Risk Level
   
   **Low** - Adds version-specific implementations without changing core logic. Each Spark version gets the appropriate partition pruning strategy.
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable

