suryaprasanna opened a new pull request, #17934:
URL: https://github.com/apache/hudi/pull/17934

   ### Describe the issue this Pull Request addresses
   
   When querying with multiple filters on the same partition column (e.g., 
`datestr > '2016-01-01' AND datestr < '2016-12-31'`), some partition filters 
are being dropped during query optimization. This occurs because 
`ExpressionSet` deduplicates expressions, which incorrectly treats multiple 
filters on the same column as duplicates even when they have different 
predicates.
   
   ### Summary and Changelog
   
   This PR fixes the loss of partition filters by removing the `ExpressionSet` 
conversion during partition pruning.
   
   **Changes:**
   - Removed `ExpressionSet` usage when appending partition filters in 
`Spark3HoodiePruneFileSourcePartitions`
   - Changed from `ExpressionSet(partitionFilters ++ 
extraPartitionFilter).toSeq` to direct concatenation `partitionFilters ++ 
extraPartitionFilter`
   - Added inline comment explaining the issue with ExpressionSet deduplication
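   To illustrate the change, here is a minimal self-contained sketch. It does 
**not** use Spark's real `ExpressionSet`; the `Filter` case class and the 
`partitionFilters` / `extraPartitionFilter` values are simplified stand-ins 
that model the reported behavior, where set-style deduplication collapses 
distinct predicates on the same partition column:

```scala
// Illustrative model only, not Spark code. Models the reported bug: a
// set keyed on something coarser than the full predicate drops one of
// two distinct range filters on the same column.
object PartitionFilterSketch {
  final case class Filter(column: String, predicate: String)

  val partitionFilters: Seq[Filter] =
    Seq(Filter("datestr", "> '2016-01-01'"))
  val extraPartitionFilter: Seq[Filter] =
    Seq(Filter("datestr", "< '2016-12-31'"))

  // Faulty path (models ExpressionSet(...).toSeq): deduplicating on a
  // key that ignores the predicate keeps only one filter per column.
  val deduped: Seq[Filter] =
    (partitionFilters ++ extraPartitionFilter)
      .groupBy(_.column)
      .values
      .map(_.head)
      .toSeq

  // Fixed path (the change in this PR): plain concatenation preserves
  // every filter, including multiple predicates on the same column.
  val concatenated: Seq[Filter] =
    partitionFilters ++ extraPartitionFilter
}
```

   With this model, the deduplicated sequence ends up with a single filter on 
`datestr`, while plain concatenation keeps both range predicates, which is the 
behavior the fix restores.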
   
   ### Impact
   
   Fixes queries in which partition filters were silently dropped, which could 
both produce incorrect results and scan more partitions than necessary. 
Ensuring all intended partition filters are applied improves query correctness 
and potentially performance.
   
   ### Risk Level
   
   **Low** - Removes a problematic deduplication step that was incorrectly 
dropping valid filters. The change preserves all filters as intended.
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
