suryaprasanna opened a new pull request, #17934: URL: https://github.com/apache/hudi/pull/17934
### Describe the issue this Pull Request addresses

When querying with multiple filters on the same partition column (e.g., `datestr > '2016-01-01' AND datestr < '2016-12-31'`), some partition filters are being dropped during query optimization. This occurs because `ExpressionSet` deduplicates expressions, which incorrectly treats multiple filters on the same column as duplicates even when they have different predicates.

### Summary and Changelog

This PR fixes partition filter loss by removing `ExpressionSet` conversion during partition pruning.

**Changes:**

- Removed `ExpressionSet` usage when appending partition filters in `Spark3HoodiePruneFileSourcePartitions`
- Changed from `ExpressionSet(partitionFilters ++ extraPartitionFilter).toSeq` to direct concatenation `partitionFilters ++ extraPartitionFilter`
- Added inline comment explaining the issue with `ExpressionSet` deduplication

### Impact

Fixes incorrect query results where partition filters were being dropped, leading to more partitions being scanned than necessary. This improves query correctness and potentially performance by ensuring all intended partition filters are applied.

### Risk Level

**Low** - Removes a problematic deduplication step that was incorrectly dropping valid filters. The change preserves all filters as intended.

### Documentation Update

none

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
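
### Illustrative sketch

The difference between the old and new way of combining filters, as described under "Summary and Changelog", can be sketched roughly as follows. This is a minimal, Spark-free illustration, not the code in `Spark3HoodiePruneFileSourcePartitions`: `Expr`, `GreaterThan`, `LessThan`, `collapseByColumn`, and `appendFilters` are hypothetical stand-ins. In particular, `collapseByColumn` only models the symptom reported above (one surviving filter per column) and makes no claim about how Catalyst's `ExpressionSet` is implemented.

```scala
// Hypothetical stand-ins for Catalyst expressions; these are NOT Spark classes.
sealed trait Expr { def column: String }
final case class GreaterThan(column: String, value: String) extends Expr
final case class LessThan(column: String, value: String) extends Expr

object PartitionFilterSketch {
  // Assumption: models only the reported symptom, where filters on the same
  // column collapse into one. It is not ExpressionSet's actual algorithm.
  def collapseByColumn(filters: Seq[Expr]): Seq[Expr] =
    filters.groupBy(_.column).map(_._2.head).toSeq

  // Shape of the fix described in the PR: plain concatenation keeps every
  // predicate and preserves the original order.
  def appendFilters(partitionFilters: Seq[Expr], extraPartitionFilter: Seq[Expr]): Seq[Expr] =
    partitionFilters ++ extraPartitionFilter

  def main(args: Array[String]): Unit = {
    val partitionFilters = Seq[Expr](GreaterThan("datestr", "2016-01-01"))
    val extraPartitionFilter = Seq[Expr](LessThan("datestr", "2016-12-31"))

    val combined = appendFilters(partitionFilters, extraPartitionFilter)
    // Both range bounds on datestr survive concatenation.
    assert(combined.size == 2)

    // The modeled failure mode keeps only one of the two bounds.
    assert(collapseByColumn(combined).size == 1)
  }
}
```

Concatenation may leave genuinely identical predicates in the list, which should be harmless for pruning since evaluating the same predicate twice does not change which partitions match.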
