[I] Pushing Down Partition Pruning Conditions to Column Stats During Data Skipping [hudi]

via GitHub Sun, 30 Nov 2025 01:34:56 -0800


hudi-bot opened a new issue, #16364:
URL: https://github.com/apache/hudi/issues/16364


   In the current implementation of data skipping, column statistics are read 
for the entire table, and the data skipping filters are then evaluated against 
all of them. When the table holds a large volume of data spread across many 
partitions, this makes data skipping less efficient, because the partition 
pruning conditions are never used to narrow the column stats first.
   
   The proposal is to push the partition pruning conditions down to the point 
right after the column statistics are read, before the data skipping filters 
are evaluated. Pruning at that point significantly reduces the volume of column 
stats involved in data skipping, which saves time in the later computations and 
also conserves memory.
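
The ordering described above can be illustrated with a minimal sketch. This is not Hudi's implementation or API: `ColumnStat`, `prune_by_partition`, and `candidate_files` are hypothetical names, and the stats rows are made-up sample data. It only shows partition pruning being applied to the loaded column stats before the min/max range check runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStat:
    """Hypothetical per-file, per-column stats row (not Hudi's schema)."""
    partition: str   # partition path the file belongs to
    file_name: str
    column: str
    min_value: int
    max_value: int


def prune_by_partition(stats, selected_partitions):
    """Keep only stats rows for partitions that survived partition pruning."""
    selected = set(selected_partitions)
    return [s for s in stats if s.partition in selected]


def candidate_files(stats, column, lower, upper):
    """Data skipping: keep files whose [min, max] range overlaps [lower, upper]."""
    return sorted({
        s.file_name
        for s in stats
        if s.column == column and s.max_value >= lower and s.min_value <= upper
    })


# Made-up sample stats covering three partitions.
stats = [
    ColumnStat("dt=2025-01-01", "f1.parquet", "price", 0, 50),
    ColumnStat("dt=2025-01-01", "f2.parquet", "price", 60, 90),
    ColumnStat("dt=2025-01-02", "f3.parquet", "price", 10, 40),
    ColumnStat("dt=2025-01-03", "f4.parquet", "price", 20, 30),
]

# Query: dt = '2025-01-01' AND price BETWEEN 25 AND 55.
# Step 1: apply the partition filter to the stats that were read.
pruned = prune_by_partition(stats, ["dt=2025-01-01"])
# Step 2: run the range-based data skipping check on the reduced set only.
files = candidate_files(pruned, "price", 25, 55)
```

In this sketch the range check in step 2 inspects two stats rows instead of four; on a table with many partitions, the reduction would scale with the fraction of partitions the query actually selects.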
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-7291
   - Type: Improvement


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

