sdf-jkl commented on PR #20497:
URL: https://github.com/apache/datafusion/pull/20497#issuecomment-4061269624

   Sorry, I think I got things mixed up while working on this.
   
   We treat a column as `sorted` based on the `page_index` statistics (`min/max`) 
for that column: if its pages are ordered within each row group, we 
treat the column as sorted.
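
   A minimal sketch of that check, under assumptions: `PageStats`, `pages_ascending`, and `column_looks_sorted` are hypothetical names for illustration, not DataFusion or parquet APIs, and "ordered" is taken here to mean ascending with no overlap between consecutive pages.

```rust
/// Hypothetical per-page statistics for one column (not DataFusion's actual type).
#[derive(Debug, Clone, Copy)]
struct PageStats {
    min: i64,
    max: i64,
}

/// True if every page starts at or after the point where the previous page ends.
/// Assumption: "ordered" means ascending and non-overlapping.
fn pages_ascending(pages: &[PageStats]) -> bool {
    pages.windows(2).all(|w| w[0].max <= w[1].min)
}

/// The column is treated as sorted only if the pages of every row group are ordered.
/// Ordering is checked within each row group, not across row groups.
fn column_looks_sorted(row_groups: &[Vec<PageStats>]) -> bool {
    row_groups.iter().all(|pages| pages_ascending(pages))
}
```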
   
   Given that, such a column is usually a strong candidate for row group/page 
pruning, so we prune.
   
   After pruning, the remaining work goes to `row_filter`. For a range 
predicate on a sorted column, `row_filter` is then likely to trim rows mostly at 
the boundaries of the kept window (often a relatively small contiguous region, though it 
can still include full page(s) once we use the selection on heavier columns).
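
   To illustrate why the trimming concentrates at the boundaries, here is a rough sketch, assuming ascending, non-overlapping pages and a closed range predicate `[lo, hi]`. `PageStats`, `kept_window`, and `boundary_pages` are hypothetical helpers, not the actual pruning code.

```rust
use std::ops::Range;

/// Hypothetical per-page statistics for one column (not DataFusion's actual type).
#[derive(Debug, Clone, Copy)]
struct PageStats {
    min: i64,
    max: i64,
}

/// Indices of pages whose [min, max] overlaps the closed range [lo, hi].
/// Assumes pages are ascending and non-overlapping, so the result is contiguous.
fn kept_window(pages: &[PageStats], lo: i64, hi: i64) -> Range<usize> {
    // Pages entirely below the range come first.
    let start = pages.partition_point(|p| p.max < lo);
    // Pages that begin at or below `hi` form a prefix.
    let end = pages.partition_point(|p| p.min <= hi);
    // Guard against an empty or inverted range (lo > hi).
    start..end.max(start)
}

/// Kept pages that are not fully inside [lo, hi]. These are the only pages
/// where a row-level filter on this column can still discard rows.
fn boundary_pages(pages: &[PageStats], lo: i64, hi: i64) -> Vec<usize> {
    kept_window(pages, lo, hi)
        .filter(|&i| pages[i].min < lo || pages[i].max > hi)
        .collect()
}
```

   With pages covering 0..=9, 10..=19, 20..=29, and 30..=39 and the predicate range 15..=32, the kept window is pages 1 through 3, and only pages 1 and 3 can still lose rows to the filter.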
   
   So the incremental benefit of evaluating a predicate on this column 
early in Late Materialization is likely marginal in many workloads, since most of 
the pruning value was already captured earlier.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

