hareshkh commented on code in PR #21057:
URL: https://github.com/apache/datafusion/pull/21057#discussion_r2966432536


##########
datafusion/optimizer/src/push_down_filter.rs:
##########
@@ -1130,6 +1137,13 @@ impl OptimizerRule for PushDownFilter {
             }
             LogicalPlan::Join(join) => push_down_join(join, 
Some(&filter.predicate)),
             LogicalPlan::TableScan(scan) => {
+                // If the scan has a fetch (limit), pushing filters into it
+                // would change semantics: the limit should apply before the
+                // filter, not after.
+                if scan.fetch.is_some() {
+                    filter.input = Arc::new(LogicalPlan::TableScan(scan));
+                    return Ok(Transformed::no(LogicalPlan::Filter(filter)));
+                }

Review Comment:
   @alamb : Curious about your thoughts for this part of the change?
   For a plan like:
   ```
   FILTER col = val
   |---- LIMIT 50
         |---- TABLE_SCAN
   ```
   
   After PushDownLimit folds the limit, we get FILTER -> TABLE_SCAN(fetch=50). 
This PR prevents pushing the filter into scan.filters when fetch is set.
   
   Since there is no ordering specified, the LIMIT is essentially 
non-deterministic in the rows it returns. So the filter can be moved past it or 
run after it — both are semantically correct. But if the table scan has an 
underlying implicit ordering (due to the layout of data or such), then pushing 
the filter around may be incorrect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to