kosiew commented on code in PR #22150:
URL: https://github.com/apache/datafusion/pull/22150#discussion_r3239810410


##########
datafusion/physical-optimizer/src/limit_pushdown.rs:
##########
@@ -284,6 +307,61 @@ pub fn pushdown_limit_helper(
     }
 }
 
+/// Returns true if exact input statistics prove that applying the limit would
+/// not remove any rows.
+fn limit_satisfied_by_input(
+    plan: &Arc<dyn ExecutionPlan>,
+    skip: usize,
+    fetch: usize,
+) -> Result<bool> {
+    if skip > 0 {
+        return Ok(false);
+    }
+
+    if plan.output_partitioning().partition_count() != 1 {
+        return Ok(false);
+    }
+
+    let Some(num_rows) = limit_eliminable_exact_num_rows(plan)? else {
+        return Ok(false);
+    };
+
+    Ok(num_rows <= fetch)
+}
+
+/// Returns exact row counts only for operators whose cardinality guarantee is

Review Comment:
   The helper comment mentions returning exact row counts for operators whose 
guarantees are strong enough, but the implementation is intentionally a pretty 
small whitelist (`EmptyExec`, `PlaceholderRowExec`, and exact zero stats). It 
might help to clarify in the comment that this is meant to be a conservative 
whitelist.
   
   Another option would be adding a small test or note explaining why a general 
`Precision::Exact(n <= fetch)` is not considered safe enough here. I think that 
would make the invariant a bit clearer for future extensions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to