kosiew commented on code in PR #22150:
URL: https://github.com/apache/datafusion/pull/22150#discussion_r3239810410
##########
datafusion/physical-optimizer/src/limit_pushdown.rs:
##########
@@ -284,6 +307,61 @@ pub fn pushdown_limit_helper(
}
}
+/// Returns true if exact input statistics prove that applying the limit would
+/// not remove any rows.
+fn limit_satisfied_by_input(
+ plan: &Arc<dyn ExecutionPlan>,
+ skip: usize,
+ fetch: usize,
+) -> Result<bool> {
+ if skip > 0 {
+ return Ok(false);
+ }
+
+ if plan.output_partitioning().partition_count() != 1 {
+ return Ok(false);
+ }
+
+ let Some(num_rows) = limit_eliminable_exact_num_rows(plan)? else {
+ return Ok(false);
+ };
+
+ Ok(num_rows <= fetch)
+}
+
+/// Returns exact row counts only for operators whose cardinality guarantee is
Review Comment:
The helper comment mentions returning exact row counts for operators whose
guarantees are strong enough, but the implementation is intentionally a pretty
small whitelist (`EmptyExec`, `PlaceholderRowExec`, and exact zero stats). It
might help to clarify in the comment that this is meant to be a conservative
whitelist.
Another option would be adding a small test or note explaining why a general
`Precision::Exact(n <= fetch)` is not considered safe enough here. I think that
would make the invariant a bit clearer for future extensions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]