neilconway commented on code in PR #22870:
URL: https://github.com/apache/datafusion/pull/22870#discussion_r3423902646
##########
datafusion/optimizer/src/eliminate_outer_join.rs:
##########
@@ -252,190 +242,134 @@ pub fn eliminate_outer(
}
}
-/// Find the columns that `expr` rejects NULL on. If any of these columns are
-/// NULL, `expr` is guaranteed to evaluate to NULL or false, and the row
-/// therefore cannot survive a WHERE clause. Matching columns are appended to
-/// `null_rejecting_cols`.
-///
-/// The caller uses the result to decide whether an outer join's null-padded
-/// rows could survive the predicate above the join: if a column from the
-/// nullable side appears in `null_rejecting_cols`, it cannot, and the outer
-/// join can be converted to an inner join.
+#[derive(Default, Clone, Copy, Debug, PartialEq, Eq)]
+struct NullRejectingSides {
+ left: bool,
+ right: bool,
+}
+
+impl NullRejectingSides {
+ fn for_column(col: &Column, left_schema: &DFSchema, right_schema:
&DFSchema) -> Self {
+ Self {
+ left: left_schema.has_column(col),
+ right: right_schema.has_column(col),
+ }
+ }
+
+ fn union(self, other: Self) -> Self {
+ Self {
+ left: self.left || other.left,
+ right: self.right || other.right,
+ }
+ }
+
+ fn intersection(self, other: Self) -> Self {
+ Self {
+ left: self.left && other.left,
+ right: self.right && other.right,
+ }
+ }
+}
+
+/// Compute which join sides cannot be NULL-padded and still pass `expr` in a
+/// WHERE clause. A marked side means rows with that side padded to NULL are
+/// guaranteed to evaluate to NULL or false and be filtered out.
Review Comment:
```suggestion
/// Compute which join sides are null-rejected by `expr` in a WHERE clause.
/// For each marked side, rows padded with NULLs on that side are
guaranteed to
/// evaluate to NULL or false and be filtered out.
```
##########
datafusion/optimizer/src/eliminate_outer_join.rs:
##########
@@ -146,36 +146,26 @@ impl OptimizerRule for EliminateOuterJoin {
}
}
-/// Run the null-rejection analysis on `predicate` against `join`'s left/right
-/// schemas. Return `Some(new_join_plan)` if the join type can be tightened
-/// (e.g. LEFT → INNER), `None` otherwise.
+/// Determine which null-padded sides of `join` are rejected by `predicate`.
+/// Return `Some(new_join_plan)` if that side-level evidence can tighten the
+/// join type (e.g. LEFT → INNER), `None` otherwise.
Review Comment:
```suggestion
/// Attempt to simplify an outer join by analyzing `predicate` for
/// null-rejection. If the predicate filters out rows padded with NULLs on
one
/// or both sides, return a copy of `join` rewritten to an equivalent join
type
/// that omits those rows in the first place; otherwise return `None`.
```
##########
datafusion/optimizer/src/eliminate_outer_join.rs:
##########
@@ -252,190 +242,134 @@ pub fn eliminate_outer(
}
}
-/// Find the columns that `expr` rejects NULL on. If any of these columns are
-/// NULL, `expr` is guaranteed to evaluate to NULL or false, and the row
-/// therefore cannot survive a WHERE clause. Matching columns are appended to
-/// `null_rejecting_cols`.
-///
-/// The caller uses the result to decide whether an outer join's null-padded
-/// rows could survive the predicate above the join: if a column from the
-/// nullable side appears in `null_rejecting_cols`, it cannot, and the outer
-/// join can be converted to an inner join.
+#[derive(Default, Clone, Copy, Debug, PartialEq, Eq)]
+struct NullRejectingSides {
+ left: bool,
+ right: bool,
+}
+
+impl NullRejectingSides {
+ fn for_column(col: &Column, left_schema: &DFSchema, right_schema:
&DFSchema) -> Self {
+ Self {
+ left: left_schema.has_column(col),
+ right: right_schema.has_column(col),
+ }
+ }
Review Comment:
I think this merits an explanatory comment; it's a bit obscure what a helper
called `for_column` is actually doing (the name just describes the input, not
what the function really does). Maybe add a header comment like
```rust
/// The join side(s) a column belongs to.
///
/// A bare column reference is null-rejecting on its own side: if the column
/// is NULL, every null-propagating operator above it yields NULL and the row
/// is filtered.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]