HaoYang670 commented on code in PR #4272:
URL: https://github.com/apache/arrow-datafusion/pull/4272#discussion_r1026146513
##########
datafusion/optimizer/src/eliminate_outer_join.rs:
##########
@@ -39,191 +38,107 @@ impl ReduceOuterJoin {
}
}
+/// Attempt to eliminate outer joins to inner joins.
+/// for query: select ... from a left join b on ... where b.xx = 100;
+/// if b.xx is null, and b.xx = 100 returns false, filtered those null rows.
+/// Therefore, there is no need to produce null rows for output, we can use
+/// inner join instead of left join.
+///
+/// Generally, an outer join can be eliminated to inner join if equals from where
+/// return false while any inputs are null and columns of those equals are come from
+/// nullable side of outer join.
Review Comment:
We can simplify the docs to just say: "We replace the outer join with an
inner join if the `null` rows on the nullable side will be filtered out."
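
To make the "filtered out" condition concrete, here is a minimal sketch of SQL three-valued logic in plain Rust. It is not taken from the PR: `SqlInt`, `sql_eq` and `where_keeps` are hypothetical names used only for illustration. It shows why a predicate like `b.xx = 100` can never keep a null-padded row.

```rust
/// A nullable SQL integer: `None` stands for SQL NULL.
/// (Hypothetical alias for illustration, not a DataFusion type.)
type SqlInt = Option<i64>;

/// SQL `=` under three-valued logic: the result is NULL (`None`)
/// whenever either operand is NULL.
fn sql_eq(lhs: SqlInt, rhs: SqlInt) -> Option<bool> {
    match (lhs, rhs) {
        (Some(l), Some(r)) => Some(l == r),
        _ => None,
    }
}

/// A WHERE clause keeps a row only when the predicate is exactly TRUE;
/// both FALSE and NULL drop the row.
fn where_keeps(predicate: Option<bool>) -> bool {
    predicate == Some(true)
}
```

With these helpers, `where_keeps(sql_eq(None, Some(100)))` is `false`, so the rows that a left join pads with nulls on the `b` side would not survive the filter.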
##########
datafusion/optimizer/src/eliminate_outer_join.rs:
##########
@@ -39,191 +38,107 @@ impl ReduceOuterJoin {
}
}
+/// Attempt to eliminate outer joins to inner joins.
Review Comment:
Question from a beginner:
Can an outer join always be reduced to an INNER JOIN?
For example, for the query
```
select ... from a full join b where a.xx = 100
```
we could only reduce it to a LEFT JOIN, not to an INNER JOIN, because the filter
rejects only the rows in which `a` is null-padded.
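
One way to reason about this case is as a small decision table. The sketch below is an illustration only, not the code in this PR: `JoinType` here is a local enum, and `reduce_join_type` and its two boolean parameters are hypothetical names. It assumes we already know, for each side, whether the WHERE clause rejects rows in which that side's columns are null. A filter that rejects nulls in `a` removes the rows where `a` is null-padded, which are the unmatched `b` rows, so a full join keeps only what a left join would produce.

```rust
/// Local stand-in for a join type (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Reduce an outer join given which sides have their null rows rejected
/// by the filter. `left_null_rejected` means the filter drops every row
/// in which the left input's columns are null, and likewise for the right.
fn reduce_join_type(
    join_type: JoinType,
    left_null_rejected: bool,
    right_null_rejected: bool,
) -> JoinType {
    match join_type {
        // A left join pads the right side with nulls; if those rows are
        // filtered out anyway, an inner join gives the same result.
        JoinType::Left if right_null_rejected => JoinType::Inner,
        // Symmetric case for a right join.
        JoinType::Right if left_null_rejected => JoinType::Inner,
        // A full join pads both sides, so each side is checked separately.
        JoinType::Full => match (left_null_rejected, right_null_rejected) {
            (true, true) => JoinType::Inner,
            (true, false) => JoinType::Left,
            (false, true) => JoinType::Right,
            (false, false) => JoinType::Full,
        },
        // Inner joins, and outer joins without a matching filter, stay as-is.
        other => other,
    }
}
```

For the query above, `reduce_join_type(JoinType::Full, true, false)` yields `JoinType::Left`. So the answer to the question is no: an outer join becomes an inner join only when the null rows of every nullable side are filtered out.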