2010YOUY01 commented on code in PR #23078:
URL: https://github.com/apache/datafusion/pull/23078#discussion_r3456561811
##########
datafusion/physical-plan/src/joins/hash_join/exec.rs:
##########
@@ -6610,6 +6611,63 @@ mod tests {
Ok(())
}
+ #[test]
+ fn test_swap_inputs_clears_dynamic_filter() -> Result<()> {
Review Comment:
I'm wondering: are you using `swap_inputs()` directly downstream? 🤔
I think the root cause is that `swap_inputs()` has not been clearly
specified. It implicitly assumes a bunch of preconditions, so it is only mostly
safe to use internally when we strictly follow the optimizer rule order.
However, it is now a public API, so downstream usages may trigger subtle
bugs if users are unaware of those preconditions.
See the optimizer rule order below. Internally, `swap_inputs()` is only
called inside that optimizer rule, so we can ensure it is called only when
`dynamic_filter` is `None`:
https://github.com/apache/datafusion/blob/46b508eb4019868a1d4ee676f6604f298cb3a862/datafusion/physical-optimizer/src/optimizer.rs#L161
I tried adding `assert!(self.dynamic_filter.is_none())` at the beginning of
`HashJoinExec::swap_inputs()`, and the internal tests still pass.
So I suggest we discuss separately what extra contract you need for
downstream usage, and then extend `swap_inputs()` accordingly. (and also add
the above assertions to prevent similar issues)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]