jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144282145
########## datafusion/physical-plan/src/joins/nested_loop_join.rs: ########## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet, /// Cache holding plan properties like equivalences, output partitioning etc. cache: PlanProperties, + /// Null matching behavior: If `null_equals_null` is true, rows that have + /// `null`s in both left and right equijoin columns will be matched. + /// Otherwise, rows that have `null`s in the join columns will not be + /// matched and thus will not appear in the output. + null_equals_null: bool, + /// Set of equijoin columns from the relations: `(left_col, right_col)` + /// + /// This is optional as a nested loop join can be passed a 'on' clause + /// in the case that a Hash Join cost is more expensive than a + /// nested loop join or when a user would like to pick nested loop + /// join by hint + on: Option<Vec<(PhysicalExprRef, PhysicalExprRef)>>, Review Comment: Ok this sounds like a good plan, I think I will try to reiterate and run some benchmarks to see if it is worth it. > If null_equal_null is true, we can replace = in the ON clause with <=> and merge it into the filter in the planner This seems slower, we should just pass this to smj or hj. I can play around with the threshold logic that duckdb has for smaller tables where nljs may be faster. > There's **no equal(`=` not `<=>`) join condition in `on`,** so the planner choose nlj This is interesting because how will sort merge join + hash join when to treat . And since hash join and smj are usually faster for this equi conditions, instead of passing in <=> into the join filter we can flip the null equals null bit to true and run the <=> in the `on` clause; but this would clash with a `=` if null_equals_null is false. The solution I was thinking of is these cases: - `null_equals_null` equals true + `<=>` condition + `=` condition -> we pass the `<=>` expressions into the `on` clause - `null_equals_null` equals false + `<=>` condition + `=` condition -> `<=>` stays in join filter - `null_equals_null` equals false + `<=>` condition + no `=` condition -> we flip the `null_equals_null` to true and pass the `<=>` expressions into the `on` clause -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org