jonathanc-n commented on code in PR #16210:
URL: https://github.com/apache/datafusion/pull/16210#discussion_r2141190301


##########
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##########
@@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec {
     metrics: ExecutionPlanMetricsSet,
     /// Cache holding plan properties like equivalences, output partitioning 
etc.
     cache: PlanProperties,
+    /// Null matching behavior: If `null_equals_null` is true, rows that have
+    /// `null`s in both left and right equijoin columns will be matched.
+    /// Otherwise, rows that have `null`s in the join columns will not be
+    /// matched and thus will not appear in the output.
+    null_equals_null: bool,
+    /// Set of equijoin columns from the relations: `(left_col, right_col)`
+    ///
+    /// This is optional as a nested loop join can be passed a 'on' clause
+    /// in the case that a Hash Join cost is more expensive than a
+    /// nested loop join or when a user would like to pick nested loop
+    /// join by hint
+    on: Option<Vec<(PhysicalExprRef, PhysicalExprRef)>>,

Review Comment:
   > Maybe we can change type to `Vec<(PhysicalExprRef, PhysicalExprRef)>` . 
`on.is_empty() == true` mean no equijoin columns.
   
   Because nested loop joins exclusively take in filter clauses more often than 
not, I decided it'd be easier to check if `on.is_some()`. Currently the 
physical planner doesnt take in nested loop join for equijoins on purpose, this 
will be added in another pr.
   
   > If we can merge `on` condition into `filter`, we can remove this field and 
reuse existing filter logic. For example: `filter: t1.a < t2.a, on: t1.c = 
t2.c` -> `filter: t1.a < t2.a and t1.c = t2.c`
   I separated merging the filter + on condition to make sure that we can do 
the null_equals_null check (this is how it is implemented for the other joins 
as well).



##########
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##########
@@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec {
     metrics: ExecutionPlanMetricsSet,
     /// Cache holding plan properties like equivalences, output partitioning 
etc.
     cache: PlanProperties,
+    /// Null matching behavior: If `null_equals_null` is true, rows that have
+    /// `null`s in both left and right equijoin columns will be matched.
+    /// Otherwise, rows that have `null`s in the join columns will not be
+    /// matched and thus will not appear in the output.
+    null_equals_null: bool,
+    /// Set of equijoin columns from the relations: `(left_col, right_col)`
+    ///
+    /// This is optional as a nested loop join can be passed a 'on' clause
+    /// in the case that a Hash Join cost is more expensive than a
+    /// nested loop join or when a user would like to pick nested loop
+    /// join by hint
+    on: Option<Vec<(PhysicalExprRef, PhysicalExprRef)>>,

Review Comment:
   > Maybe we can change type to `Vec<(PhysicalExprRef, PhysicalExprRef)>` . 
`on.is_empty() == true` mean no equijoin columns.
   
   Because nested loop joins exclusively take in filter clauses more often than 
not, I decided it'd be easier to check if `on.is_some()`. Currently the 
physical planner doesnt take in nested loop join for equijoins on purpose, this 
will be added in another pr.
   
   > If we can merge `on` condition into `filter`, we can remove this field and 
reuse existing filter logic. For example: `filter: t1.a < t2.a, on: t1.c = 
t2.c` -> `filter: t1.a < t2.a and t1.c = t2.c`
   
   I separated merging the filter + on condition to make sure that we can do 
the null_equals_null check (this is how it is implemented for the other joins 
as well).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to