Dandandan commented on a change in pull request #1135:
URL: https://github.com/apache/arrow-datafusion/pull/1135#discussion_r743729526



##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -751,12 +788,19 @@ fn build_join_indexes(
 }
 
 macro_rules! equal_rows_elem {
-    ($array_type:ident, $l: ident, $r: ident, $left: ident, $right: ident) => {{
+    ($array_type:ident, $l: ident, $r: ident, $left: ident, $right: ident, $null_equal_safe: ident) => {{
         let left_array = $l.as_any().downcast_ref::<$array_type>().unwrap();
         let right_array = $r.as_any().downcast_ref::<$array_type>().unwrap();
 
         match (left_array.is_null($left), right_array.is_null($right)) {
             (false, false) => left_array.value($left) == right_array.value($right),
+            (true, true) => {

Review comment:
       I am concerned about the performance impact.
   I would expect it to be optimized away: this is a macro, so the branch 
should be removed based on the boolean constant, but I am not sure about that.
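
   To make the question concrete, here is a minimal sketch of the shape under discussion. It is not the PR's code: plain slices of `Option<i32>` stand in for the downcast Arrow arrays, and the names `equal_rows_elem_sketch`, `rows_equal` and `null_equals_null` are made up for illustration. The point it shows is that the macro only pastes the identifier into the match arm. If that identifier names a runtime `bool`, as assumed here, nothing in the expansion itself makes it a constant, and whether the branch disappears is left to the optimizer.

```rust
// Hypothetical stand-in for the macro in the diff: slices of Option<i32>
// replace the Arrow arrays, and all names here are assumptions.
macro_rules! equal_rows_elem_sketch {
    ($l:ident, $r:ident, $left:ident, $right:ident, $null_equals_null:ident) => {{
        match ($l[$left], $r[$right]) {
            // Both values present: compare them.
            (Some(a), Some(b)) => a == b,
            // Both null: the pasted-in flag decides.
            (None, None) => $null_equals_null,
            // Exactly one side null: never equal.
            _ => false,
        }
    }};
}

/// The flag arrives as an ordinary runtime argument, so the expansion
/// contains a read of a variable, not a literal `true` or `false`.
pub fn rows_equal(
    l: &[Option<i32>],
    r: &[Option<i32>],
    left: usize,
    right: usize,
    null_equals_null: bool,
) -> bool {
    equal_rows_elem_sketch!(l, r, left, right, null_equals_null)
}
```

   Only a benchmark or a look at the generated code would settle whether the extra arm costs anything in practice.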

##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -751,12 +788,19 @@ fn build_join_indexes(
 }
 
 macro_rules! equal_rows_elem {
-    ($array_type:ident, $l: ident, $r: ident, $left: ident, $right: ident) => {{
+    ($array_type:ident, $l: ident, $r: ident, $left: ident, $right: ident, $null_equal_null: ident) => {{
         let left_array = $l.as_any().downcast_ref::<$array_type>().unwrap();
         let right_array = $r.as_any().downcast_ref::<$array_type>().unwrap();
 
         match (left_array.is_null($left), right_array.is_null($right)) {
             (false, false) => left_array.value($left) == right_array.value($right),
+            (true, true) => {
+                if $null_equal_null {

Review comment:
       This can be `!$null_equal_null` instead (without if/else)

##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -751,12 +788,19 @@ fn build_join_indexes(
 }
 
 macro_rules! equal_rows_elem {
-    ($array_type:ident, $l: ident, $r: ident, $left: ident, $right: ident) => {{
+    ($array_type:ident, $l: ident, $r: ident, $left: ident, $right: ident, $null_equal_safe: ident) => {{
         let left_array = $l.as_any().downcast_ref::<$array_type>().unwrap();
         let right_array = $r.as_any().downcast_ref::<$array_type>().unwrap();
 
         match (left_array.is_null($left), right_array.is_null($right)) {
             (false, false) => left_array.value($left) == right_array.value($right),
+            (true, true) => {

Review comment:
       Also, the `is_null` check and the downcasting happen per item. I would 
expect those to have a potentially higher impact, even if the added code has 
some non-zero cost.
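
   As a rough illustration of the per-item cost mentioned here, the sketch below contrasts downcasting on every comparison with downcasting once before the loop. It does not use Arrow: a `Vec<Option<i32>>` behind `&dyn Any` is a stand-in for an array, and `Column`, `equal_per_item` and `equal_hoisted` are hypothetical names. Nulls are simply treated as unequal to keep it short.

```rust
use std::any::Any;

// Hypothetical stand-in for an Arrow array: a nullable i32 column that is
// only reachable through `dyn Any`, so it must be downcast before use.
type Column = Vec<Option<i32>>;

/// Downcasts both sides on every call, like the per-item macro shape.
pub fn equal_per_item(l: &dyn Any, r: &dyn Any, left: usize, right: usize) -> bool {
    let l = l.downcast_ref::<Column>().unwrap();
    let r = r.downcast_ref::<Column>().unwrap();
    // Equal only when both values are present and match.
    matches!((l[left], r[right]), (Some(a), Some(b)) if a == b)
}

/// Downcasts once, then compares every candidate pair of row indices.
pub fn equal_hoisted(l: &dyn Any, r: &dyn Any, pairs: &[(usize, usize)]) -> Vec<bool> {
    let l = l.downcast_ref::<Column>().unwrap();
    let r = r.downcast_ref::<Column>().unwrap();
    pairs
        .iter()
        .map(|&(i, j)| matches!((l[i], r[j]), (Some(a), Some(b)) if a == b))
        .collect()
}
```

   Both functions give the same answers. How much the hoisted form saves, if anything, would have to be measured.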

##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -751,12 +788,19 @@ fn build_join_indexes(
 }
 
 macro_rules! equal_rows_elem {
-    ($array_type:ident, $l: ident, $r: ident, $left: ident, $right: ident) => {{
+    ($array_type:ident, $l: ident, $r: ident, $left: ident, $right: ident, $null_equal_null: ident) => {{
         let left_array = $l.as_any().downcast_ref::<$array_type>().unwrap();
         let right_array = $r.as_any().downcast_ref::<$array_type>().unwrap();
 
         match (left_array.is_null($left), right_array.is_null($right)) {
             (false, false) => left_array.value($left) == right_array.value($right),
+            (true, true) => {
+                if $null_equal_null {

Review comment:
       Yes, I thought true/false were swapped in the if/else, but I probably 
had that wrong 👍 
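
   For reference, the two readings differ only in which arm returns `true`. The diff context cuts off before the arms, so the sketch below assumes the unswapped form (`true` in the `if` arm); the function names are hypothetical. Under that assumption the if/else reduces to the flag itself, and the negated spelling would only be right if the arms were swapped.

```rust
/// Assumed shape of the arm body: `true` when two nulls should match.
pub fn both_null_explicit(null_equals_null: bool) -> bool {
    if null_equals_null {
        true
    } else {
        false
    }
}

/// Same truth table without the if/else.
pub fn both_null_direct(null_equals_null: bool) -> bool {
    null_equals_null
}

/// What the negated spelling computes: the result for swapped arms.
pub fn both_null_negated(null_equals_null: bool) -> bool {
    !null_equals_null
}
```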




