Dandandan commented on issue #7113:
URL:
https://github.com/apache/arrow-datafusion/issues/7113#issuecomment-1658320608
@metesynnada
I tried my own suggestion, but so far it has not recovered the performance.
A simple solution might be to just reverse the order of the candidate
build/probe indices, which doesn't seem to make much difference in performance:
```
diff --git a/datafusion/core/src/physical_plan/joins/hash_join.rs b/datafusion/core/src/physical_plan/joins/hash_join.rs
index ce1d6dbcc..488af68c1 100644
--- a/datafusion/core/src/physical_plan/joins/hash_join.rs
+++ b/datafusion/core/src/physical_plan/joins/hash_join.rs
@@ -781,6 +781,8 @@ pub fn build_equal_condition_join_indices(
}
}
}
+ build_indices.as_slice_mut().reverse();
+ probe_indices.as_slice_mut().reverse();
let left: UInt64Array =
PrimitiveArray::new(build_indices.finish().into(), None);
let right: UInt32Array =
PrimitiveArray::new(probe_indices.finish().into(), None);
@@ -2757,12 +2759,12 @@ mod tests {
)?;
let mut left_ids = UInt64Builder::with_capacity(0);
- left_ids.append_value(0);
left_ids.append_value(1);
+ left_ids.append_value(0);
let mut right_ids = UInt32Builder::with_capacity(0);
- right_ids.append_value(0);
right_ids.append_value(1);
+ right_ids.append_value(0);
assert_eq!(left_ids.finish(), l);
```
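
To make the effect of the two added lines concrete, here is a minimal standalone sketch of the same idea using plain slices rather than the Arrow buffer builders from the diff. The helper name `reverse_index_pairs` is hypothetical and not part of DataFusion; the sketch only assumes that the build and probe indices are stored as two parallel sequences of equal length.

```rust
/// Hypothetical helper (not a DataFusion API): reverse two parallel index
/// buffers in place. Because both are reversed together, the i-th build
/// index is still paired with the i-th probe index afterwards.
fn reverse_index_pairs(build_indices: &mut [u64], probe_indices: &mut [u32]) {
    // The buffers describe (build, probe) pairs, so their lengths must match.
    assert_eq!(build_indices.len(), probe_indices.len());
    build_indices.reverse();
    probe_indices.reverse();
}
```

Calling it on build indices `[0, 1]` and probe indices `[0, 1]` leaves both as `[1, 0]`, which matches the swapped `append_value` order in the test hunk of the diff. Each reversal is a single in-place pass over the slice, so the extra cost is linear in the number of candidate pairs.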