Copilot commented on code in PR #2161:
URL: https://github.com/apache/auron/pull/2161#discussion_r3031775981
##########
native-engine/datafusion-ext-plans/src/joins/join_hash_map.rs:
##########
@@ -253,15 +253,26 @@ impl Table {
let mut e = entries![i] as usize;
loop {
let hash_matched =
self.map[e].hashes.simd_eq(Simd::splat(hashes[i]));
- let empty = self.map[e].hashes.simd_eq(Simd::splat(0));
- if let Some(pos) = (hash_matched | empty).first_set() {
+ // Fast path: check hash match first (common case)
+ if let Some(pos) = hash_matched.first_set() {
hashes[i] = unsafe {
// safety: transmute MapValue(u32) to u32
std::mem::transmute(self.map[e].values[pos])
};
break;
}
+
+ // Slow path: check empty slot only when no match
+ let empty = self.map[e].hashes.simd_eq(Simd::splat(0));
+ if empty.any() {
Review Comment:
The correctness of checking `hash_matched.first_set()` before computing the
`empty` mask relies on an invariant that, within a `MapValueGroup`, all
occupied lanes are packed from the beginning (i.e., there cannot be an empty
lane before a later occupied lane). That invariant currently holds because
insertion always uses `empty.first_set()`, but it's not stated here and a
future change (e.g., tombstones/deletes or a different insertion strategy)
could break this lookup logic. Please document this invariant explicitly (or
add a debug-only assertion) so this fast-path doesn't become subtly incorrect
later.
##########
native-engine/datafusion-ext-plans/src/joins/join_hash_map.rs:
##########
@@ -253,15 +253,26 @@ impl Table {
let mut e = entries![i] as usize;
loop {
let hash_matched =
self.map[e].hashes.simd_eq(Simd::splat(hashes[i]));
- let empty = self.map[e].hashes.simd_eq(Simd::splat(0));
- if let Some(pos) = (hash_matched | empty).first_set() {
+ // Fast path: check hash match first (common case)
+ if let Some(pos) = hash_matched.first_set() {
hashes[i] = unsafe {
// safety: transmute MapValue(u32) to u32
std::mem::transmute(self.map[e].values[pos])
};
break;
}
+
+ // Slow path: check empty slot only when no match
+ let empty = self.map[e].hashes.simd_eq(Simd::splat(0));
+ if empty.any() {
+ hashes[i] = unsafe {
+ // safety: transmute MapValue(u32) to u32
+ std::mem::transmute(MapValue::EMPTY)
+ };
Review Comment:
This adds another `std::mem::transmute(MapValue::EMPTY)` to convert a
`MapValue` into `u32`. Since `MapValue` is not marked `#[repr(transparent)]`,
transmutes between `MapValue` and `u32` are not layout-guaranteed by Rust's
spec. Consider replacing this with a safe conversion (e.g., accessing the inner
`.0` field within the module, or adding an `as_u32()` method / `From<MapValue>
for u32`), or mark `MapValue` as `#[repr(transparent)]` to make the conversion
sound.
```suggestion
hashes[i] = MapValue::EMPTY.0;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]