zanmato1984 commented on code in PR #45108:
URL: https://github.com/apache/arrow/pull/45108#discussion_r1913170496


##########
cpp/src/arrow/compute/key_map_internal_avx2.cc:
##########
@@ -392,16 +387,30 @@ int SwissTable::extract_group_ids_avx2(const int 
num_keys, const uint32_t* hashe
   } else {
     for (int i = 0; i < num_keys / unroll; ++i) {
       __m256i hash = _mm256_loadu_si256(reinterpret_cast<const 
__m256i*>(hashes) + i);
+      // Extend hash and local_slot to 64-bit to compute 64-bit group id 
offsets to
+      // gather from. This is to prevent index overflow issues in GH-44513.
+      // NB: Use zero-extend conversion for unsigned hash.
+      __m256i hash_lo = _mm256_cvtepu32_epi64(_mm256_castsi256_si128(hash));
+      __m256i hash_hi = _mm256_cvtepu32_epi64(_mm256_extracti128_si256(hash, 
1));
       __m256i local_slot =
           _mm256_set1_epi64x(reinterpret_cast<const 
uint64_t*>(local_slots)[i]);
-      local_slot = _mm256_shuffle_epi8(
-          local_slot, _mm256_setr_epi32(0x80808000, 0x80808001, 0x80808002, 
0x80808003,
-                                        0x80808004, 0x80808005, 0x80808006, 
0x80808007));
-      local_slot = _mm256_mullo_epi32(local_slot, 
_mm256_set1_epi32(byte_size));
-      __m256i pos = _mm256_srlv_epi32(hash, _mm256_set1_epi32(bits_hash_ - 
log_blocks_));
-      pos = _mm256_mullo_epi32(pos, _mm256_set1_epi32(byte_multiplier));
-      pos = _mm256_add_epi32(pos, local_slot);
-      __m256i group_id = _mm256_i32gather_epi32(elements, pos, 1);
+      __m256i local_slot_lo = _mm256_shuffle_epi8(
+          local_slot, _mm256_setr_epi32(0x80808000, 0x80808080, 0x80808001, 
0x80808080,
+                                        0x80808002, 0x80808080, 0x80808003, 
0x80808080));
+      __m256i local_slot_hi = _mm256_shuffle_epi8(
+          local_slot, _mm256_setr_epi32(0x80808004, 0x80808080, 0x80808005, 
0x80808080,
+                                        0x80808006, 0x80808080, 0x80808007, 
0x80808080));
+      local_slot_lo = _mm256_mul_epu32(local_slot_lo, 
_mm256_set1_epi32(byte_size));
+      local_slot_lo = _mm256_mul_epu32(local_slot_hi, 
_mm256_set1_epi32(byte_size));

Review Comment:
   This typo is made by [`86adc3d` 
(#45108)](https://github.com/apache/arrow/pull/45108/commits/86adc3dcc1970924f99fb8f296e599f9acdd00a7)
 when I was addressing one of the review comment.
   
   Because this code path requires BMI2 
https://github.com/apache/arrow/blob/b1bb4808d2a9e7d136c8a10bf544d67f3b906fb6/cpp/src/arrow/compute/key_map_internal.cc#L139-L144
 to enable, and our CI machine seems not having it, so it is not detected by 
CI. (I'm launching a crossbow submit with the typo commit to see if it can be 
detected by other CI jobs).
   
   After this typo is being made, I only ran the UT and the reported case (too 
big to put into UT) on a machine w/o BMI2 so I also failed to catch it. (I 
should've ran them on my other machine which has BMI2).
   
   Now I've ran them on my BMI2 machine and confirmed that this typo failed 
both the UT and the reported case. And fixing the typo passed them again.
   
   Sorry for the confusion and much thanks again for spotting it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to