zanmato1984 commented on issue #41813:
URL: https://github.com/apache/arrow/issues/41813#issuecomment-2168286330

   The bug is in this line:
   
https://github.com/apache/arrow/blob/69e8a78c018da88b60f9eb2b3b45703f81f3c93d/cpp/src/arrow/compute/row/compare_internal_avx2.cc#L284
   If a slot of `offset_right` holds a value `>= 0x80000000`, i.e. a row 
offset of `2GB` or more, it is added to `right_base` as a negative 32-bit 
integer, so the gather reads data from an invalid address.
   
   Proof follows:
   Similar to @amoeba's reproduction, mine is:
   ```
   fault address: 0x4a7f85638
   right_base: 0x0000000527e1e800
   offset_right: (400023873834003288, 400025248223538328, 400025523922057392, 
-9217058400476779112)
   ```
   Decoding `offset_right` into its eight 32-bit slots gives:
   ```
   (0x58D2B58 0x58D2B98 0x58D2C98 0x58D2CD8 0x3676B8B0 0x58D2D18 0x58D2D98 
0x80166E38)
   ```
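
The decoding step can be sketched as below. This assumes the debugger printed the 256-bit register as four signed 64-bit lanes, with the lower-indexed 32-bit slot in the low half of each lane. `SplitLanes` and `CheckReportedValues` are hypothetical helper names used only for illustration; they are not part of Arrow.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Split four 64-bit lanes into eight 32-bit slots, low half first.
// Assumption: little-endian lane layout, as on x86-64.
std::array<uint32_t, 8> SplitLanes(const std::array<int64_t, 4>& lanes) {
  std::array<uint32_t, 8> slots{};
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    const uint64_t bits = static_cast<uint64_t>(lanes[i]);
    slots[2 * i] = static_cast<uint32_t>(bits & 0xFFFFFFFFu);  // low half
    slots[2 * i + 1] = static_cast<uint32_t>(bits >> 32);      // high half
  }
  return slots;
}

// Checks the split against the values quoted in this comment.
void CheckReportedValues() {
  const std::array<int64_t, 4> lanes = {
      400023873834003288LL, 400025248223538328LL, 400025523922057392LL,
      -9217058400476779112LL};
  const std::array<uint32_t, 8> slots = SplitLanes(lanes);
  assert(slots[0] == 0x58D2B58u);
  assert(slots[7] == 0x80166E38u);  // the only slot >= 0x80000000
}
```

Only the last slot has its top bit set, which is why only that one becomes negative when viewed as a signed 32-bit integer.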
   
   Note that the last offset is larger than `0x80000000`, so its signed 
32-bit interpretation is `-2146013640`, and `right_base(0x0000000527e1e800) + 
(-2146013640) = 0x4a7f85638` is exactly the offending address. I didn't 
work through @amoeba's case, but I believe the same math applies.
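
The arithmetic above can be checked with a small scalar sketch. It assumes each 32-bit offset is treated as a signed integer before being added to the base, which is the behavior described in this comment. The function names are hypothetical, and the zero-extended variant is shown only for contrast; it is not meant to represent the actual fix.

```cpp
#include <cstdint>

// View a 32-bit offset as a signed value without relying on
// implementation-defined narrowing conversions.
int64_t SignedView(uint32_t offset) {
  return offset >= 0x80000000u
             ? static_cast<int64_t>(offset) - (int64_t{1} << 32)
             : static_cast<int64_t>(offset);
}

// Address computed when the offset is sign-extended (the buggy behavior).
uint64_t AddressSignExtended(uint64_t base, uint32_t offset) {
  // Converting a negative int64_t to uint64_t wraps modulo 2^64, so the
  // addition effectively subtracts the magnitude from the base.
  return base + static_cast<uint64_t>(SignedView(offset));
}

// Address computed when the offset is zero-extended (for contrast only).
uint64_t AddressZeroExtended(uint64_t base, uint32_t offset) {
  return base + static_cast<uint64_t>(offset);
}
```

With `right_base = 0x527e1e800` and the offset `0x80166E38`, the sign-extended form yields `0x4a7f85638`, matching the fault address, while the zero-extended form yields `0x5a7f85638`, exactly 4GB higher.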
   
   I'm working on a fix.

