Crystrix opened a new pull request #11789:
URL: https://github.com/apache/arrow/pull/11789


   I encountered a crash when executing GroupBy on specific data. The code and 
data to reproduce the crash can be found in the related JIRA ticket 
https://issues.apache.org/jira/browse/ARROW-14898
    
   I think the root cause is the tail handling in `Hashing::hash_varlen` of 
`key_hash.cc`.
   The steps of related code are as follows:
   1. `Hashing::hash_varlen` calls `helper_tail` to process the tail part of 
the key
   2. `helper_tail` calls `util::SafeLoadAs` to load 8 bytes of data from the 
key
   3. `util::SafeLoadAs` calls `std::memcpy` to copy 8 bytes of data from the 
key
   
   If the tail of the key is shorter than 8 bytes, `std::memcpy` still copies 
8 bytes, so it can read past the end of the key and access illegal memory.
   
   This PR adds a `length` parameter to those functions so that only as many 
bytes as the tail of the key contains are copied. I'm not sure how to add a 
unit test for it, as it only happens on my specific data.
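
   To illustrate the idea, here is a minimal sketch of a length-bounded load. 
It is not the actual patch: `LoadTailBytes` is a hypothetical helper name, and 
I am assuming the tail is at most 8 bytes and that unused bytes of the 
resulting word should be zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Arrow): load up to 8 bytes of a key tail
// into a 64-bit word without reading past the end of the key buffer.
inline uint64_t LoadTailBytes(const uint8_t* tail, std::size_t length) {
  // Assumption: the caller only passes the remaining tail, at most 8 bytes.
  assert(length <= sizeof(uint64_t));
  uint64_t word = 0;                 // bytes that are not copied stay zero
  std::memcpy(&word, tail, length);  // copy only the bytes that exist
  return word;
}
```

   With a 3-byte tail `{0x01, 0x02, 0x03}` this reads exactly 3 bytes, whereas 
an unconditional 8-byte `std::memcpy` would read 5 bytes beyond the buffer.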
   
   



