Crystrix opened a new pull request #11789: URL: https://github.com/apache/arrow/pull/11789
I encountered a crash when executing GroupBy on specific data. The code and data to reproduce the crash are in the related JIRA ticket: https://issues.apache.org/jira/browse/ARROW-14898

I think the root cause is the tail handling in `Hashing::hash_varlen` in `key_hash.cc`. The relevant code path is:

1. `Hashing::hash_varlen` calls `helper_tail`, based on `key_length`, to process the tail part of the key.
2. `helper_tail` calls `util::SafeLoadAs` to load 8 bytes of data from the key.
3. `util::SafeLoadAs` calls `std::memcpy` to copy those 8 bytes from the key.

If fewer than 8 bytes of the key remain, `std::memcpy` still copies 8 bytes, which can read past the end of the key's buffer and access invalid memory.

This PR adds a `length` parameter to these functions so that only the remaining bytes of the key are copied for the tail.

I'm not sure how to add a unit test for this, since I can only reproduce the crash with my specific data.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
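A minimal sketch of the bounded-copy idea described above, under stated assumptions: `LoadFullWord` and `LoadTailBytes` are hypothetical standalone helpers written for illustration. They are not Arrow's `util::SafeLoadAs` or the code in this PR. The first mirrors the problematic pattern (an unconditional 8-byte `std::memcpy`). The second copies only `length` bytes into a zero-initialized word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper mirroring the problematic pattern: it always copies
// sizeof(uint64_t) == 8 bytes, so it reads past the end of the buffer
// whenever fewer than 8 bytes of the key remain at `src`.
inline uint64_t LoadFullWord(const uint8_t* src) {
  uint64_t value;
  std::memcpy(&value, src, sizeof(value));
  return value;
}

// Hypothetical bounded variant: reads exactly `length` bytes (0..8) from
// `src`. The storage bytes of `value` that are not copied stay zero, so the
// result never depends on memory beyond the end of the key.
inline uint64_t LoadTailBytes(const uint8_t* src, size_t length) {
  assert(length <= sizeof(uint64_t));
  uint64_t value = 0;
  std::memcpy(&value, src, length);
  return value;
}
```

With a 3-byte key at the very end of a heap allocation, `LoadTailBytes(key, 3)` stays inside the allocation, whereas `LoadFullWord(key)` reads 5 bytes past it. Whether zero-filling the unread bytes reproduces the hash that the original code computed depends on how `helper_tail` masks the loaded word; this sketch does not model that. One possible way to exercise this in a test (an assumption, not something this PR does) is to place a short key at the end of an exact-size heap buffer and build with AddressSanitizer, which reports heap-buffer-overflow reads made through `std::memcpy`.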
