alamb commented on issue #18411: URL: https://github.com/apache/datafusion/issues/18411#issuecomment-3503726327
> The decoder maintained a hash table of strings and single instanced everything and memoized hash values.

This is basically how the existing ByteViewGroupBy thing works (it is basically interning the string values into a (new) output array).

There is no global string interning, however, so when we see strings coming in from Arrow arrays, they aren't already interned and we don't have a pre-existing hash.

However, it is likely that we compute a hash for the same string many times (e.g. hashes are computed for repartitioning and then again in the group by itself).

We discussed something similar here: https://github.com/apache/datafusion/issues/12596
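
To make the idea concrete, here is a minimal sketch of interning with memoized hash values. It uses only the Rust standard library and is not DataFusion's actual group-by code: `Interner`, `intern_with_hash`, and `memoized_hash` are hypothetical names invented for illustration. The point it tries to show is that if a hash computed upstream (for example during repartitioning) could be passed along, the interning step would not need to hash the string again.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical interner: stores each distinct string once and remembers
/// the hash that was computed (or supplied) when it was first seen.
pub struct Interner {
    /// Interned values indexed by id, paired with their memoized hash.
    values: Vec<(String, u64)>,
    /// Hash -> ids of interned values with that hash (collisions share a bucket).
    /// Note: the std HashMap re-hashes the u64 key; a real implementation
    /// would likely use a raw table keyed directly by the hash.
    buckets: HashMap<u64, Vec<usize>>,
}

impl Interner {
    pub fn new() -> Self {
        Self { values: Vec::new(), buckets: HashMap::new() }
    }

    /// Hash a string from scratch (the work we would like to do only once).
    pub fn hash_str(s: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        s.hash(&mut hasher);
        hasher.finish()
    }

    /// Intern `s` using a hash the caller already has, e.g. one carried
    /// over from an earlier operator. The caller must use the same hash
    /// function for every call.
    pub fn intern_with_hash(&mut self, s: &str, hash: u64) -> usize {
        let ids = self.buckets.entry(hash).or_default();
        // Compare string contents only within the matching bucket.
        for &id in ids.iter() {
            if self.values[id].0 == s {
                return id;
            }
        }
        let id = self.values.len();
        self.values.push((s.to_string(), hash));
        ids.push(id);
        id
    }

    /// Intern `s` when no pre-existing hash is available.
    pub fn intern(&mut self, s: &str) -> usize {
        let hash = Self::hash_str(s);
        self.intern_with_hash(s, hash)
    }

    /// Return the hash stored for an interned id without re-hashing.
    pub fn memoized_hash(&self, id: usize) -> u64 {
        self.values[id].1
    }

    /// Return the interned string for an id.
    pub fn get(&self, id: usize) -> &str {
        &self.values[id].0
    }

    /// Number of distinct strings interned so far.
    pub fn len(&self) -> usize {
        self.values.len()
    }
}
```

Interning the same string twice returns the same id, and `memoized_hash(id)` gives back the stored hash without touching the string bytes again. Whether carrying hashes between operators is practical in DataFusion is exactly the open question in the linked issue; this sketch only illustrates the data structure.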
