Dandandan opened a new pull request #9234: URL: https://github.com/apache/arrow/pull/9234
Currently, we loop over the hash map for every key on each batch. However, when we receive a batch with low-cardinality keys (or sorted data, etc.), many keys in the map have no rows in that batch, so we could create a lot of empty batches. This PR keeps track of which keys were received in the batch and only updates the accumulators for those keys.
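The idea can be illustrated with a minimal, standalone sketch. This is not the code from the PR: `GroupState`, `update_batch`, the string keys, and the running-sum accumulator are all hypothetical names chosen for illustration, and only the standard library is used. It assumes each group collects the row indices it received from the current batch, and that a side list records which keys the batch touched, so the update step visits only those keys instead of every entry in the map.

```rust
use std::collections::HashMap;

/// Hypothetical per-group state (not the PR's actual types): a running sum
/// plus the row indices this group received from the batch being processed.
#[derive(Debug, Default)]
struct GroupState {
    sum: i64,
    batch_indices: Vec<usize>,
}

/// Process one batch of (key, value) rows and return how many groups
/// were actually updated.
fn update_batch(
    groups: &mut HashMap<String, GroupState>,
    keys: &[&str],
    values: &[i64],
) -> usize {
    assert_eq!(keys.len(), values.len());

    // Keys seen in this batch, in first-seen order.
    let mut touched: Vec<String> = Vec::new();

    for (row, &key) in keys.iter().enumerate() {
        let state = groups.entry(key.to_string()).or_default();
        // An empty index list means this is the key's first row in this batch.
        if state.batch_indices.is_empty() {
            touched.push(key.to_string());
        }
        state.batch_indices.push(row);
    }

    // Visit only the groups touched by this batch, not the whole map,
    // so groups with no rows in the batch cost nothing here.
    for key in &touched {
        let state = groups
            .get_mut(key)
            .expect("touched keys were inserted above");
        let batch_sum: i64 = state.batch_indices.iter().map(|&row| values[row]).sum();
        state.sum += batch_sum;
        // Reset so the next batch starts with an empty index list.
        state.batch_indices.clear();
    }

    touched.len()
}
```

Calling `update_batch` with keys `["a", "b", "a"]` and then with `["b", "b"]` updates two groups on the first call and only one on the second; the state for `"a"` is not visited at all during the second batch.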
