ccat3z opened a new issue, #8025: URL: https://github.com/apache/incubator-gluten/issues/8025
### Backend VL (Velox) ### Bug description Distinct aggregation will merge all sorted spill file in `getOutput()` (`SpillPartition::createOrderedReader`). If there are too many spill files, reading the first batch of each file into memory will consume a significant amount of memory. In one of our internal cases, one task generated 300 spill files, which requires close to 3G of memory.  ### Spark version None ### Spark configurations _No response_ ### System information _No response_ ### Relevant logs _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
