maropu commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-673753587
> But shuffle happens during Aggregate here, right? Splitting does not change the total amount of shuffled data; it is just split into several parts. Does it really result in a significant improvement?

As @viirya said above, I think the same. Why would this reduce the amount of shuffle writes? In the `expand -> partial aggregates` case, the aggregates appear to produce the same **total** output size.
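A toy sketch of the concern above (plain Python, not Spark code; the data and helper name are made up for illustration): a map-side partial aggregate emits one row per distinct key it sees, so splitting the same input across several partial aggregates cannot shrink the combined output, and can even grow it when the splits share keys.

```python
from collections import Counter

# Hypothetical (key, value) rows, e.g. the output of an Expand.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]

def partial_agg(partition):
    """Partial (map-side) sum per key; output size = number of distinct keys."""
    acc = Counter()
    for k, v in partition:
        acc[k] += v
    return list(acc.items())

# One partial aggregate over all rows: 3 distinct keys -> 3 output rows.
single = partial_agg(rows)

# The same rows handled by two separate partial aggregates: keys "a" and "b"
# appear in both splits, so the combined output is 5 rows, not 3.
split = partial_agg(rows[:3]) + partial_agg(rows[3:])

print(len(single), len(split))  # -> 3 5
```

Either way, the total shuffle-write volume downstream of the partial aggregates is not reduced by the split, which is the question raised in the comment.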