maropu commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-673753587
> But shuffle happens during Aggregate here, right? Splitting does not change the total amount of shuffled data; it is just split into several parts. Does it really result in a significant improvement?

As @viirya said above, I think the same. Why would this reduce the amount of shuffle writes? In the `expand -> partial aggregates` case, the aggregates appear to produce the same **total** output size.
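A toy sketch of the concern above (plain Python, not Spark code; the data and helper name are made up for illustration): a map-side partial aggregate emits one row per distinct key it sees, so splitting the same input across several partial aggregates cannot shrink the combined output, and can even grow it when the splits share keys.

```python
from collections import Counter

# Hypothetical (key, value) rows, e.g. the output of an Expand.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]

def partial_agg(partition):
    """Partial (map-side) sum per key; output size = number of distinct keys."""
    acc = Counter()
    for k, v in partition:
        acc[k] += v
    return list(acc.items())

# One partial aggregate over all rows: 3 distinct keys -> 3 output rows.
single = partial_agg(rows)

# The same rows handled by two separate partial aggregates: keys "a" and "b"
# appear in both splits, so the combined output is 5 rows, not 3.
split = partial_agg(rows[:3]) + partial_agg(rows[3:])

print(len(single), len(split))  # -> 3 5
```

Either way, the total shuffle-write volume downstream of the partial aggregates is not reduced by the split, which is the question raised in the comment.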