[GitHub] [spark] YuzhouSun commented on pull request #35806: [SPARK-38505][SQL] Make partial aggregation adaptive

GitBox Tue, 06 Dec 2022 14:51:07 -0800


YuzhouSun commented on PR #35806:
URL: https://github.com/apache/spark/pull/35806#issuecomment-1340116541


   > I was interested in working on this, but I tested it with an online 
production task and found that the performance was regressing. Even though the 
aggregation time is shortened, the whole stage is more time consuming. Have you 
encountered this situation please?
   
   Hi @DenineLu, just curious: if possible, could you share more details about 
the regression case you encountered? For example: how much the aggregation 
time was shortened, how much longer the whole stage took, and the number of 
partial aggregate input rows and output rows with and without the 
optimization. Thank you.
   
   BTW, the author moved the related changes to a newer and larger PR: 
https://github.com/apache/spark/pull/36552 (for 
[SPARK-38506](https://issues.apache.org/jira/browse/SPARK-38506), 
"Push partial aggregation through join").
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


