YuzhouSun commented on PR #35806: URL: https://github.com/apache/spark/pull/35806#issuecomment-1340116541
> I was interested in working on this, but I tested it with an online production task and found that the performance was regressing. Even though the aggregation time is shortened, the whole stage is more time consuming. Have you encountered this situation please?

Hi @DenineLu, just curious: if possible, could you share more details about the regression you encountered? For example, how much the aggregation time was shortened, how much longer the whole stage took, and the number of partial aggregate input rows and output rows with and without the optimization. Thank you.

BTW, the author moved the related changes to a newer and larger PR: https://github.com/apache/spark/pull/36552 (for [SPARK-38506](https://issues.apache.org/jira/browse/SPARK-38506) Push partial aggregation through join)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org