Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19082

@maropu The aggregation code is actually wrapped in a single function, `doAggregateWithKeys`/`doAggregateWithoutKey`. This is also the part of the generated code that this PR improves by extracting functions.

My initial thought is that, while the query is processed, `doAggregateWithKeys`/`doAggregateWithoutKey` runs only once to aggregate all rows. Whether or not it is a long function, JIT has no chance to step in, so the length of this function does not matter much for the JIT issue.

The long function issue hurts the performance of wholestage codegen when the long function is run many times in a non-compiled way; it drags down the speed of the other generated code. Since `doAggregateWithKeys`/`doAggregateWithoutKey` runs only once, its impact is small, so a wholestage codegen query is still faster than a non-wholestage codegen one.

This PR improves the aggregation because it extracts small functions from `doAggregateWithKeys`/`doAggregateWithoutKey`. Those functions are run many times inside the wrapping function, so JIT now has room to step in.
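To make the shape of this change concrete, here is a minimal sketch in plain Java. It is not Spark's actual generated code: the class name `AggregateSketch`, the helper `updateBuffer`, and the `long[]` row representation are hypothetical and chosen only for illustration. Only the name `doAggregateWithoutKey` comes from the discussion above. The wrapping function is called once, while the extracted helper is called once per row.

```java
// Hypothetical sketch of the structure discussed above; not Spark's generated code.
final class AggregateSketch {
    // Aggregation buffer kept in fields (an assumption made for this illustration).
    private long sum;
    private long count;

    // Small extracted helper: invoked once per input row, so it is called
    // many times from inside the wrapping function.
    private void updateBuffer(long value) {
        sum += value;
        count += 1;
    }

    // Wrapping function: invoked a single time and loops over all rows.
    long[] doAggregateWithoutKey(long[] rows) {
        sum = 0;
        count = 0;
        for (long row : rows) {
            updateBuffer(row);
        }
        return new long[] {sum, count};
    }

    public static void main(String[] args) {
        long[] result = new AggregateSketch()
            .doAggregateWithoutKey(new long[] {1, 2, 3, 4, 5});
        System.out.println("sum=" + result[0] + ", count=" + result[1]);
    }
}
```

In this sketch the per-row work lives in `updateBuffer`, a short method with a high invocation count, which is the kind of method the comment expects JIT to pick up. Whether and when it is compiled depends on the JVM and its settings.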