Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19082
  
    @maropu The aggregation code is actually wrapped in a single function,
`doAggregateWithKeys`/`doAggregateWithoutKey`. This is also the part of the
generated code that this PR improves by extracting functions.
    
    My initial thought is that, during query processing, the function
`doAggregateWithKeys`/`doAggregateWithoutKey` actually runs only once and
aggregates all rows in that single call. Whether it is a long function or not,
the JIT has no chance to step in. That means the length of this function
doesn't matter much for the JIT issue.
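
    To make the point concrete, here is a hypothetical Java sketch of the shape
described above. It is not Spark's actual generated code: `InlinedAggSketch` and
`doAggregateInlined` are made-up names, and the per-row body is shortened to a
sum and a count. The wrapping method is entered once and loops over every row
itself, so the method as a whole sees a single invocation.

```java
// Hypothetical sketch (illustrative names, not Spark's generated code):
// all per-row aggregation logic is inlined in one wrapping method.
class InlinedAggSketch {
    // Counts how many times the wrapping method is entered.
    static int wrapperCalls = 0;

    static long[] doAggregateInlined(long[] rows) {
        wrapperCalls++;
        long sum = 0L;
        long count = 0L;
        for (long v : rows) {
            // In real generated code this loop body can be very long,
            // with one block per aggregate function.
            sum += v;
            count += 1;
        }
        return new long[] {sum, count};
    }

    public static void main(String[] args) {
        long[] rows = new long[1000];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = i;
        }
        long[] result = doAggregateInlined(rows);
        // Prints: sum=499500 count=1000 wrapperCalls=1
        System.out.println("sum=" + result[0] + " count=" + result[1]
                + " wrapperCalls=" + wrapperCalls);
    }
}
```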
    
    The long-function issue hurts whole-stage codegen performance when a long
function is called many times but keeps running in non-compiled (interpreted)
mode, which slows down the rest of the generated code. Since
`doAggregateWithKeys`/`doAggregateWithoutKey` runs only once, its length doesn't
have much impact. That is why the whole-stage codegen query is still faster than
the non-whole-stage codegen one.
    
    This PR improves aggregation because it extracts small functions from
`doAggregateWithKeys`/`doAggregateWithoutKey`. Those functions are called many
times from the loop in the wrapping function, so now the JIT has room to step in.
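
    A hypothetical Java sketch of the extracted shape follows, again with made-up
names (`SplitAggSketch`, `updateSum`, `updateCount`) rather than the code this PR
actually generates. The wrapping method is still entered once, but each small
helper is invoked once per row, so with 1000 rows the two helpers see 2000 calls
in total. The counters only illustrate call counts; they do not measure what a
JIT such as HotSpot's actually decides to compile.

```java
// Hypothetical sketch (illustrative names, not Spark's generated code):
// the per-row work for each aggregate is extracted into small methods
// that the wrapping loop calls once per row.
class SplitAggSketch {
    // Invocation counters, standing in for the call counts a JIT observes.
    static int wrapperCalls = 0;
    static int helperCalls = 0;

    // Small extracted helpers, each called once per input row.
    static long updateSum(long acc, long v) {
        helperCalls++;
        return acc + v;
    }

    static long updateCount(long acc) {
        helperCalls++;
        return acc + 1;
    }

    static long[] doAggregateSplit(long[] rows) {
        wrapperCalls++;
        long sum = 0L;
        long count = 0L;
        for (long v : rows) {
            sum = updateSum(sum, v);
            count = updateCount(count);
        }
        return new long[] {sum, count};
    }

    public static void main(String[] args) {
        long[] rows = new long[1000];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = i;
        }
        long[] result = doAggregateSplit(rows);
        // Prints: sum=499500 count=1000 wrapperCalls=1 helperCalls=2000
        System.out.println("sum=" + result[0] + " count=" + result[1]
                + " wrapperCalls=" + wrapperCalls + " helperCalls=" + helperCalls);
    }
}
```

    Whether a given JVM really compiles these helpers depends on its own
thresholds and flags; the sketch only shows why the split shape gives it many
more invocations to work with.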
