Paul Rogers created DRILL-5779:
----------------------------------

             Summary: HashAgg template is far too large, causes performance hit
                 Key: DRILL-5779
                 URL: https://issues.apache.org/jira/browse/DRILL-5779
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.11.0
            Reporter: Paul Rogers


Drill uses code generation to produce query-specific code to copy values, 
perform calculations, and so on. Drill does this by generating code based on 
templates. Internally, Drill copies the template byte codes and merges them 
with the generated byte codes. (Drill does not use Java subclassing for 
generated code.)

The Hash Agg batch places thousands of lines of boilerplate code into the 
template. This forces:

1. Drill to copy those byte codes *for every query*.
2. The "byte code fixup" logic to walk the template's byte code tree *for 
every query*.
3. The code cache to hold a separate copy of the template *for every query*.

The copying and tree walking carry a clear performance cost, and buffering 
multiple copies of the same code carries a memory cost. We do not appear to 
have any data showing that this work benefits the Drill user in terms of 
better stability, greater performance or more features.

We should consider moving the bulk of the code out of the template to avoid the 
overheads cited above. The result may be better performance and reduced memory 
pressure.
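
As a rough illustration of the idea, the sketch below keeps the fixed 
boilerplate in an ordinary class that is compiled once, and leaves only a few 
small methods for the per-query generated code. All names here 
(SharedAggLogic, GeneratedOps, GeneratedSumByKey) are hypothetical and are not 
Drill classes. The sketch also uses a plain Java interface to connect the two 
halves, which is a simplification: Drill merges byte codes rather than using 
subclassing, so the real change would look different.

```java
// Hypothetical sketch only; none of these names exist in Drill.

/** Fixed boilerplate: compiled once and shared by every query. */
final class SharedAggLogic {

    /** The small, query-specific surface that code generation would fill in. */
    interface GeneratedOps {
        int hashKey(int[] row);
        long updateValue(long current, int[] row);
    }

    private final GeneratedOps ops;
    private final long[] buckets;

    SharedAggLogic(GeneratedOps ops, int bucketCount) {
        this.ops = ops;
        this.buckets = new long[bucketCount];
    }

    /** Bucket selection and bookkeeping live here, not in the template. */
    void addRow(int[] row) {
        int slot = Math.floorMod(ops.hashKey(row), buckets.length);
        buckets[slot] = ops.updateValue(buckets[slot], row);
    }

    long bucket(int slot) {
        return buckets[slot];
    }
}

/** Stand-in for per-query generated code: sums column 1, keyed by column 0. */
final class GeneratedSumByKey implements SharedAggLogic.GeneratedOps {
    @Override
    public int hashKey(int[] row) {
        return row[0];
    }

    @Override
    public long updateValue(long current, int[] row) {
        return current + row[1];
    }
}

final class HashAggSketch {
    public static void main(String[] args) {
        SharedAggLogic agg = new SharedAggLogic(new GeneratedSumByKey(), 4);
        agg.addRow(new int[] {1, 10});
        agg.addRow(new int[] {1, 5});
        agg.addRow(new int[] {2, 7});
        System.out.println(agg.bucket(1)); // prints 15
        System.out.println(agg.bucket(2)); // prints 7
    }
}
```

In this arrangement only the small GeneratedOps implementation would vary per 
query, so the copy, fixup and cache steps listed above would touch far less 
code.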



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
