[ 
https://issues.apache.org/jira/browse/DRILL-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467749#comment-17467749
 ] 

ASF GitHub Bot commented on DRILL-8088:
---------------------------------------

paul-rogers commented on pull request #2412:
URL: https://github.com/apache/drill/pull/2412#issuecomment-1003831613


   Hi @luocooong, looks like you're looking at the expression and operator 
code. I wonder, is there anything you're trying to improve? Execution 
performance, maybe?
   
   As you know, Drill is very complicated. Drill uses code generation for 
expression evaluation. The code generation goes through a path that made sense 
for Java 5 (when Drill was written), but is now a bit awkward. We do have a way 
to use the native Java tools, which worked faster several years ago; that path 
is probably even faster now.
   
   Operator setup (another of your PRs) is impacted by code gen cost. Drill 
generates code for each fragment. If your query has 20 fragments, we generate 
code 20 times. The reason we must do that is that, in theory, every fragment 
can see a different schema, so the generated code could differ. By comparison, 
Spark generates code once, then pushes that code to all its executors.
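   
   To illustrate the generate-once approach, here is a minimal sketch. It is 
not Drill's actual code path: all names (`CodeGenCacheSketch`, `codeFor`, the 
schema-key string) are hypothetical, and the "generation" step is a stand-in 
that just returns a string. It assumes fragments with the same schema can 
share the same generated code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names are Drill classes.
public class CodeGenCacheSketch {
  // Counts how many times the expensive "generate" step actually runs.
  static final AtomicInteger generations = new AtomicInteger();

  // Generated code, keyed by a string describing the input schema.
  static final Map<String, String> cache = new ConcurrentHashMap<>();

  // Stand-in for real code generation: returns fake source text.
  static String generate(String schemaKey) {
    generations.incrementAndGet();
    return "/* projector for " + schemaKey + " */";
  }

  // Each fragment asks for code; only the first request per schema generates it.
  static String codeFor(String schemaKey) {
    return cache.computeIfAbsent(schemaKey, CodeGenCacheSketch::generate);
  }

  public static void main(String[] args) {
    // 20 fragments that all see the same schema.
    for (int fragment = 0; fragment < 20; fragment++) {
      codeFor("a:INT,b:VARCHAR");
    }
    // One fragment that sees a different schema.
    codeFor("a:BIGINT,b:VARCHAR");
    System.out.println("generations: " + generations.get());
  }
}
```

   In this sketch, 20 fragments sharing one schema plus one fragment with a 
different schema trigger two generations rather than 21.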
   
   The generated code itself can be rather awkward for large queries: the code 
tries to inline everything, which is great for small functions but causes 
optimization problems as code blocks get larger.
   
   The mechanism to generate code, especially in the PROJECT operator, is 
vastly overly complex and could use a good re-think. It is so complex that it 
is hard to optimize because of the many assumptions and other issues embedded 
in the code.
   
   The generated code is meant to be small. But, over time, some operators 
added lots of "standard" code to the code generation path. The result is more 
work for the compiler and "byte code optimizer" that adds no per-query value. 
We've taken several passes at refactoring to pull that code out of the code gen 
path, but there is more to do.
   
   Drill was designed to allow vector operations (hence Value Vectors), but the 
code was never written, in part because there are no CPU vector instructions 
that work with SQL nullable data. Arrow is supposed to have figured out 
solutions (Gandiva, is it?), which perhaps we could consider (but probably only 
for non-nullable data).
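   
   As a rough sketch of why nullable data complicates vector-style loops, 
consider adding two nullable INT columns. The sketch assumes nulls are tracked 
in a separate validity array (one byte per row here, for simplicity) and that 
the result is NULL whenever either input is NULL. The class and method names 
are hypothetical, not Drill or Arrow APIs. Computing the values 
unconditionally and combining the validity flags separately keeps the loop 
body free of branches, which is the shape a JIT or SIMD kernel generally 
prefers.

```java
import java.util.Arrays;

// Hypothetical sketch: not a Drill or Arrow API.
public class NullableAddSketch {
  // Adds two nullable int columns row by row.
  // valid[i] == 1 means row i is non-null; 0 means NULL.
  static void add(int[] a, byte[] aValid, int[] b, byte[] bValid,
                  int[] out, byte[] outValid) {
    for (int i = 0; i < out.length; i++) {
      // Always compute the value; it is ignored when the row is NULL.
      out[i] = a[i] + b[i];
      // The result is non-null only if both inputs are non-null.
      outValid[i] = (byte) (aValid[i] & bValid[i]);
    }
  }

  public static void main(String[] args) {
    int[] a = {1, 2, 3};
    byte[] aValid = {1, 0, 1};   // second row is NULL
    int[] b = {10, 20, 30};
    byte[] bValid = {1, 1, 0};   // third row is NULL
    int[] out = new int[3];
    byte[] outValid = new byte[3];

    add(a, aValid, b, bValid, out, outValid);

    System.out.println(Arrays.toString(out));
    System.out.println(Arrays.toString(outValid));
  }
}
```

   Only the first row of the result is non-null here; the values stored for 
the other two rows are placeholders that a reader must skip by checking the 
validity array.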
   
   Anyway, there are many areas we can improve. I can give you more details if 
I know what you're trying to accomplish.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Improve expression evaluation performance
> -----------------------------------------
>
>                 Key: DRILL-8088
>                 URL: https://issues.apache.org/jira/browse/DRILL-8088
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Codegen
>            Reporter: wtf
>            Assignee: wtf
>            Priority: Minor
>
> Found an unnecessary map copy during expression evaluation. It slows 
> down codegen when the query includes many "case when" expressions or 
> avg/stddev calls (whose reduced expressions include "case when"). In our 
> case, the query includes 314 avg calls, and it takes 3+ seconds to generate 
> the projector expressions (Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz, 
> 32 cores).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
