[jira] [Commented] (DRILL-8088) Improve expression evaluation performance

ASF GitHub Bot (Jira) Mon, 03 Jan 2022 14:42:06 -0800


    [ 
https://issues.apache.org/jira/browse/DRILL-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468259#comment-17468259
 ]


ASF GitHub Bot commented on DRILL-8088:
---------------------------------------

paul-rogers commented on pull request #2412:
URL: https://github.com/apache/drill/pull/2412#issuecomment-1004406533


   @luocooong, here are answers to your questions:
   
   **Code gen**: Drill already supports "plain Java" code gen and use of the 
standard compiler without byte code fixup. It is what is used when you set the 
magic flag in each operator, then ask to save code for debugging. In the tests 
I did way back when, the "plain Java" path performed at least as well as the 
Janino/byte-code-fixup path.
   
   If you are not familiar with the "save code for debugging" mechanism, you 
should be if you want to look at optimization. I'd be happy to describe it (or 
hunt around to see if it is already described in the Wiki).
   
   **Provided schema**: There are three cases to consider.
   
   1. Explicit SELECT: `SELECT a, b, c FROM ...`. In this case, if we have a 
schema, then all operators will use exactly the same code and we can generate 
once.
   2. "Lenient" wildcard: `SELECT * FROM ...`, where the file (such as JSON or 
CSV) may have more columns than described by the "provided schema". In this 
case, each reader is free to add the extra columns. Since each file may be 
different, each reader will produce a different schema, and downstream 
operators must deal with schema-on-read; the code cannot be shared.
   3. "Strict" wildcard: readers include only those columns defined in the 
schema. For this option, we can also generate code once. 
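   
   The three cases above can be sketched as a small decision function. This is only an illustration of the reasoning, not Drill code: the names `ProjectionKind` and `canShareGeneratedCode` are hypothetical, and the "no provided schema means no sharing" branch is an assumption drawn from the schema-on-read remark in case 2.

```java
public class SchemaCodeGenSketch {

    // Hypothetical labels for the three cases in the comment; not Drill types.
    enum ProjectionKind { EXPLICIT, LENIENT_WILDCARD, STRICT_WILDCARD }

    // Returns true when every reader is expected to produce the same schema,
    // so downstream operators could generate code once and reuse it.
    static boolean canShareGeneratedCode(ProjectionKind kind, boolean hasProvidedSchema) {
        if (!hasProvidedSchema) {
            // Assumption: without a provided schema we are in schema-on-read
            // territory, so each reader's batch schema may differ.
            return false;
        }
        switch (kind) {
            case EXPLICIT:         // SELECT a, b, c: columns fixed by the query and schema
            case STRICT_WILDCARD:  // SELECT *: readers emit only the schema's columns
                return true;
            case LENIENT_WILDCARD: // SELECT *: readers may add extra per-file columns
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        for (ProjectionKind kind : ProjectionKind.values()) {
            System.out.println(kind + " -> " + canShareGeneratedCode(kind, true));
        }
    }
}
```

   The point of the sketch is that sharing depends on whether the output schema is fully determined before any file is read; only the lenient wildcard leaves it open.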
   
   **Refactors**: there is probably a random assortment of tickets filed as 
various people looked into this area. However, this is more than a "change 
this, improve that" kind of thing; it probably needs someone to spend time to 
fully understand what we have today and to do some research to see if there are 
ways to improve the execution model. Hence, this discussion.
   
   **Vectorization**: that is a complex discussion. I'll tackle that in another 
note. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Improve expression evaluation performance
> -----------------------------------------
>
>                 Key: DRILL-8088
>                 URL: https://issues.apache.org/jira/browse/DRILL-8088
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Codegen
>            Reporter: wtf
>            Assignee: wtf
>            Priority: Minor
>
> Found an unnecessary map copy during expression evaluation; it slows 
> down codegen when the query includes many "case when" expressions or 
> avg/stddev calls (whose reduced expressions include "case when"). In our 
> case, the query includes 314 avg calls, and it takes 3+ seconds to generate 
> the projector expressions (Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz, 32 cores).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
