[
https://issues.apache.org/jira/browse/DRILL-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468259#comment-17468259
]
ASF GitHub Bot commented on DRILL-8088:
---------------------------------------
paul-rogers commented on pull request #2412:
URL: https://github.com/apache/drill/pull/2412#issuecomment-1004406533
@luocooong, here are answers to your questions:
**Code gen**: Drill already supports "plain Java" code gen and use of the
standard compiler without byte code fixup. It is what is used when you set the
magic flag in each operator, then ask to save code for debugging. In the tests
I did way back when, the "plain Java" path performed at least as well as the
Janino/byte-code-fixup path.
If you are not familiar with the "save code for debugging" mechanism, you
should be if you want to look at optimization. I'd be happy to describe it (or
hunt down whether it is already described in the Wiki).
**Provided schema**: There are three cases to consider.
1. Explicit SELECT: `SELECT a, b, c FROM ...`. In this case, if we have a
schema, then all operators will use exactly the same code and we can generate
once.
2. "Lenient" wildcard: `SELECT * FROM ...`, where the file (such as JSON or
CSV) may have more columns than described by the "provided schema". In this
case, each reader is free to add the extra columns. Since each file may be
different, each reader will produce a different schema, and downstream
operators must deal with schema-on-read; the code cannot be shared.
3. "Strict" wildcard: readers include only those columns defined in the
schema. For this option, we can also generate code once.
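The three cases above can be summarized as a small decision function. This is only a sketch: `ProjectionKind`, `CodeGenSharing`, and `canGenerateOnce` are hypothetical names for illustration and are not part of Drill. It also assumes that, without a provided schema, code cannot safely be shared, which the cases above imply but do not state outright.

```java
// Hypothetical illustration only; none of these types exist in Drill.
enum ProjectionKind { EXPLICIT, LENIENT_WILDCARD, STRICT_WILDCARD }

final class CodeGenSharing {

    // Returns true when every reader is expected to emit the same schema,
    // so downstream operators could generate code once and reuse it.
    static boolean canGenerateOnce(ProjectionKind kind, boolean hasProvidedSchema) {
        if (!hasProvidedSchema) {
            // Assumption: with no provided schema, each reader may discover
            // different columns or types, so sharing is not safe.
            return false;
        }
        switch (kind) {
            case EXPLICIT:          // SELECT a, b, c with a schema: identical code everywhere
                return true;
            case STRICT_WILDCARD:   // SELECT * limited to the schema's columns
                return true;
            case LENIENT_WILDCARD:  // SELECT * where readers may add extra columns
                return false;
            default:
                throw new IllegalArgumentException("Unknown projection kind: " + kind);
        }
    }

    public static void main(String[] args) {
        // Print the decision for each case, assuming a provided schema exists.
        for (ProjectionKind kind : ProjectionKind.values()) {
            System.out.println(kind + " -> " + canGenerateOnce(kind, true));
        }
    }
}
```

Only the "lenient" wildcard forces per-reader code generation here; the other two cases pin the column set up front.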
**Refactors**: there is probably a random assortment of tickets filed as
various people looked into this area. However, this is more than a "change
this, improve that" kind of thing; it probably needs someone to spend time to
fully understand what we have today and to do some research to see if there are
ways to improve the execution model. Hence, this discussion.
**Vectorization**: that is a complex discussion. I'll tackle that in another
note.
--
> Improve expression evaluation performance
> -----------------------------------------
>
> Key: DRILL-8088
> URL: https://issues.apache.org/jira/browse/DRILL-8088
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Codegen
> Reporter: wtf
> Assignee: wtf
> Priority: Minor
>
> Found an unnecessary map copy during expression evaluation. It slows down
> codegen when the query includes many "case when" expressions or avg/stddev
> calls (whose reduced expressions include "case when"). In our case, the query
> includes 314 avg calls, and it takes 3+ seconds to generate the projector
> expressions (Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz, 32 cores).