paul-rogers commented on pull request #2412: URL: https://github.com/apache/drill/pull/2412#issuecomment-1004406533
@luocooong, here are answers to your questions:

**Code gen**: Drill already supports "plain Java" code gen and use of the standard compiler without byte-code fixup. It is what is used when you set the magic flag in each operator, then ask to save code for debugging. In the tests I did way back when, the "plain Java" path performed at least as well as the Janino/byte-code-fixup path. If you are not familiar with the "save code for debugging" mechanism, you should be if you want to look at optimization. I'd be happy to describe it (or hunt down whether it is already described in the Wiki).

**Provided schema**: There are three cases to consider.

1. Explicit SELECT: `SELECT a, b, c FROM ...`. In this case, if we have a schema, then all operators will use exactly the same code and we can generate once.
2. "Lenient" wildcard: `SELECT * FROM ...`, where the file (such as JSON or CSV) may have more columns than described by the "provided schema". In this case, each reader is free to add the extra columns. Since each file may be different, each reader will produce a different schema, and downstream operators must deal with schema-on-read; the code cannot be shared.
3. "Strict" wildcard: readers include only those columns defined in the schema. For this option, we can also generate code once.

**Refactors**: there is probably a random assortment of tickets filed as various people looked into this area. However, this is more than a "change this, improve that" kind of thing; it probably needs someone to spend time to fully understand what we have today and to do some research to see if there are ways to improve the execution model. Hence, this discussion.

**Vectorization**: that is a complex discussion. I'll tackle that in another note.
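
To make the three provided-schema cases concrete, here is a minimal sketch of the "can we generate code once?" decision. None of these names are Drill APIs: `SchemaCodeGenSketch`, `ProjectionKind`, and `canShareGeneratedCode` are hypothetical and exist only for illustration. The sketch also assumes that, without a provided schema, code is not shared, since the comment only covers the case where a schema exists.

```java
public class SchemaCodeGenSketch {

    // Hypothetical names for the three cases described in the comment.
    public enum ProjectionKind {
        EXPLICIT,          // SELECT a, b, c FROM ...
        LENIENT_WILDCARD,  // SELECT * where readers may add extra columns
        STRICT_WILDCARD    // SELECT * limited to the provided schema's columns
    }

    // Returns true when every reader is expected to produce the same schema,
    // so downstream operators could reuse one generated class.
    public static boolean canShareGeneratedCode(ProjectionKind kind,
                                                boolean hasProvidedSchema) {
        if (!hasProvidedSchema) {
            // Assumption: with no provided schema we are in schema-on-read
            // territory, so this sketch does not share code.
            return false;
        }
        switch (kind) {
            case EXPLICIT:
            case STRICT_WILDCARD:
                return true;   // same columns everywhere: generate once
            case LENIENT_WILDCARD:
            default:
                return false;  // each file may add columns: per-schema code
        }
    }

    public static void main(String[] args) {
        for (ProjectionKind kind : ProjectionKind.values()) {
            System.out.println(kind + " with schema -> share code: "
                    + canShareGeneratedCode(kind, true));
        }
    }
}
```

Only the lenient wildcard case forces per-reader code generation; the other two allow a single generated class, provided the schema is known up front.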
