[jira] [Created] (DRILL-5071) CodeGenerator class unnecessarily keeps two copies of generated code
Paul Rogers created DRILL-5071:

Summary: CodeGenerator class unnecessarily keeps two copies of generated code
Key: DRILL-5071
URL: https://issues.apache.org/jira/browse/DRILL-5071
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor

Drill uses a code cache to avoid recompiling the same code multiple times. The cache is keyed on the generated code itself. The generated code contains an ever-increasing name suffix of the form {{ProjectorGen123}}.

The unique name would be necessary if generated code shared a single name space. But, as currently implemented, each bit of generated code resides in its own private class loader: the code generated for one operator (say) can never clash with that for another.

As a result, we can reduce the size and cost of the code cache as follows:

1. Eliminate the numeric suffix on the class name.
2. Eliminate the {{generifiedCode}} member variable in {{CodeGenerator}}.
3. Eliminate the search and replace that produces the "generified" code.
4. Use the actual generated code as the cache key instead of the "generified" version.
5. Rely on the distinct class loaders to keep generated class names separate.

The code cache holds up to 1000 classes. Classes can range from a few K to hundreds of K. By eliminating the second code copy, we may reduce heap memory pressure on the order of 50K * 1000 = 50 MB or so.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
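The effect of the numeric suffix on a source-keyed cache can be sketched as follows. This is a minimal illustration, not Drill's actual code: {{SourceKeyedCacheSketch}}, {{generate}} and {{getOrCompile}} are hypothetical names, and "compilation" is simulated with a counter. It assumes only what the issue states (a cache keyed on source text, holding up to 1000 entries). It shows why a suffixed source cannot serve directly as a cache key, which is presumably why the "generified" copy exists, and why the raw source becomes a usable key once the suffix is dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SourceKeyedCacheSketch {
  // Capacity taken from the issue text ("up to 1000 classes").
  static final int MAX_ENTRIES = 1000;

  // Counts simulated compilations so cache hits and misses are observable.
  int compileCount = 0;

  // LRU map keyed on the full source text; the value stands in for a compiled class.
  private final Map<String, Object> cache =
      new LinkedHashMap<String, Object>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  // Hypothetical generator: the body is always the same, only the class name varies.
  static String generate(String className) {
    return "public class " + className + " { public int eval() { return 1; } }";
  }

  // Returns the cached entry for this exact source, "compiling" only on a miss.
  Object getOrCompile(String source) {
    return cache.computeIfAbsent(source, s -> {
      compileCount++;
      return new Object();
    });
  }

  public static void main(String[] args) {
    // Suffixed names: identical logic, but the source texts differ, so both miss.
    SourceKeyedCacheSketch suffixed = new SourceKeyedCacheSketch();
    suffixed.getOrCompile(generate("ProjectorGen1"));
    suffixed.getOrCompile(generate("ProjectorGen2"));
    System.out.println("suffixed names, compiles: " + suffixed.compileCount);

    // Fixed name: the second request is byte-for-byte identical and hits the cache.
    SourceKeyedCacheSketch fixed = new SourceKeyedCacheSketch();
    fixed.getOrCompile(generate("ProjectorGen"));
    fixed.getOrCompile(generate("ProjectorGen"));
    System.out.println("fixed name, compiles: " + fixed.compileCount);
  }
}
```

With a fixed class name the generated source is the only string that needs to be retained, which is the saving the issue estimates.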
[jira] [Created] (DRILL-5070) Code cache compares sources, but method order varies
Paul Rogers created DRILL-5070:

Summary: Code cache compares sources, but method order varies
Key: DRILL-5070
URL: https://issues.apache.org/jira/browse/DRILL-5070
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor

The Drill generated code cache compares the sources from two different generation events to detect duplicate code. Unfortunately, the code generator emits methods in the order returned by {{Class.getDeclaredMethods}}, and that method makes no guarantee about the order of the methods it returns.

This issue appeared when attempting to modify tests to capture generated code for comparison to future results. Even a simple generated case from {{ExpressionTest.testBasicExpression()}} that generates {{if(true) then 1 else 0 end}} (all constants) produced methods in different orders on each test run.

The fix is simple: in the {{SignatureHolder}} constructor, sort the methods by name after retrieving them from the class. The sort ensures that method order is deterministic. Fortunately, the number of methods is small, so the sort step adds little cost.

Without this fix, it is likely that the code cache holds many "copies" of the same code: equivalent code but with different method orders. After this fix, the cache should hold only one copy of each bit of equivalent code.
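The proposed sort can be sketched in isolation as below. This is an illustration under stated assumptions, not the actual {{SignatureHolder}} change: {{SortedMethodsSketch}} and {{SampleTemplate}} are hypothetical names, and the tie-break on parameter types is an addition (the issue only mentions sorting by name) so that overloaded methods also come out in a fixed order.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortedMethodsSketch {

  // Hypothetical stand-in for a template class whose methods drive code generation.
  interface SampleTemplate {
    void setup();
    void eval();
    void cleanup();
    void eval(int row);
  }

  // Retrieves the declared methods, then imposes a deterministic order:
  // by name first, then by parameter types to separate overloads.
  static Method[] sortedDeclaredMethods(Class<?> clazz) {
    Method[] methods = clazz.getDeclaredMethods(); // order is unspecified here
    Arrays.sort(methods, Comparator
        .comparing(Method::getName)
        .thenComparing(m -> Arrays.toString(m.getParameterTypes())));
    return methods;
  }

  // Convenience view of the sorted order, used only for display and checks.
  static List<String> sortedMethodNames(Class<?> clazz) {
    List<String> names = new ArrayList<>();
    for (Method m : sortedDeclaredMethods(clazz)) {
      names.add(m.getName());
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(sortedMethodNames(SampleTemplate.class));
  }
}
```

Because the order no longer depends on what reflection happens to return, two generation runs over the same template emit methods in the same sequence, so a comparison of the source text can recognize them as duplicates.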
[jira] [Created] (DRILL-5069) MaterializeVisitor.visitSchemaPath silently ignores missing fields
Paul Rogers created DRILL-5069:

Summary: MaterializeVisitor.visitSchemaPath silently ignores missing fields
Key: DRILL-5069
URL: https://issues.apache.org/jira/browse/DRILL-5069
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor

Not sure if this is a bug or a feature...

The test {{ExpressionTest}} tests various expressions by parsing them and generating code. The expression under test in the {{testExprParseLowerExponent}} test is:

{code}
multiply(`$f0`, 1.0e-4)
{code}

The first argument appears to be a reference to the first field (0-based indexes) in the given {{RecordBatch}}:

{code}
getExpressionCode("multiply(`$f0`, 1.0e-4)", batch);
{code}

Because of the way the mocked {{RecordBatch}} is handled, resolving {{$f0}} will produce a null result from {{batch.getValueVectorId("$f0");}}.

The code in {{MaterializeVisitor.visitSchemaPath}} logs a warning when the field reference is not found:

{code}
Unable to find value vector of path $f0, returning null instance.
{code}

The code for building up the multiply function appears to treat this case as the equivalent of:

{code}
multiply(1.0e-4)
{code}

That is, it just ignores the missing field.

This seems to invite errors. I would expect that if code elsewhere in Drill generated a reference to a missing field, Drill would flag this as a serious error. Said another way, if the rest of Drill works properly, the missing field reference scenario should never occur in production.

For the particular test in question, use the mocking mechanism to pretend that the target field exists. See the {{testSchemaExpression}} test for an example.
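The difference between the lenient behavior described above and the fail-fast behavior the issue asks for can be sketched as follows. Everything here is hypothetical and unrelated to Drill's real classes: {{StrictFieldResolverSketch}} models a schema as a plain map from field path to an integer id, and the choice of {{IllegalStateException}} is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class StrictFieldResolverSketch {
  // Hypothetical schema: field path -> vector id. Not a Drill type.
  private final Map<String, Integer> fieldIds = new HashMap<>();

  // Registers a field so that later lookups can find it.
  StrictFieldResolverSketch withField(String path, int id) {
    fieldIds.put(path, id);
    return this;
  }

  // Lenient lookup: a missing field yields null, which callers may silently drop.
  Integer resolveLenient(String path) {
    return fieldIds.get(path);
  }

  // Strict lookup: a missing field is treated as an error at the point of resolution.
  int resolveStrict(String path) {
    Integer id = fieldIds.get(path);
    if (id == null) {
      throw new IllegalStateException("Unable to find value vector of path " + path);
    }
    return id;
  }

  public static void main(String[] args) {
    StrictFieldResolverSketch resolver = new StrictFieldResolverSketch().withField("$f1", 1);
    System.out.println("lenient lookup of $f0: " + resolver.resolveLenient("$f0"));
    try {
      resolver.resolveStrict("$f0");
    } catch (IllegalStateException e) {
      System.out.println("strict lookup failed: " + e.getMessage());
    }
  }
}
```

With the strict variant, a test that relies on a field must register it first, which corresponds to the suggestion to mock the target field in the test.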
[jira] [Updated] (DRILL-4995) Allow lazy init when dynamic UDF support is disabled
[ https://issues.apache.org/jira/browse/DRILL-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong updated DRILL-4995:
    Fix Version/s: (was: 1.10)
                   1.10.0

> Allow lazy init when dynamic UDF support is disabled
>
> Key: DRILL-4995
> URL: https://issues.apache.org/jira/browse/DRILL-4995
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.9.0
> Reporter: Roman
> Assignee: Arina Ielchiieva
> Labels: ready-to-commit
> Fix For: 1.10.0
>
> Steps in a 2-node cluster:
> On the 1st node:
> 1. Register jar
> 2. Run function (success)
> 3. Disable dynamic UDF support
> 4. Run function again (success)
> On the 2nd node:
> 5. Try to run function (failed).
> On the 1st node the function was initialized before dynamic UDF support
> was disabled, but on the 2nd node it was not initialized. So it seems we
> need to allow lazy initialization when dynamic UDF support is disabled.
[jira] [Updated] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection
[ https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong updated DRILL-4980:
    Fix Version/s: (was: 1.10)
                   1.10.0

> Upgrading of the approach of parquet date correctness status detection
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Parquet
> Affects Versions: 1.9.0
> Reporter: Vitalii Diravka
> Assignee: Vitalii Diravka
> Fix For: 1.10.0
>
> This jira is an addition to
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for newly generated Parquet files should be
> upgraded.