[jira] [Created] (DRILL-5071) CodeGenerator class unnecessarily keeps two copies of generated code

2016-11-25 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5071:
--

 Summary: CodeGenerator class unnecessarily keeps two copies of 
generated code 
 Key: DRILL-5071
 URL: https://issues.apache.org/jira/browse/DRILL-5071
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor


Drill uses a code cache to avoid recompiling the same code multiple times. The 
cache is keyed on the generated code itself.

The generated code contains an ever-increasing name suffix of the form 
{{ProjectorGen123}}.

The unique name would be necessary if generated code shared a single name 
space. But, as currently implemented, each bit of generated code resides in its 
own private class loader: the code generated for one operator (say) can never 
clash with that for another.

As a result, we can reduce the size and cost of the code cache as follows:

1. Eliminate the numeric suffix on the class name.
2. Eliminate the {{generifiedCode}} member variable in {{CodeGenerator}}.
3. Eliminate the search and replace that produces the "generified" code.
4. Use the actual generated code as the cache key instead of the "generified" 
version.
5. Rely on the distinct class loaders to keep generated class names separate.

The code cache holds up to 1000 classes. Classes can range from a few KB to 
hundreds of KB. By eliminating the second code copy, we may reduce heap memory 
pressure on the order of 50 KB * 1000 = 50 MB or so.
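
The idea behind steps 1-4 can be sketched as follows. This is a minimal 
illustration, not Drill's actual {{CodeGenerator}} or cache code: the class 
{{SourceKeyedCacheSketch}}, its methods, and the placeholder "compiled" object 
are all hypothetical. It only shows that once the class name carries no numeric 
suffix, identical logic yields identical source text, so the source can serve 
directly as the cache key without a second "generified" copy.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; names and structure are assumptions, not Drill code. */
final class SourceKeyedCacheSketch {
  // Cache keyed directly on the generated source text.
  private final Map<String, Object> cache = new HashMap<>();
  private int compileCount;

  /** Returns the cached entry for this source, "compiling" only on a miss. */
  Object getOrCompile(String source) {
    return cache.computeIfAbsent(source, s -> {
      compileCount++;        // stands in for the expensive compile step
      return new Object();   // stands in for the compiled class
    });
  }

  int compileCount() {
    return compileCount;
  }

  /** Builds toy source text for a class with the given name and body. */
  static String generate(String className, String body) {
    return "public class " + className + " { " + body + " }";
  }
}
```

With a fixed name such as {{ProjectorGen}}, two generation events for the same 
body produce equal keys and one compile. With a suffix such as 
{{ProjectorGen124}}, the text differs and the raw source misses the cache, which 
is why a generified copy is needed today.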



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5070) Code cache compares sources, but method order varies

2016-11-25 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5070:
--

 Summary: Code cache compares sources, but method order varies
 Key: DRILL-5070
 URL: https://issues.apache.org/jira/browse/DRILL-5070
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor


The Drill generated code cache compares the sources from two different 
generation events to detect duplicate code. Unfortunately, the code generator 
emits methods in the order returned by {{Class.getDeclaredMethods}}, but this 
method makes no guarantee about the order of the methods.

This issue appeared when attempting to modify tests to capture generated code 
for comparison to future results. Even a simple generated case from 
{{ExpressionTest.testBasicExpression()}} that generates {{if(true) then 1 else 
0 end}} (all constants) produced methods in different orders on each test run.

The fix is simple: in the {{SignatureHolder}} constructor, sort methods by name 
after retrieving them from the class. The sort ensures that method order is 
deterministic. Fortunately, the number of methods is small, so the sort step 
adds little cost.

Without this fix, it is likely that the code cache holds many "copies" of the 
same code: equivalent code but with different method orders. After this fix, 
the cache should hold only one copy of each bit of equivalent code.
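
The proposed sort might look like the sketch below. It is an illustration only, 
not the actual {{SignatureHolder}} code: the helper {{SortedMethods}} and the 
sample interface {{SampleTemplate}} are hypothetical. The tie-breaker on 
{{Method.toGenericString()}} is an added assumption so that overloaded methods 
(same name) also come out in a fixed order.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical stand-in for a code-generation template. */
interface SampleTemplate {
  void setup();
  void eval();
  void cleanup();
}

/** Hypothetical helper; not Drill's SignatureHolder. */
final class SortedMethods {
  private SortedMethods() {}

  /** Returns the declared methods of the given type in a deterministic order. */
  static Method[] of(Class<?> type) {
    // getDeclaredMethods() makes no promise about element order.
    Method[] methods = type.getDeclaredMethods();
    // Sort by name; break ties between overloads by full generic signature.
    Comparator<Method> byName = Comparator.comparing(Method::getName);
    Arrays.sort(methods, byName.thenComparing(Method::toGenericString));
    return methods;
  }
}
```

Calling {{SortedMethods.of(SampleTemplate.class)}} yields the methods ordered 
{{cleanup}}, {{eval}}, {{setup}} on every run, regardless of the order the JVM 
reports.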





[jira] [Created] (DRILL-5069) MaterializeVisitor.visitSchemaPath silently ignores missing fields

2016-11-25 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5069:
--

 Summary: MaterializeVisitor.visitSchemaPath silently ignores 
missing fields
 Key: DRILL-5069
 URL: https://issues.apache.org/jira/browse/DRILL-5069
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor


Not sure if this is a bug or a feature...

The test {{ExpressionTest}} tests various expressions by parsing them and 
generating code. The expression under test in the 
{{testExprParseLowerExponent}} test is:

{code}
multiply(`$f0`, 1.0e-4)
{code}

The first argument appears to be a reference to the first field (0-based 
indexes) in the given {{RecordBatch}}:

{code}
getExpressionCode("multiply(`$f0`, 1.0e-4)", batch);
{code}

Because of the way the mocked {{RecordBatch}} is handled, resolving {{$f0}} 
will produce a null result from: {{batch.getValueVectorId("$f0");}}. The code 
in {{MaterializeVisitor.visitSchemaPath}} logs a warning when the field 
reference is not found:

{code}
Unable to find value vector of path $f0, returning null instance.
{code}

The code for building up the multiply function appears to treat this case as 
the equivalent of:

{code}
multiply(1.0e-4)
{code}

That is, it just ignores the missing field.

This seems to invite errors. I would expect that if code elsewhere in Drill 
generated a reference to a missing field, Drill would flag this as a 
serious error. Said another way, if the rest of Drill works properly, the 
missing field reference scenario should never occur in production.

For the particular test in question, use the mocking mechanism to pretend that 
the target field exists. See the {{testSchemaExpression}} test for an example.
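
To make the "flag this as a serious error" suggestion concrete, here is a 
minimal fail-fast sketch. It does not use Drill's {{MaterializeVisitor}} or 
{{RecordBatch}} APIs: the class {{StrictFieldLookupSketch}}, its map of field 
ids, and the choice of {{IllegalStateException}} are all assumptions made for 
illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a strict field lookup; not Drill code. */
final class StrictFieldLookupSketch {
  // Maps a field path (for example "$f0") to a made-up vector id.
  private final Map<String, Integer> fieldIds = new HashMap<>();

  void addField(String path, int id) {
    fieldIds.put(path, id);
  }

  /** Resolves a field path, failing loudly instead of returning null. */
  int resolve(String path) {
    Integer id = fieldIds.get(path);
    if (id == null) {
      // A missing field is treated as an internal error, not a warning.
      throw new IllegalStateException(
          "Unable to find value vector of path " + path);
    }
    return id;
  }
}
```

Under this approach a test that references {{$f0}} must first register the 
field (here via {{addField}}), which mirrors the mocking suggestion above.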






[jira] [Updated] (DRILL-4995) Allow lazy init when dynamic UDF support is disabled

2016-11-25 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4995:

Fix Version/s: (was: 1.10)
   1.10.0

> Allow lazy init when dynamic UDF support is disabled
> 
>
> Key: DRILL-4995
> URL: https://issues.apache.org/jira/browse/DRILL-4995
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Steps in a 2-node cluster:
> On the 1st node:
> 1. Register the jar
> 2. Run the function (success)
> 3. Disable dynamic UDF support 
> 4. Run the function again (success)
> On the 2nd node:
> 5. Try to run the function (fails).
> On the 1st node the function was initialized before dynamic UDF support was 
> disabled, but on the 2nd node it was never initialized. So it seems we 
> need to allow lazy initialization when dynamic UDF support is disabled.





[jira] [Updated] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2016-11-25 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4980:

Fix Version/s: (was: 1.10)
   1.10.0

> Upgrading of the approach of parquet date correctness status detection
> --
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.10.0
>
>
> This JIRA is a follow-up to 
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for newly generated Parquet files should be 
> upgraded. 


