[ 
https://issues.apache.org/jira/browse/DRILL-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6028:
------------------------------------
    Description: 
Allow splitting generated code in ChainedHashTable into blocks to avoid "code 
too large" error.

*REPRODUCE*
File {{1200_columns.csv}}
{noformat}
0,1,2,3...1200
0,1,2,3...1200
{noformat}

Query
{noformat}
select columns[0], column[1]...columns[1200] from dfs.`1200_columns.csv`
union
select columns[0], column[1]...columns[1200] from dfs.`1200_columns.csv`
{noformat}

Error
{noformat}
Error: SYSTEM ERROR: CompileException: File 
'org.apache.drill.exec.compile.DrillJavaFileObject[HashTableGen10.java]', Line 
-7886, Column 24: HashTableGen10.java:57650: error: code too large
        public boolean isKeyMatchInternalBuild(int incomingRowIdx, int htRowIdx)
                       ^ (compiler.err.limit.code)
{noformat}

*ROOT CAUSE*
DRILL-4715 added ability to ensure that methods size won't go beyond the 64k 
limit imposed by JVM. {{BlkCreateMode.TRUE_IF_BOUND}} was added to create new 
block only if # of expressions added hit upper-bound defined by 
{{exec.java.compiler.exp_in_method_size}}. Once number of expressions in 
methods hits upper bound we create from call inner method.
Example: 
{noformat}
public void doSetup(RecordBatch incomingBuild, RecordBatch incomingProbe) 
throws SchemaChangeException {
// some logic

return doSetup0(incomingBuild, incomingProbe);
}
{noformat}

During code generation {{ChainedHashTable}} added all code in its methods in 
one block (using {{BlkCreateMode.FALSE}}) since {{getHashBuild}} and 
{{getHashProbe}} methods contained state and thus could not be split. In these 
methods hash was generated for each key expression. For the first key seed was 
0, subsequent keys hash was generated based on seed from previous key.
To allow splitting for there methods the following was done:
1. Method signatures was changed: added new parameter {{seedValue}}. Initially 
starting seed value was hard-coded during code generation (set to 0), now it is 
passed as method parameter.
2. Initially hash function call for all keys was transformed into one logical 
expression which did not allow splitting. Now we create logical expression for 
each key and thus splitting is possible. New {{seedValue}} parameter is used as 
seed holder to pass seed value for the next key.
3. {{ParameterExpression}} was added to generate reference to method parameter 
during code generation.

Code example:
{noformat}
    public int getHashBuild(int incomingRowIdx, int seedValue)
        throws SchemaChangeException
    {
        {
            NullableVarCharHolder out3 = new NullableVarCharHolder();
            {
                out3 .isSet = vv0 .getAccessor().isSet((incomingRowIdx));
                if (out3 .isSet == 1) {
                    out3 .buffer = vv0 .getBuffer();
                    long startEnd = vv0 
.getAccessor().getStartEnd((incomingRowIdx));
                    out3 .start = ((int) startEnd);
                    out3 .end = ((int)(startEnd >> 32));
                }
            }
            IntHolder seedValue4 = new IntHolder();
            seedValue4 .value = seedValue;
            //---- start of eval portion of hash32 function. ----//
            IntHolder out5 = new IntHolder();
            {
                final IntHolder out = new IntHolder();
                NullableVarCharHolder in = out3;
                IntHolder seed = seedValue4;
                 
Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
    if (in.isSet == 0) {
        out.value = seed.value;
    } else
    {
        out.value = 
org.apache.drill.exec.expr.fn.impl.HashHelper.hash32(in.start, in.end, 
in.buffer, seed.value);
    }
}
 
                out5 = out;
            }
            //---- end of eval portion of hash32 function. ----//
            seedValue = out5 .value;

   return getHashBuild0((incomingRowIdx), (seedValue));
}
{noformat}

Examples of code generation:
{{HashTableGen5_for_40_columns_BEFORE.java}} - code compiles
{{HashTableGen5_for_40_columns_AFTER.java}} - code compiles

{{HashTableGen5_for_1200_columns_BEFORE.java}} - error during compilation, 
method too large
{{HashTableGen5_for_1200_columns_AFTER.java}} - code compiles since methods 
where split

  was:
Allow splitting generated code in ChainedHashTable into blocks to avoid "code 
too large" error.

*REPRODUCE*
File {{1200_columns.csv}}
{noformat}
0,1,2,3...1200
0,1,2,3...1200
{noformat}

Query
{noformat}
select columns[0], column[1]...columns[1200] from dfs.`1200_columns.csv`
union
select columns[0], column[1]...columns[1200] from dfs.`1200_columns.csv`
{noformat}

Error
{noformat}
Error: SYSTEM ERROR: CompileException: File 
'org.apache.drill.exec.compile.DrillJavaFileObject[HashTableGen10.java]', Line 
-7886, Column 24: HashTableGen10.java:57650: error: code too large
        public boolean isKeyMatchInternalBuild(int incomingRowIdx, int htRowIdx)
                       ^ (compiler.err.limit.code)
{noformat}

*ROOT CAUSE*
DRILL-4715 added ability to ensure that methods size won't go beyond the 64k 
limit imposed by JVM. {{BlkCreateMode.TRUE_IF_BOUND}} was added to create new 
block only if # of expressions added hit upper-bound defined by 
{{exec.java.compiler.exp_in_method_size}}. Once number of expressions in 
methods hits upper bound we create from call inner method.
Example: 
{noformat}
public void doSetup(RecordBatch incomingBuild, RecordBatch incomingProbe) 
throws SchemaChangeException {
// some logic

return doSetup0(incomingBuild, incomingProbe);
}
{noformat}

During code generation {{ChainedHashTable}} added all code in its methods in 
one block (using {{BlkCreateMode.FALSE}}) since {{getHashBuild}} and 
{{getHashProbe}} methods contained state and thus could not be split. In these 
methods hash was generated for each key expression. For the first key seed was 
0, subsequent keys hash was generated based on seed from previous key.
To allow splitting for there methods the following was done:
1. Method signatures was changed: added new parameter {{seedValue}}. Initially 
starting seed value was hard-coded during code generation (set to 0), now it is 
passed as method parameter.
2. Initially hash function call for all keys was transformed into one logical 
expression which did not allow splitting. Now we create logical expression for 
each key and thus splitting is possible. New {{seedValue}} parameter is used as 
seed holder to pass seed value for the next key.
3. {{ParameterExpression}} was added to generate reference to method parameter 
during code generation.

Code example:
{noformat}
    public int getHashBuild(int incomingRowIdx, int seedValue)
        throws SchemaChangeException
    {
        {
            NullableVarCharHolder out3 = new NullableVarCharHolder();
            {
                out3 .isSet = vv0 .getAccessor().isSet((incomingRowIdx));
                if (out3 .isSet == 1) {
                    out3 .buffer = vv0 .getBuffer();
                    long startEnd = vv0 
.getAccessor().getStartEnd((incomingRowIdx));
                    out3 .start = ((int) startEnd);
                    out3 .end = ((int)(startEnd >> 32));
                }
            }
            IntHolder seedValue4 = new IntHolder();
            seedValue4 .value = seedValue;
            //---- start of eval portion of hash32 function. ----//
            IntHolder out5 = new IntHolder();
            {
                final IntHolder out = new IntHolder();
                NullableVarCharHolder in = out3;
                IntHolder seed = seedValue4;
                 
Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
    if (in.isSet == 0) {
        out.value = seed.value;
    } else
    {
        out.value = 
org.apache.drill.exec.expr.fn.impl.HashHelper.hash32(in.start, in.end, 
in.buffer, seed.value);
    }
}
 
                out5 = out;
            }
            //---- end of eval portion of hash32 function. ----//
            seedValue = out5 .value;

   return getHashBuild0((incomingRowIdx), (seedValue));
}
{noformat}


> Allow splitting generated code in ChainedHashTable into blocks to avoid "code 
> too large" error
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6028
>                 URL: https://issues.apache.org/jira/browse/DRILL-6028
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>             Fix For: 1.13.0
>
>
> Allow splitting generated code in ChainedHashTable into blocks to avoid "code 
> too large" error.
> *REPRODUCE*
> File {{1200_columns.csv}}
> {noformat}
> 0,1,2,3...1200
> 0,1,2,3...1200
> {noformat}
> Query
> {noformat}
> select columns[0], column[1]...columns[1200] from dfs.`1200_columns.csv`
> union
> select columns[0], column[1]...columns[1200] from dfs.`1200_columns.csv`
> {noformat}
> Error
> {noformat}
> Error: SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashTableGen10.java]', 
> Line -7886, Column 24: HashTableGen10.java:57650: error: code too large
>         public boolean isKeyMatchInternalBuild(int incomingRowIdx, int 
> htRowIdx)
>                        ^ (compiler.err.limit.code)
> {noformat}
> *ROOT CAUSE*
> DRILL-4715 added ability to ensure that methods size won't go beyond the 64k 
> limit imposed by JVM. {{BlkCreateMode.TRUE_IF_BOUND}} was added to create new 
> block only if # of expressions added hit upper-bound defined by 
> {{exec.java.compiler.exp_in_method_size}}. Once number of expressions in 
> methods hits upper bound we create from call inner method.
> Example: 
> {noformat}
> public void doSetup(RecordBatch incomingBuild, RecordBatch incomingProbe) 
> throws SchemaChangeException {
> // some logic
> return doSetup0(incomingBuild, incomingProbe);
> }
> {noformat}
> During code generation {{ChainedHashTable}} added all code in its methods in 
> one block (using {{BlkCreateMode.FALSE}}) since {{getHashBuild}} and 
> {{getHashProbe}} methods contained state and thus could not be split. In 
> these methods hash was generated for each key expression. For the first key 
> seed was 0, subsequent keys hash was generated based on seed from previous 
> key.
> To allow splitting for there methods the following was done:
> 1. Method signatures was changed: added new parameter {{seedValue}}. 
> Initially starting seed value was hard-coded during code generation (set to 
> 0), now it is passed as method parameter.
> 2. Initially hash function call for all keys was transformed into one logical 
> expression which did not allow splitting. Now we create logical expression 
> for each key and thus splitting is possible. New {{seedValue}} parameter is 
> used as seed holder to pass seed value for the next key.
> 3. {{ParameterExpression}} was added to generate reference to method 
> parameter during code generation.
> Code example:
> {noformat}
>     public int getHashBuild(int incomingRowIdx, int seedValue)
>         throws SchemaChangeException
>     {
>         {
>             NullableVarCharHolder out3 = new NullableVarCharHolder();
>             {
>                 out3 .isSet = vv0 .getAccessor().isSet((incomingRowIdx));
>                 if (out3 .isSet == 1) {
>                     out3 .buffer = vv0 .getBuffer();
>                     long startEnd = vv0 
> .getAccessor().getStartEnd((incomingRowIdx));
>                     out3 .start = ((int) startEnd);
>                     out3 .end = ((int)(startEnd >> 32));
>                 }
>             }
>             IntHolder seedValue4 = new IntHolder();
>             seedValue4 .value = seedValue;
>             //---- start of eval portion of hash32 function. ----//
>             IntHolder out5 = new IntHolder();
>             {
>                 final IntHolder out = new IntHolder();
>                 NullableVarCharHolder in = out3;
>                 IntHolder seed = seedValue4;
>                  
> Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
>     if (in.isSet == 0) {
>         out.value = seed.value;
>     } else
>     {
>         out.value = 
> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32(in.start, in.end, 
> in.buffer, seed.value);
>     }
> }
>  
>                 out5 = out;
>             }
>             //---- end of eval portion of hash32 function. ----//
>             seedValue = out5 .value;
>    return getHashBuild0((incomingRowIdx), (seedValue));
> }
> {noformat}
> Examples of code generation:
> {{HashTableGen5_for_40_columns_BEFORE.java}} - code compiles
> {{HashTableGen5_for_40_columns_AFTER.java}} - code compiles
> {{HashTableGen5_for_1200_columns_BEFORE.java}} - error during compilation, 
> method too large
> {{HashTableGen5_for_1200_columns_AFTER.java}} - code compiles since methods 
> where split



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to