GitHub user lw-lin opened a pull request:

    https://github.com/apache/spark/pull/14370

    [SPARK-16713][SQL] Check codegen method size ≤ 8K on compile

    ## What changes were proposed in this pull request?
    
    Ideally, we want codegen methods to stay below 8KB of bytecode. Beyond 8K the JIT will not compile a method, which can cause performance degradation.
    
    Instead of analyzing the generated source code and automatically breaking large methods into smaller ones (which is also hard to do), it is better to detect large methods as early as possible and then improve the code generation manually.
    
    This patch adds support for checking codegen method size at compile time. For each method we can specify the expected behavior when its bytecode size exceeds 8KB:
    
    ```scala
     /** No-op when the method exceeds 8K size. */
     case object NO_OP extends FunctionSizeHint
    
     /** Log a warning when the method exceeds 8K size. */
     case object WARN_IF_EXCEEDS_JIT_LIMIT extends FunctionSizeHint
    
     /**
      * Throw a compilation exception when the method exceeds 8K size.
      * Fail fast so that we can catch it asap; this is useful in testing 
corner/edge cases.
      */
     case object ERROR_IF_EXCEEDS_JIT_LIMIT extends FunctionSizeHint
    ```
    
    This way we can test against extreme cases, such as a 10000-column-wide table, to see whether the generated code is small enough to have a chance of being JIT-compiled at runtime.
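
    For example, one such extreme case could look roughly like the following (a sketch only, assuming a local SparkSession; it merely shows the kind of wide-schema plan whose generated methods the size hints are meant to guard in tests):
    
    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit
    
    val spark = SparkSession.builder().master("local[1]").appName("wide-codegen").getOrCreate()
    
    // A 10000-column projection: whole-stage codegen for this plan produces the kind of
    // large generated methods that the new size hints are intended to catch in tests.
    val wide = spark.range(1).select(Seq.tabulate(10000)(i => lit(i).as(s"c$i")): _*)
    wide.collect()
    ```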
    
    ## Sample Usage
    
    
    ```scala
    val codeBody =
      s"""
        public static void inc() {
          int i = 0;
          i++;
          i++;
          ... // enough i++ statements for this inc() method to exceed the 8K size limit
        }
      """
    
    // genCode(n) below is a helper that produces a body like codeBody above with n i++ statements.
    
    // == prior to this patch ===================
    var ctx = new CodegenContext()
    ctx.addNewFunction("inc", genCode(15000))
    CodeGenerator.compile(
      new CodeAndComment(ctx.declareAddedFunctions(), emptyComments))
    
    // == after this patch ======================
    // Exception: failed to compile. Method org.apache.spark.sql.catalyst.expressions.GeneratedClass.inc
    // should not exceed 8K size limit -- observed size is 45003
    ctx = new CodegenContext()
    ctx.addNewFunction("inc", genCode(15000), CodegenContext.ERROR_IF_EXCEEDS_JIT_LIMIT)
    CodeGenerator.compile(
      new CodeAndComment(ctx.declareAddedFunctions(), emptyComments, ctx.getFuncToSizeHintMap))
    ```
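
    For completeness, the warn-only hint could presumably be attached the same way (a sketch assuming the same API as above, not taken verbatim from the patch); it only logs a warning instead of failing the compilation, which may be preferable outside of tests:
    
    ```scala
    // == warn instead of failing (assumed usage) ==
    val ctx = new CodegenContext()
    ctx.addNewFunction("inc", genCode(15000), CodegenContext.WARN_IF_EXCEEDS_JIT_LIMIT)
    CodeGenerator.compile(
      new CodeAndComment(ctx.declareAddedFunctions(), emptyComments, ctx.getFuncToSizeHintMap))
    ```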
    
    ## How was this patch tested?
    
    - new unit test
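
    For illustration, the check can be exercised roughly as follows (a sketch only; the test name, the `genCode` helper, and the ScalaTest setup are assumptions based on the sample above, not the actual test code in the patch):
    
    ```scala
    // Sketch of a unit test: assumes a ScalaTest suite and the genCode helper from the sample above.
    test("error when a generated method exceeds the 8K bytecode limit") {
      val ctx = new CodegenContext()
      ctx.addNewFunction("inc", genCode(15000), CodegenContext.ERROR_IF_EXCEEDS_JIT_LIMIT)
      val e = intercept[Exception] {
        CodeGenerator.compile(
          new CodeAndComment(ctx.declareAddedFunctions(), emptyComments, ctx.getFuncToSizeHintMap))
      }
      assert(e.getMessage.contains("should not exceed 8K size limit"))
    }
    ```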
    
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark codegen-method-size-8k

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14370.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14370
    
----
commit 19b6d1d056f2bd1b1bb4d502c31b418ffbfe8d65
Author: Liwei Lin <lwl...@gmail.com>
Date:   2016-07-26T08:39:36Z

    Check codegen method size ≤ 8K on compile

----

