GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14370
[SPARK-16713][SQL] Check codegen method size ≤ 8K on compile

## What changes were proposed in this pull request?

Ideally, codegen methods should stay under 8KB of bytecode. Beyond 8K the JIT won't compile the method, which can cause performance degradation.

Instead of analyzing the generated source code and automatically breaking large methods into smaller ones (which is also a little hard to do), it'd be better to discover large methods asap and then manually improve the source code.

This patch adds support for checking codegen method size on compile. For each method, we can specify the expected behavior when its bytecode size exceeds 8KB:

```scala
/** No-op when the method exceeds 8K size. */
case object NO_OP extends FunctionSizeHint

/** Log a warning when the method exceeds 8K size. */
case object WARN_IF_EXCEEDS_JIT_LIMIT extends FunctionSizeHint

/**
 * Throw a compilation exception when the method exceeds 8K size.
 * Fail fast so that we can catch it asap; this is useful in testing corner/edge cases.
 */
case object ERROR_IF_EXCEEDS_JIT_LIMIT extends FunctionSizeHint
```

This way we can test against extreme cases, such as a 10000-column-wide table, to see whether the generated code is small enough to get a chance to be JITed at runtime.

## Sample Usage

```scala
val codeBody = s"""
  public static void inc() {
    int i = 0;
    i++;
    i++;
    ... // enough i++ s for this inc() method to exceed 8K size
  }
"""

// == prior to this patch ===================
ctx = new CodegenContext()
ctx.addNewFunction("inc", genCode(15000))
CodeGenerator.compile(
  new CodeAndComment(ctx.declareAddedFunctions(), emptyComments))

// == after this patch ======================
// Exception: failed to compile. Method org.apache.spark.sql.catalyst.expressions.GeneratedClass.inc
// should not exceed 8K size limit -- observed size is 45003
ctx = new CodegenContext()
ctx.addNewFunction("inc", genCode(15000), CodegenContext.ERROR_IF_EXCEEDS_JIT_LIMIT)
CodeGenerator.compile(
  new CodeAndComment(ctx.declareAddedFunctions(), emptyComments, ctx.getFuncToSizeHintMap))
```

## How was this patch tested?

- new unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark codegen-method-size-8k

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14370.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14370

----
commit 19b6d1d056f2bd1b1bb4d502c31b418ffbfe8d65
Author: Liwei Lin <lwl...@gmail.com>
Date:   2016-07-26T08:39:36Z

    Check codegen method size ≤ 8K on compile

----
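The behavior of the three size hints described in this pull request can be sketched as a standalone check. This is only an illustration, not the patch's implementation: the `MethodSizeCheck` object, its `check` signature, the `JitLimitBytes` constant, and the choice of `IllegalStateException` are all hypothetical, and the sketch assumes the bytecode size of each method has already been measured after compilation. The 8000-byte threshold is likewise an assumption, taken from HotSpot's default `HugeMethodLimit`.

```scala
// Hypothetical, self-contained sketch of the size-hint policy; not the patch's code.
sealed trait FunctionSizeHint
case object NO_OP extends FunctionSizeHint
case object WARN_IF_EXCEEDS_JIT_LIMIT extends FunctionSizeHint
case object ERROR_IF_EXCEEDS_JIT_LIMIT extends FunctionSizeHint

object MethodSizeCheck {
  // Assumed threshold: HotSpot's default HugeMethodLimit (bytes of bytecode).
  val JitLimitBytes: Int = 8000

  /**
   * Applies each method's hint to its measured bytecode size.
   * Returns the warning messages; throws for an oversized method marked ERROR.
   * Methods without a hint are treated as NO_OP.
   */
  def check(
      methodSizes: Map[String, Int],
      hints: Map[String, FunctionSizeHint]): Seq[String] = {
    val oversized = methodSizes.toSeq.sortBy(_._1).filter(_._2 > JitLimitBytes)
    oversized.flatMap { case (name, size) =>
      hints.getOrElse(name, NO_OP) match {
        case NO_OP =>
          Seq.empty[String]
        case WARN_IF_EXCEEDS_JIT_LIMIT =>
          Seq(s"Method $name exceeds 8K size limit -- observed size is $size")
        case ERROR_IF_EXCEEDS_JIT_LIMIT =>
          throw new IllegalStateException(
            s"Method $name should not exceed 8K size limit -- observed size is $size")
      }
    }
  }
}
```

With the observed size from the sample above, `MethodSizeCheck.check(Map("inc" -> 45003), Map("inc" -> WARN_IF_EXCEEDS_JIT_LIMIT))` would return a single warning, while the same call with `ERROR_IF_EXCEEDS_JIT_LIMIT` would throw, which is the fail-fast behavior intended for tests.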