Kapil Singh created SPARK-18091:
-----------------------------------

             Summary: Deep if expressions cause Generated SpecificUnsafeProjection code to exceed JVM code size limit
                 Key: SPARK-18091
                 URL: https://issues.apache.org/jira/browse/SPARK-18091
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.1
            Reporter: Kapil Singh
            Priority: Critical


*Problem Description:*
I have an application that uses a lot of nested if-else decision logic to generate its output. I'm getting the following exception:
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
        at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
        at org.codehaus.janino.CodeContext.write(CodeContext.java:874)
        at org.codehaus.janino.CodeContext.writeBranch(CodeContext.java:965)
        at org.codehaus.janino.UnitCompiler.writeBranch(UnitCompiler.java:10261)

*Steps to Reproduce:*
I've written a unit test that reproduces the failure when run in CodeGenerationSuite.scala:
{code}
test("split large if expressions into blocks due to JVM code size limit") {
  val row = create_row("afafFAFFsqcategory2dadDADcategory8sasasadscategory24", 0)
  val inputStr = 'a.string.at(0)
  val inputIdx = 'a.int.at(1)

  val length = 10
  val valuesToCompareTo = for (i <- 1 to (length + 1)) yield ("category" + i)

  // Each iteration nests the previous result and re-uses the accumulated
  // condition, so the expression tree for this single column grows with `length`.
  val initCondition = EqualTo(RegExpExtract(inputStr, Literal("category1"), inputIdx), valuesToCompareTo(0))
  var res: Expression = If(initCondition, Literal("category1"), Literal("NULL"))
  var cumulativeCondition: Expression = Not(initCondition)
  for (index <- 1 to length) {
    val valueExtractedFromInput =
      RegExpExtract(inputStr, Literal("category" + (index + 1).toString), inputIdx)
    val currComparee = If(cumulativeCondition, valueExtractedFromInput, Literal("NULL"))
    val currCondition = EqualTo(currComparee, valuesToCompareTo(index))
    val combinedCond = And(cumulativeCondition, currCondition)
    res = If(combinedCond, If(combinedCond, valueExtractedFromInput, Literal("NULL")), res)
    cumulativeCondition = And(Not(currCondition), cumulativeCondition)
  }

  val expressions = Seq(res)
  // The second argument enables subexpression elimination.
  val plan = GenerateUnsafeProjection.generate(expressions, true)
  val actual = plan(row).toSeq(expressions.map(_.dataType))
  val expected = Seq(UTF8String.fromString("category2"))

  if (!checkResult(actual, expected)) {
    fail(s"Incorrect Evaluation: expressions: $expressions, actual: $actual, expected: $expected")
  }
}
{code}

*Root Cause:*
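The current splitting of projection code works only at the granularity of output columns: it doesn't (and can't) split the code generated for an individual output column's expression. A single deeply nested expression can therefore produce a generated method whose bytecode exceeds the JVM's 64 KB per-method limit.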
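To illustrate how one column's code can grow this large, here is a minimal, self-contained sketch (not Spark code). It rebuilds the shape of the reproducer's expression tree with hypothetical stand-in classes (`Expr`, `Leaf`, and local `Not`/`And`/`EqualTo`/`If`, where `Leaf` stands in for both the literals and the `RegExpExtract` calls) and counts nodes, assuming every occurrence of a shared subtree is emitted inline, i.e. not collapsed by subexpression elimination.

{code}
// Hypothetical stand-ins mirroring the shape of the Catalyst expressions in the
// reproducer; these are NOT Spark classes. Literals and RegExpExtract are Leafs.
sealed trait Expr
final case class Leaf(name: String) extends Expr
final case class Not(child: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class If(pred: Expr, trueValue: Expr, falseValue: Expr) extends Expr

// Node count when every occurrence of a shared subtree is expanded inline
// (assumption: no subexpression elimination collapses the repeats).
def expandedSize(e: Expr): Long = e match {
  case Leaf(_)       => 1L
  case Not(c)        => 1L + expandedSize(c)
  case And(l, r)     => 1L + expandedSize(l) + expandedSize(r)
  case EqualTo(l, r) => 1L + expandedSize(l) + expandedSize(r)
  case If(p, t, f)   => 1L + expandedSize(p) + expandedSize(t) + expandedSize(f)
}

// Builds a tree with the same structure as the unit test above.
def buildReproTree(length: Int): Expr = {
  val nullLit = Leaf("NULL")
  val initCondition = EqualTo(Leaf("extract1"), Leaf("category1"))
  var res: Expr = If(initCondition, Leaf("category1"), nullLit)
  var cumulativeCondition: Expr = Not(initCondition)
  for (index <- 1 to length) {
    val extracted = Leaf(s"extract${index + 1}")
    val currComparee = If(cumulativeCondition, extracted, nullLit)
    val currCondition = EqualTo(currComparee, Leaf(s"category${index + 1}"))
    val combinedCond = And(cumulativeCondition, currCondition)
    res = If(combinedCond, If(combinedCond, extracted, nullLit), res)
    // The old condition appears twice in the new one (directly and via currComparee).
    cumulativeCondition = And(Not(currCondition), cumulativeCondition)
  }
  res
}

for (n <- Seq(1, 2, 5, 10))
  println(s"length=$n expandedSize=${expandedSize(buildReproTree(n))}")
{code}

Under these assumptions the count roughly doubles with each extra level (38 nodes for length = 1, 44898 for length = 10), because each new `cumulativeCondition` embeds the previous one twice. Node count is only a rough proxy for bytecode size.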

*Proposed Fix:*
The If expression should place the code generated for its predicate, true-value and false-value expressions in separate methods registered in the codegen context, and call those methods instead of inlining all of that code directly in its own generated code.
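The proposal can be illustrated with a toy code generator. This is a sketch under simplifying assumptions: `Expr`, `Lit`, `Eq`, `If`, `Ctx`, `freshName`, `addNewFunction` and the `ifPredicate`/`ifTrue`/`ifFalse` method names are hypothetical stand-ins, not Spark's actual codegen API, and null handling and primitive types are ignored. For each `If`, it emits the predicate, true value and false value into separate helper methods and returns only a call expression.

{code}
import scala.collection.mutable

// Hypothetical, simplified model of the proposed fix; NOT Spark's actual API.
sealed trait Expr
final case class Lit(value: String) extends Expr
final case class Eq(left: Expr, right: Expr) extends Expr
final case class If(pred: Expr, trueValue: Expr, falseValue: Expr) extends Expr

// Toy codegen context: hands out unique names and collects helper methods.
final class Ctx {
  private var counter = 0
  val functions = mutable.LinkedHashMap.empty[String, String]
  def freshName(prefix: String): String = { counter += 1; s"$prefix$counter" }
  def addNewFunction(name: String, code: String): Unit = functions(name) = code
}

// Returns a Java expression (as a string) that evaluates `e` for row `i`.
def gen(e: Expr, ctx: Ctx): String = e match {
  case Lit(v)   => "\"" + v + "\""
  case Eq(l, r) => s"${gen(l, ctx)}.equals(${gen(r, ctx)})"
  case If(p, t, f) =>
    // Put each child's code in its own method instead of inlining it, so the
    // code for nested If branches does not accumulate in one method.
    val predName  = ctx.freshName("ifPredicate")
    val trueName  = ctx.freshName("ifTrue")
    val falseName = ctx.freshName("ifFalse")
    ctx.addNewFunction(predName,
      s"private boolean $predName(InternalRow i) { return ${gen(p, ctx)}; }")
    ctx.addNewFunction(trueName,
      s"private Object $trueName(InternalRow i) { return ${gen(t, ctx)}; }")
    ctx.addNewFunction(falseName,
      s"private Object $falseName(InternalRow i) { return ${gen(f, ctx)}; }")
    s"($predName(i) ? $trueName(i) : $falseName(i))"
}

val ctx = new Ctx
val tree = If(Eq(Lit("a"), Lit("b")), Lit("x"), If(Eq(Lit("c"), Lit("d")), Lit("y"), Lit("z")))
println(gen(tree, ctx)) // prints (ifPredicate1(i) ? ifTrue2(i) : ifFalse3(i))
ctx.functions.values.foreach(println)
{code}

Only `If` is split here; as in the proposal, other expressions still inline their children, so a deep non-`If` predicate could still yield a large method.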



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
