[ 
https://issues.apache.org/jira/browse/SPARK-18091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-18091:
--------------------------------------
    Target Version/s: 2.1.0

> Deep if expressions cause Generated SpecificUnsafeProjection code to exceed 
> JVM code size limit
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18091
>                 URL: https://issues.apache.org/jira/browse/SPARK-18091
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Kapil Singh
>            Priority: Critical
>
> *Problem Description:*
> I have an application that uses a lot of if-else decision logic to generate 
> its output. It fails with the following exception:
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
>       at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
>       at org.codehaus.janino.CodeContext.write(CodeContext.java:874)
>       at org.codehaus.janino.CodeContext.writeBranch(CodeContext.java:965)
>       at org.codehaus.janino.UnitCompiler.writeBranch(UnitCompiler.java:10261)
> *Steps to Reproduce:*
> I've come up with a unit test which I was able to run in 
> CodeGenerationSuite.scala:
> {code}
> test("split large if expressions into blocks due to JVM code size limit") {
>   val row = create_row("afafFAFFsqcategory2dadDADcategory8sasasadscategory24", 0)
>   val inputStr = 'a.string.at(0)
>   val inputIdx = 'a.int.at(1)
>   val length = 10
>   val valuesToCompareTo = for (i <- 1 to (length + 1)) yield ("category" + i)
>   val initCondition = EqualTo(
>     RegExpExtract(inputStr, Literal("category1"), inputIdx), valuesToCompareTo(0))
>   var res: Expression = If(initCondition, Literal("category1"), Literal("NULL"))
>   var cummulativeCondition: Expression = Not(initCondition)
>   for (index <- 1 to length) {
>     val valueExtractedFromInput = RegExpExtract(
>       inputStr, Literal("category" + (index + 1).toString), inputIdx)
>     val currComparee = If(cummulativeCondition, valueExtractedFromInput, Literal("NULL"))
>     val currCondition = EqualTo(currComparee, valuesToCompareTo(index))
>     val combinedCond = And(cummulativeCondition, currCondition)
>     res = If(combinedCond,
>       If(combinedCond, valueExtractedFromInput, Literal("NULL")), res)
>     cummulativeCondition = And(Not(currCondition), cummulativeCondition)
>   }
>   val expressions = Seq(res)
>   val plan = GenerateUnsafeProjection.generate(expressions, true)
>   val actual = plan(row).toSeq(expressions.map(_.dataType))
>   val expected = Seq(UTF8String.fromString("category2"))
>   if (!checkResult(actual, expected)) {
>     fail(s"Incorrect Evaluation: expressions: $expressions, " +
>       s"actual: $actual, expected: $expected")
>   }
> }
> {code}
> *Root Cause:*
> The current splitting of projection code doesn't (and can't) split the 
> generated code within an individual output column expression. The code for a 
> single deeply nested expression can therefore grow beyond the JVM's 64 KB 
> method size limit.
> *Note:* This issue seems related to SPARK-14887, but I'm not sure whether the 
> root cause is the same.
>  
> *Proposed Fix:*
> The If expression should place the generated code for its predicate, 
> true-value and false-value expressions in separate methods in the context and 
> call these methods, instead of putting the whole code directly in its own 
> generated code.
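
The proposed fix can be illustrated with a toy code generator. This is only a
sketch under stated assumptions, not Spark's actual codegen: Expr, Lit, Cond,
IfExpr, Ctx, addMethod and the helper_N naming are all hypothetical, and
conditions are assumed to be visible to every helper method (for example as
fields). It compares inlining every branch into one method body with hoisting
the predicate, true-value and false-value code into separate methods.

```scala
import scala.collection.mutable.ArrayBuffer

object IfSplitSketch {
  // Toy expression tree; a stand-in for Catalyst expressions, not the real ones.
  sealed trait Expr
  case class Lit(value: String) extends Expr
  case class Cond(name: String) extends Expr // a named boolean input
  case class IfExpr(pred: Expr, t: Expr, f: Expr) extends Expr

  // Hypothetical stand-in for a codegen context that collects helper methods.
  class Ctx {
    private var counter = 0
    val methods = ArrayBuffer.empty[String]

    // Registers a helper method returning `body` and yields the code that calls it.
    def addMethod(returnType: String, body: String): String = {
      counter += 1
      val name = s"helper_$counter"
      methods += s"private $returnType $name() { return $body; }"
      s"$name()"
    }
  }

  // Inline strategy: each sub-expression's code is pasted into its parent's code,
  // so the single resulting method body grows with the depth of the tree.
  def inline(e: Expr): String = e match {
    case Lit(v)          => "\"" + v + "\""
    case Cond(n)         => n
    case IfExpr(p, t, f) => s"(${inline(p)} ? ${inline(t)} : ${inline(f)})"
  }

  // Split strategy: predicate, true-value and false-value code each go into
  // their own method, so an If contributes only three calls to its parent.
  def split(e: Expr, ctx: Ctx): String = e match {
    case Lit(v)  => "\"" + v + "\""
    case Cond(n) => n
    case IfExpr(p, t, f) =>
      val predCall  = ctx.addMethod("boolean", split(p, ctx))
      val trueCall  = ctx.addMethod("String", split(t, ctx))
      val falseCall = ctx.addMethod("String", split(f, ctx))
      s"($predCall ? $trueCall : $falseCall)"
  }

  // Builds a chain of `depth` nested If expressions.
  def nested(depth: Int): Expr =
    (1 to depth).foldLeft(Lit("NULL"): Expr) { (acc, i) =>
      IfExpr(Cond(s"c$i"), Lit(s"category$i"), acc)
    }

  def main(args: Array[String]): Unit = {
    val expr = nested(50)
    val ctx = new Ctx
    val topLevel = split(expr, ctx)
    println(s"inline: one body of ${inline(expr).length} chars")
    println(s"split: top-level body of ${topLevel.length} chars, " +
      s"${ctx.methods.size} helpers, longest ${ctx.methods.map(_.length).max} chars")
  }
}
```

With the split strategy the size of any single method stays bounded no matter
how deep the expression is; the cost is more methods and more calls. Source
length here is only a rough proxy for bytecode size.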



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
