[ https://issues.apache.org/jira/browse/SPARK-18091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hovell updated SPARK-18091:
--------------------------------------
    Target Version/s: 2.1.0

> Deep if expressions cause Generated SpecificUnsafeProjection code to exceed JVM code size limit
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18091
>                 URL: https://issues.apache.org/jira/browse/SPARK-18091
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Kapil Singh
>            Priority: Critical
>
> *Problem Description:*
> I have an application that involves a lot of if-else decision logic to generate its output. I'm getting the following exception:
>
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
>     at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
>     at org.codehaus.janino.CodeContext.write(CodeContext.java:874)
>     at org.codehaus.janino.CodeContext.writeBranch(CodeContext.java:965)
>     at org.codehaus.janino.UnitCompiler.writeBranch(UnitCompiler.java:10261)
>
> *Steps to Reproduce:*
> I've come up with a unit test that I was able to run in CodeGenerationSuite.scala:
> {code}
> test("split large if expressions into blocks due to JVM code size limit") {
>   val row = create_row("afafFAFFsqcategory2dadDADcategory8sasasadscategory24", 0)
>   val inputStr = 'a.string.at(0)
>   val inputIdx = 'a.int.at(1)
>   val length = 10
>   val valuesToCompareTo = for (i <- 1 to (length + 1)) yield ("category" + i)
>   // Base case: yields "category1" when the regex "category1" matches the input, otherwise "NULL".
>   val initCondition = EqualTo(RegExpExtract(inputStr, Literal("category1"), inputIdx), valuesToCompareTo(0))
>   var res: Expression = If(initCondition, Literal("category1"), Literal("NULL"))
>   // True while none of the conditions checked so far has matched.
>   var cumulativeCondition: Expression = Not(initCondition)
>   for (index <- 1 to length) {
>     val valueExtractedFromInput = RegExpExtract(inputStr, Literal("category" + (index + 1).toString), inputIdx)
>     val currComparee = If(cumulativeCondition, valueExtractedFromInput, Literal("NULL"))
>     val currCondition = EqualTo(currComparee, valuesToCompareTo(index))
>     val combinedCond = And(cumulativeCondition, currCondition)
>     // Each iteration wraps the previous result in another level of If.
>     res = If(combinedCond, If(combinedCond, valueExtractedFromInput, Literal("NULL")), res)
>     cumulativeCondition = And(Not(currCondition), cumulativeCondition)
>   }
>   val expressions = Seq(res)
>   val plan = GenerateUnsafeProjection.generate(expressions, true)
>   val actual = plan(row).toSeq(expressions.map(_.dataType))
>   val expected = Seq(UTF8String.fromString("category2"))
>   if (!checkResult(actual, expected)) {
>     fail(s"Incorrect Evaluation: expressions: $expressions, actual: $actual, expected: $expected")
>   }
> }
> {code}
>
> *Root Cause:*
> The current splitting of projection code doesn't (and can't) split the generated code of an individual output column expression, so the code for a single column can grow beyond the JVM's 64 KB per-method limit.
>
> *Note:* This issue seems related to SPARK-14887, but I'm not sure whether the root cause is the same.
>
> *Proposed Fix:*
> The If expression should place the generated code of its predicate, true-value, and false-value expressions in separate methods added to the codegen context, and call those methods instead of inlining all of that code in its own generated code.
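To see why a single column's code can get this large, it helps to count the nodes in the expression tree the reproducer builds. The sketch below is a hypothetical toy model (the `Node` class and the helper functions are assumptions for illustration, not Spark's Catalyst classes) that mirrors the test's loop node for node and counts nodes the way a naive recursive walk would, revisiting shared subtrees every time. Each iteration builds the new cumulative condition from two copies of the old one, so its size C obeys C(k+1) = 2*C(k) + 10 with C(0) = 7, i.e. C(k) = 17*2^k - 10, and the resulting tree for length = n has 68*2^n - 15n - 59 nodes. How much code Spark actually emits per visit depends on its code generator (subexpression elimination, for example, may deduplicate some repeated subtrees), so treat this as an illustration of the tree's growth rather than a measurement of the 64 KB overflow.

```scala
// Toy expression tree (NOT Spark's Expression class): just enough structure to
// count nodes the way a naive recursive walk over the tree would visit them.
final case class Node(op: String, children: List[Node] = Nil) {
  // Shared subtrees are counted again every time they are reached.
  def size: Long = 1L + children.map(_.size).sum
}

// Hypothetical helpers named after the Catalyst expressions used in the test.
def lit(v: String): Node = Node(s"Literal($v)")
def regExpExtract(s: Node, pattern: String, idx: Node): Node =
  Node("RegExpExtract", List(s, lit(pattern), idx))
def equalTo(l: Node, r: Node): Node = Node("EqualTo", List(l, r))
def iff(p: Node, t: Node, f: Node): Node = Node("If", List(p, t, f))
def and(l: Node, r: Node): Node = Node("And", List(l, r))
def not(c: Node): Node = Node("Not", List(c))

// Mirrors the loop in the reproducer, node for node.
def buildRepro(length: Int): Node = {
  val inputStr = Node("inputStr")
  val inputIdx = Node("inputIdx")
  val initCondition =
    equalTo(regExpExtract(inputStr, "category1", inputIdx), lit("category1"))
  var res = iff(initCondition, lit("category1"), lit("NULL"))
  var cumulativeCondition = not(initCondition)
  for (index <- 1 to length) {
    val extracted = regExpExtract(inputStr, s"category${index + 1}", inputIdx)
    val currComparee = iff(cumulativeCondition, extracted, lit("NULL"))
    val currCondition = equalTo(currComparee, lit(s"category${index + 1}"))
    val combinedCond = and(cumulativeCondition, currCondition)
    res = iff(combinedCond, iff(combinedCond, extracted, lit("NULL")), res)
    // The old cumulative condition appears twice in its successor (once inside
    // currCondition, once directly), so its size roughly doubles each iteration.
    cumulativeCondition = and(not(currCondition), cumulativeCondition)
  }
  res
}

for (n <- Seq(0, 1, 2, 5, 10)) println(s"length = $n -> ${buildRepro(n).size} nodes")
```

It reports 9, 62, 183, 2042 and 69423 nodes for lengths 0, 1, 2, 5 and 10, so each extra level of If in the test roughly doubles the size of the tree that code generation has to walk.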
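One way the proposed fix could look, sketched as a toy Scala code generator that emits Java source strings. Everything here is hypothetical: `ToyContext`, `splitIntoMethod`, `GenResult` and the `ToyExpr` classes are illustrative stand-ins, not Spark APIs. For simplicity the sketch ignores null handling and always splits, whereas a real implementation would likely split only when a child's code is large. Each child of an If is moved into its own private method, so the If's own code is a single line no matter how deeply the Ifs nest.

```scala
import scala.collection.mutable

// Statements for one expression plus the Java expression holding its value.
final case class GenResult(code: String, value: String)

// Hypothetical stand-in for a codegen context: hands out fresh names and
// collects helper methods to be added to the generated class.
final class ToyContext {
  private var counter = 0
  val helperMethods: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]

  def freshName(prefix: String): String = { counter += 1; s"${prefix}_$counter" }

  // Moves `ev.code` into a private method that returns `ev.value`, and returns
  // the call expression, so the caller's own body does not grow.
  def splitIntoMethod(ev: GenResult, javaType: String): String = {
    val name = freshName("eval")
    helperMethods += Seq(
      s"private $javaType $name(InternalRow i) {",
      ev.code,
      s"  return ${ev.value};",
      "}"
    ).filter(_.nonEmpty).mkString("\n")
    s"$name(i)"
  }
}

sealed trait ToyExpr {
  def javaType: String
  def genCode(ctx: ToyContext): GenResult
}

final case class StrLit(v: String) extends ToyExpr {
  val javaType: String = "String"
  def genCode(ctx: ToyContext): GenResult = GenResult("", "\"" + v + "\"")
}

final case class StrCol(ordinal: Int) extends ToyExpr {
  val javaType: String = "String"
  def genCode(ctx: ToyContext): GenResult = {
    val v = ctx.freshName("col")
    GenResult(s"  String $v = i.getUTF8String($ordinal).toString();", v)
  }
}

final case class StrEq(l: ToyExpr, r: ToyExpr) extends ToyExpr {
  val javaType: String = "boolean"
  def genCode(ctx: ToyContext): GenResult = {
    val (le, re) = (l.genCode(ctx), r.genCode(ctx))
    val v = ctx.freshName("eq")
    val code = Seq(le.code, re.code, s"  boolean $v = ${le.value}.equals(${re.value});")
    GenResult(code.filter(_.nonEmpty).mkString("\n"), v)
  }
}

// The proposed fix, sketched: predicate, true value and false value each get
// their own method; the ternary calls only the branch that is selected.
final case class ToyIf(pred: ToyExpr, t: ToyExpr, f: ToyExpr) extends ToyExpr {
  def javaType: String = t.javaType
  def genCode(ctx: ToyContext): GenResult = {
    val p = ctx.splitIntoMethod(pred.genCode(ctx), pred.javaType)
    val tv = ctx.splitIntoMethod(t.genCode(ctx), t.javaType)
    val fv = ctx.splitIntoMethod(f.genCode(ctx), f.javaType)
    val v = ctx.freshName("if")
    GenResult(s"  $javaType $v = $p ? $tv : $fv;", v)
  }
}

// A chain of 50 nested Ifs, similar in shape to the reproducer's result.
val deep: ToyExpr = (1 to 50).foldLeft[ToyExpr](StrLit("NULL")) { (acc, k) =>
  ToyIf(StrEq(StrCol(0), StrLit(s"category$k")), StrLit(s"category$k"), acc)
}
val ctx = new ToyContext
val top = deep.genCode(ctx)
println(top.code)
println(s"${ctx.helperMethods.size} helper methods")
```

With 50 nested Ifs, the outermost If's code stays on one line and the generator produces three helper methods per If (150 in total), none longer than five lines. Because the ternary only calls the method for the selected branch, the If still evaluates just one of its branches at runtime.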