[jira] [Commented] (SPARK-8443) GenerateMutableProjection Exceeds JVM Code Size Limits
[ https://issues.apache.org/jira/browse/SPARK-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664779#comment-15664779 ]

Barry Becker commented on SPARK-8443:
-------------------------------------

I see the same error in Spark 1.6.3. Is there a workaround? Should this issue be reopened?
I am working with a dataset with more than 200 columns. Before 1.6.3 I could not even try this case because of SPARK-16664.

{code}
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
  at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
  at org.codehaus.janino.CodeContext.write(CodeContext.java:836)
{code}

> GenerateMutableProjection Exceeds JVM Code Size Limits
> ------------------------------------------------------
>
> Key: SPARK-8443
> URL: https://issues.apache.org/jira/browse/SPARK-8443
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Sen Fang
> Assignee: Sen Fang
> Fix For: 1.5.0
>
>
> GenerateMutableProjection puts all expression columns into a single apply function. When there are a lot of columns, the code size of the apply function exceeds the 64 KB limit, which is a hard JVM limit that cannot be changed.
> This came up when we were aggregating about 100 columns using the codegen and unsafe features.
> I wrote a unit test that reproduces this issue:
> https://github.com/saurfang/spark/blob/codegen_size_limit/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
> This test currently fails at 2048 expressions. It seems that master is more tolerant than branch-1.4 about this because the code is more concise.
> While the code on master has changed since branch-1.4, I am able to reproduce the problem on master.
> For now I hacked my way in branch-1.4 to work around this problem by wrapping each expression in a separate function and then calling those functions sequentially in apply. The proper way is probably to check the length of the projectCode and break it up as necessary. (This seems to be easier on master, actually, since we are building code by string rather than quasiquote.)
> Let me know if anyone has additional thoughts on this; I'm happy to contribute a pull request.
> Attaching the stack trace produced by the unit test:
> {code}
> [info] - code size limit *** FAILED *** (7 seconds, 103 milliseconds)
> [info] com.google.common.util.concurrent.UncheckedExecutionException: org.codehaus.janino.JaninoRuntimeException: Code of method "(Ljava/lang/Object;)Ljava/lang/Object;" of class "SC$SpecificProjection" grows beyond 64 KB
> [info]   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2263)
> [info]   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
> [info]   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
> [info]   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
> [info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:285)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(CodeGenerationSuite.scala:50)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(CodeGenerationSuite.scala:48)
> [info]   at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
> [info]   at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
> [info]   at scala.collection.immutable.Range.foreach(Range.scala:141)
> [info]   at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
> [info]   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:105)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply$mcV$sp(CodeGenerationSuite.scala:47)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply(CodeGenerationSuite.scala:47)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply(CodeGenerationSuite.scala:47)
> [info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
> [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:
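
For readers following the thread, the idea in the quoted description (check the length of the generated projection code and break it up, then call the pieces sequentially from apply) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the pull request or from Spark: the class `SplitProjectionSketch`, the methods `chunk` and `generate`, the helper names `apply_0`, `apply_1`, ... and the `maxChars` budget are all hypothetical. Source length in characters is used as a rough stand-in for bytecode size, which is what the JVM limit actually measures.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Spark code: packs per-expression Java snippets
 * into helper methods so that no single generated method body keeps growing
 * with the number of columns.
 */
class SplitProjectionSketch {

    /**
     * Greedily packs snippets into chunks of at most maxChars source
     * characters (each snippet is followed by a newline). A single snippet
     * longer than maxChars still gets a chunk of its own.
     */
    static List<String> chunk(List<String> snippets, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String snippet : snippets) {
            // Close the current chunk if this snippet would push it over budget.
            if (current.length() > 0
                    && current.length() + snippet.length() + 1 > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            current.append(snippet).append('\n');
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    /**
     * Emits one private helper method per chunk, plus an apply() method that
     * calls the helpers in order. The output is Java source text only; it is
     * not compiled here.
     */
    static String generate(List<String> snippets, int maxChars) {
        List<String> chunks = chunk(snippets, maxChars);
        StringBuilder helpers = new StringBuilder();
        StringBuilder calls = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            helpers.append("private void apply_").append(i)
                   .append("(Object row) {\n")
                   .append(chunks.get(i))
                   .append("}\n");
            calls.append("  apply_").append(i).append("(row);\n");
        }
        return helpers + "public void apply(Object row) {\n" + calls + "}\n";
    }
}
```

With three six-character snippets and a budget of 14 characters, `chunk` returns two chunks, so `generate` emits `apply_0`, `apply_1`, and an `apply` that calls both. A real generator would also need to share state between helpers (for example through fields), which this sketch leaves out.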
[jira] [Commented] (SPARK-8443) GenerateMutableProjection Exceeds JVM Code Size Limits
[ https://issues.apache.org/jira/browse/SPARK-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385667#comment-15385667 ]

Hayri Volkan Agun commented on SPARK-8443:
------------------------------------------

The same issue occurs in Spark 1.6.2 for large SQL with a lot of unions and iterations over a DataFrame.

Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
  at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
  at org.codehaus.janino.CodeContext.write(CodeContext.java:854)
  at org.codehaus.janino.CodeContext.writeShort(CodeContext.java:959)
  at org.codehaus.janino.UnitCompiler.writeConstantFieldrefInfo(UnitCompiler.java:10279)
  at org.codehaus.janino.UnitCompiler.getfield(UnitCompiler.java:9946)
  at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3322)
  at org.codehaus.janino.UnitCompiler.access$8200(UnitCompiler.java:185)
  at org.codehaus.janino.UnitCompiler$10.visitFieldAccess(UnitCompiler.java:3282)
  at org.codehaus.janino.Java$FieldAccess.accept(Java.java:3232)
  at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
  at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
  at org.codehaus.janino.UnitCompiler.compileContext2(UnitCompiler.java:3190)
  at org.codehaus.janino.UnitCompiler.access$5600(UnitCompiler.java:185)
  at org.codehaus.janino.UnitCompiler$9.visitFieldAccess(UnitCompiler.java:3152)
  at org.codehaus.janino.Java$FieldAccess.accept(Java.java:3232)
  at org.codehaus.janino.UnitCompiler.compileContext(UnitCompiler.java:3160)
  at org.codehaus.janino.UnitCompiler.compileContext2(UnitCompiler.java:3172)
  at org.codehaus.janino.UnitCompiler.access$5400(UnitCompiler.java:185)
  at org.codehaus.janino.UnitCompiler$9.visitAmbiguousName(UnitCompiler.java:3150)
  at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:3138)
  at org.codehaus.janino.UnitCompiler.compileContext(UnitCompiler.java:3160)
  at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4367)
  at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
  at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
  at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
  at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
  at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
  at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
  at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
  at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
  at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
  at org.codehaus.janino.UnitCompiler.access$1100(UnitCompiler.java:185)
  at org.codehaus.janino.UnitCompiler$4.visitExpressionStatement(UnitCompiler.java:936)
  at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2097)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:958)
  at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1007)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:993)
  at org.codehaus.janino.UnitCompiler.access$1000(UnitCompiler.java:185)
  at org.codehaus.janino.UnitCompiler$4.visitBlock(UnitCompiler.java:935)
  at org.codehaus.janino.Java$Block.accept(Java.java:2012)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:958)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1742)
  at org.codehaus.janino.UnitCompiler.access$1200(UnitCompiler.java:185)
  at org.codehaus.janino.UnitCompiler$4.visitIfStatement(UnitCompiler.java:937)
  at org.codehaus.janino.Java$IfStatement.accept(Java.java:2157)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:958)
  at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1007)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:993)
  at org.codehaus.janino.UnitCompiler.access$1000(UnitCompiler.java:185)
  at org.codehaus.janino.UnitCompiler$4.visitBlock(UnitCompiler.java:935)
  at org.codehaus.janino.Java$Block.accept(Java.java:20
[jira] [Commented] (SPARK-8443) GenerateMutableProjection Exceeds JVM Code Size Limits
[ https://issues.apache.org/jira/browse/SPARK-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605028#comment-14605028 ]

Apache Spark commented on SPARK-8443:
-------------------------------------

User 'saurfang' has created a pull request for this issue:
https://github.com/apache/spark/pull/7076

> GenerateMutableProjection Exceeds JVM Code Size Limits
> ------------------------------------------------------
>
> Key: SPARK-8443
> URL: https://issues.apache.org/jira/browse/SPARK-8443
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Sen Fang
>
> GenerateMutableProjection puts all expression columns into a single apply function. When there are a lot of columns, the code size of the apply function exceeds the 64 KB limit, which is a hard JVM limit that cannot be changed.
> This came up when we were aggregating about 100 columns using the codegen and unsafe features.
> I wrote a unit test that reproduces this issue:
> https://github.com/saurfang/spark/blob/codegen_size_limit/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
> This test currently fails at 2048 expressions. It seems that master is more tolerant than branch-1.4 about this because the code is more concise.
> While the code on master has changed since branch-1.4, I am able to reproduce the problem on master.
> For now I hacked my way in branch-1.4 to work around this problem by wrapping each expression in a separate function and then calling those functions sequentially in apply. The proper way is probably to check the length of the projectCode and break it up as necessary. (This seems to be easier on master, actually, since we are building code by string rather than quasiquote.)
> Let me know if anyone has additional thoughts on this; I'm happy to contribute a pull request.
> Attaching the stack trace produced by the unit test:
> {code}
> [info] - code size limit *** FAILED *** (7 seconds, 103 milliseconds)
> [info] com.google.common.util.concurrent.UncheckedExecutionException: org.codehaus.janino.JaninoRuntimeException: Code of method "(Ljava/lang/Object;)Ljava/lang/Object;" of class "SC$SpecificProjection" grows beyond 64 KB
> [info]   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2263)
> [info]   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
> [info]   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
> [info]   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
> [info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:285)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(CodeGenerationSuite.scala:50)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(CodeGenerationSuite.scala:48)
> [info]   at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
> [info]   at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
> [info]   at scala.collection.immutable.Range.foreach(Range.scala:141)
> [info]   at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
> [info]   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:105)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply$mcV$sp(CodeGenerationSuite.scala:47)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply(CodeGenerationSuite.scala:47)
> [info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply(CodeGenerationSuite.scala:47)
> [info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
> [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
> [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
> [info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
> [info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
> [info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
> [info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
> [info]   at org.scalate