[ https://issues.apache.org/jira/browse/SPARK-22284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205608#comment-16205608 ]
Ben commented on SPARK-22284:
-----------------------------

I did try adding
{code:java}
--conf "spark.sql.codegen.wholeStage=false"
{code}
in *spark-submit*, but the same error occurred.

> Code of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22284
>                 URL: https://issues.apache.org/jira/browse/SPARK-22284
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, PySpark, SQL
>    Affects Versions: 2.1.0
>            Reporter: Ben
>
> I am using PySpark 2.1.0 in a production environment and am trying to join two DataFrames, one of which is very large and has complex nested structures.
> Basically, I load both DataFrames and cache them.
> Then, in the large DataFrame, I extract 3 nested values and save them as direct columns.
> Finally, I join with the smaller DataFrame on these three columns.
> In short, the code looks like this:
> {code}
> dataFrame.read......cache()
> dataFrameSmall.read.......cache()
> dataFrame = dataFrame.selectExpr(['*', 'nested.Value1 AS Value1', 'nested.Value2 AS Value2', 'nested.Value3 AS Value3'])
> dataFrame = dataFrame.dropDuplicates().join(dataFrameSmall, ['Value1', 'Value2', 'Value3'])
> dataFrame.count()
> {code}
> And this is the error I get when it reaches the count():
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 11 in stage 7.0 failed 4 times, most recent failure: Lost task 11.3 in stage 7.0 (TID 11234, somehost.com, executor 10): java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "apply_1$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
> {code}
> I have seen many tickets with similar issues here, but no proper solution. Most of the fixes target versions up to Spark 2.1.0, so I don't know whether running on Spark 2.2.0 would fix it. In any case, I cannot change the Spark version since it is in production.
> I have also tried setting
> {code:java}
> spark.sql.codegen.wholeStage=false
> {code}
> but I still get the same error.
> The job worked well up to now, also with large datasets, but apparently this batch got larger, and that is the only thing that changed. Is there any workaround for this?
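
One thing worth checking is where and when the flag is set: whole-stage codegen is decided at planning time, so the setting only affects queries planned after it takes effect. A minimal sketch of both ways to set it, assuming a PySpark 2.1 session (the app name is made up):

{code:python}
from pyspark.sql import SparkSession

# Set the flag at session construction so every query plan sees it.
spark = (SparkSession.builder
         .appName("spark-22284-workaround")  # hypothetical app name
         .config("spark.sql.codegen.wholeStage", "false")
         .getOrCreate())

# The flag can also be toggled on a live session; this only influences
# queries planned afterwards, not plans that were already compiled.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
print(spark.conf.get("spark.sql.codegen.wholeStage"))  # should print: false
{code}

A caveat that may explain why the flag did not help here: in Spark 2.1, even with whole-stage codegen disabled, individual expressions are still code-generated, and the failing class above (SpecificUnsafeProjection) comes from that separate expression-codegen path, not from whole-stage codegen.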
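
As for a possible workaround: the method that overflows 64 KB is the generated projection over the full wide row, so shrinking what gets carried through dropDuplicates()/join may keep the generated code under the limit. A hedged sketch, not a confirmed fix; the read paths are hypothetical, and only nested.Value1..Value3 and the join keys mirror the ticket's pseudocode:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dataFrame = spark.read.parquet("/path/to/large")       # hypothetical path
dataFrameSmall = spark.read.parquet("/path/to/small")  # hypothetical path

# Carry only the three join keys instead of '*', so the projection
# generated for dropDuplicates/join stays small.
keys = dataFrame.selectExpr(
    "nested.Value1 AS Value1",
    "nested.Value2 AS Value2",
    "nested.Value3 AS Value3",
)

matched = keys.dropDuplicates().join(dataFrameSmall, ["Value1", "Value2", "Value3"])
matched.count()
{code}

Note the semantic difference: dropDuplicates() over only the keys deduplicates by key rather than by full row, so the count can differ from the original query unless duplicates only occur per key; if the other columns are needed downstream, the matched keys can be joined back to the large DataFrame afterwards.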