[jira] [Created] (SPARK-22284) Code of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB

Ben (JIRA) Mon, 16 Oct 2017 01:41:24 -0700

Ben created SPARK-22284:
---------------------------

             Summary: Code of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
                 Key: SPARK-22284
                 URL: https://issues.apache.org/jira/browse/SPARK-22284
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, PySpark
    Affects Versions: 2.1.0
            Reporter: Ben



I am using PySpark 2.1.0 in a production environment and trying to join two 
DataFrames, one of which is very large and has complex nested structures.

I load both DataFrames and cache them.
Then, in the large DataFrame, I extract three nested values and save them as 
top-level columns.
Finally, I join the large DataFrame with the smaller one on these three columns.
A shortened version of the code:

{code:python}
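# Load and cache both DataFrames (read options elided)
dataFrame.read......cache()
dataFrameSmall.read.......cache()
# Promote three nested fields to top-level columns
dataFrame = dataFrame.selectExpr(['*', 'nested.Value1 AS Value1', 'nested.Value2 AS Value2', 'nested.Value3 AS Value3'])
# De-duplicate, then join with the small DataFrame on the three columns
dataFrame = dataFrame.dropDuplicates().join(dataFrameSmall, ['Value1', 'Value2', 'Value3'])
dataFrame.count()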
{code}

And this is the error I get when it gets to the count():

{code:python}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 11 in stage 7.0 failed 4 times, most recent failure: Lost task 11.3 in stage 7.0 (TID 11234, somehost.com, executor 10): java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "apply_1$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
{code}

I have seen many tickets with similar issues here, but no proper solution. Most 
of the fixes target versions up to Spark 2.1.0, so I don't know whether running 
on Spark 2.2.0 would fix it. In any case, I cannot change the Spark version, 
since it is in production.
I have also tried setting spark.sql.codegen.wholeStage=false, but I still get 
the same error.
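
For reference, a minimal sketch of how that setting can be applied at runtime from PySpark. The helper name `disable_whole_stage_codegen` is hypothetical, and `spark` is assumed to be an object exposing the `conf.set` / `conf.get` runtime configuration API of `pyspark.sql.SparkSession`; this is not necessarily how the setting was applied in the job above.

```python
def disable_whole_stage_codegen(spark):
    """Turn off whole-stage code generation for a session.

    `spark` is assumed to behave like a pyspark.sql.SparkSession,
    i.e. to expose spark.conf.set(key, value) and spark.conf.get(key).
    Returns the value read back from the session configuration.
    """
    key = "spark.sql.codegen.wholeStage"
    # Runtime SQL options are passed as strings.
    spark.conf.set(key, "false")
    # Read the value back so the caller can confirm it took effect.
    return spark.conf.get(key)
```

With a real session this would be called as `disable_whole_stage_codegen(spark)` before running the join.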

The job has worked well until now, even with large datasets, but apparently this 
batch grew larger than it can handle. Is there any workaround for this?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

