[ https://issues.apache.org/jira/browse/SPARK-18532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684908#comment-15684908 ]

Herman van Hovell commented on SPARK-18532:
-------------------------------------------

The code generated by whole-stage code generation is hitting a JVM limit. Spark 
should fall back to a non-code-generated path in that case, which is what you 
see happening with 20 columns. It should not OOM, but that depends on the 
complexity of the UDF used and the number of conversions Spark has to do when 
invoking the UDF.
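
To confirm that code generation is the culprit, you can switch whole-stage 
codegen off and re-run the query on the interpreted path. A minimal 
spark-shell sketch (spark.sql.codegen.wholeStage is the standard switch in 
2.0.x, but verify the behaviour on your version):
{noformat}
// Disable whole-stage code generation so Spark uses the interpreted
// execution path instead of compiling the plan into one Java method.
spark.conf.set("spark.sql.codegen.wholeStage", "false")

// Re-run the failing query; if it now succeeds, the 64 KB bytecode
// limit on generated methods is the likely cause.
{noformat}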

Could you share a reproducible example? Otherwise, an overview of the query 
plan (df.explain()) and the generated code would also help.

You can get the generated code by executing the following lines in the shell:
{noformat}
import org.apache.spark.sql.execution.debug._
df.debugCodegen
{noformat}
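
The dump should contain one block of generated Java source per whole-stage 
subtree; the method that trips the 64 KB limit ought to be visible there as 
the longest generated method.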

> Code generation memory issue
> ----------------------------
>
>                 Key: SPARK-18532
>                 URL: https://issues.apache.org/jira/browse/SPARK-18532
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: osx / macbook pro / local spark
>            Reporter: Georg Heiler
>
> Trying to create a Spark data frame with multiple additional columns based on 
> conditions like this:
> df
>     .withColumn("name1", someCondition1)
>     .withColumn("name2", someCondition2)
>     .withColumn("name3", someCondition3)
>     .withColumn("name4", someCondition4)
>     .withColumn("name5", someCondition5)
>     .withColumn("name6", someCondition6)
>     .withColumn("name7", someCondition7)
> I am faced with the following exception when more than six .withColumn 
> clauses are added:
> org.codehaus.janino.JaninoRuntimeException: Code of method "()V" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator" 
> grows beyond 64 KB
> If even more columns are created, e.g. around 20, I no longer receive the 
> aforementioned exception, but instead get the following error after 5 minutes 
> of waiting:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> What I want to perform is a spelling/error correction. Some simple cases 
> could be handled easily via a map-and-replace in a UDF. Still, several other 
> cases with multiple chained conditions remain.
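
For the pattern described above, a workaround that is sometimes suggested for 
this class of failure is to add the derived columns in small batches and cut 
the logical plan between batches, so each code-generation unit stays well 
under the 64 KB method limit. A sketch against the Spark 2.0.x API 
(addInBatches is a hypothetical helper, and the RDD round-trip as a plan 
barrier is an assumption to validate, not a documented contract):
{noformat}
import org.apache.spark.sql.{Column, DataFrame, SparkSession}

// Add (name, expression) columns in batches, breaking the plan
// lineage between batches to keep each generated method small.
def addInBatches(spark: SparkSession, df: DataFrame,
                 cols: Seq[(String, Column)],
                 batchSize: Int = 5): DataFrame =
  cols.grouped(batchSize).foldLeft(df) { (acc, batch) =>
    val withCols = batch.foldLeft(acc) { case (d, (name, expr)) =>
      d.withColumn(name, expr)
    }
    // Round-trip through an RDD to cut the logical plan, forcing a
    // fresh (smaller) code-generation unit for the next batch.
    spark.createDataFrame(withCols.rdd, withCols.schema)
  }
{noformat}
The round-trip does cost a serialization pass per batch, so this trades some 
runtime for staying under the generated-code limit.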


