Georg Heiler created SPARK-18532:
------------------------------------

             Summary: Code generation memory issue
                 Key: SPARK-18532
                 URL: https://issues.apache.org/jira/browse/SPARK-18532
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.2
         Environment: osx / macbook pro / local spark
            Reporter: Georg Heiler


I am trying to create a Spark data frame with multiple additional columns based on conditions, like this:

// someConditionN are placeholder Column expressions (e.g. when/otherwise chains)
val result = df
    .withColumn("name1", someCondition1)
    .withColumn("name2", someCondition2)
    .withColumn("name3", someCondition3)
    .withColumn("name4", someCondition4)
    .withColumn("name5", someCondition5)
    .withColumn("name6", someCondition6)
    .withColumn("name7", someCondition7)
I am faced with the following exception when more than 6 .withColumn clauses are added:

org.codehaus.janino.JaninoRuntimeException: Code of method "()V" of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator" 
grows beyond 64 KB

If even more columns are created, e.g. around 20, I no longer receive the aforementioned exception, but instead get the following error after about 5 minutes of waiting:

java.lang.OutOfMemoryError: GC overhead limit exceeded
What I want to perform is spelling/error correction. Some simple cases can be handled easily via a map & replacement in a UDF. Still, several other cases with multiple chained conditions remain.
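For the simple cases, the kind of map-based replacement UDF I have in mind looks roughly like this (the correction map and column names are assumed for illustration):

import org.apache.spark.sql.functions.{col, udf}

// Illustrative correction map; the real one would hold the known misspellings.
val corrections = Map("recieve" -> "receive", "teh" -> "the")

val fixSpelling = udf { (word: String) =>
  if (word == null) null else corrections.getOrElse(word, word)
}

// Covers the one-to-one replacements in a single pass, but not the cases
// that still need several chained conditions across columns.
val cleaned = df.withColumn("word_fixed", fixSpelling(col("word")))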



