Georg Heiler created SPARK-18532:
------------------------------------

             Summary: Code generation memory issue
                 Key: SPARK-18532
                 URL: https://issues.apache.org/jira/browse/SPARK-18532
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.2
         Environment: osx / macbook pro / local spark
            Reporter: Georg Heiler
Trying to create a Spark data frame with multiple additional columns based on conditions, like this:

    df
      .withColumn("name1", someCondition1)
      .withColumn("name2", someCondition2)
      .withColumn("name3", someCondition3)
      .withColumn("name4", someCondition4)
      .withColumn("name5", someCondition5)
      .withColumn("name6", someCondition6)
      .withColumn("name7", someCondition7)

I am faced with the following exception in case more than 6 .withColumn clauses are added:

    org.codehaus.janino.JaninoRuntimeException: Code of method "()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator" grows beyond 64 KB

If even more columns are created, e.g. around 20, I no longer receive the aforementioned exception, but instead get the following error after about 5 minutes of waiting:

    java.lang.OutOfMemoryError: GC overhead limit exceeded

What I want to perform is a spelling/error correction. Some simple cases can be handled easily via a map & replacement in a UDF. Still, several other cases with multiple chained conditions remain.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
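A possible workaround sketch (not a fix for the underlying codegen limit): each chained .withColumn adds another projection that whole-stage code generation inlines into one giant method, so adding all derived columns in a single select keeps the generated code in one smaller projection. The column name "word" and the when/otherwise expressions below are placeholders standing in for someCondition1 .. someCondition7, which are not shown in the report:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, when}

// Placeholder conditions; in the real job these would be the actual
// spelling-correction expressions (someCondition1 .. someCondition7).
// The input column "word" is assumed here for illustration only.
val conditions: Seq[(String, Column)] = (1 to 7).map { i =>
  s"name$i" -> when(col("word") === "teh", "the").otherwise(col("word"))
}

// One select produces a single projection instead of one per withColumn,
// which keeps the generated "()V" method smaller.
def withAllColumns(df: DataFrame): DataFrame =
  df.select(col("*") +: conditions.map { case (name, c) => c.as(name) }: _*)
```

For very wide jobs, another commonly suggested mitigation is to materialize intermediate results (e.g. via persist or checkpoint) between batches of column additions so that no single generated method has to cover all of them.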