[ https://issues.apache.org/jira/browse/SPARK-18532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
holdenk updated SPARK-18532:
----------------------------
    Component/s:     (was: Spark Core)
                     SQL

> Code generation memory issue
> ----------------------------
>
>                 Key: SPARK-18532
>                 URL: https://issues.apache.org/jira/browse/SPARK-18532
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>         Environment: osx / macbook pro / local spark
>            Reporter: Georg Heiler
>
> Trying to create a Spark data frame with multiple additional columns based on conditions, like this:
>
>     df
>       .withColumn("name1", someCondition1)
>       .withColumn("name2", someCondition2)
>       .withColumn("name3", someCondition3)
>       .withColumn("name4", someCondition4)
>       .withColumn("name5", someCondition5)
>       .withColumn("name6", someCondition6)
>       .withColumn("name7", someCondition7)
>
> I am faced with the following exception when more than 6 .withColumn clauses are added:
>
>     org.codehaus.janino.JaninoRuntimeException: Code of method "()V" of class
>     "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator"
>     grows beyond 64 KB
>
> If even more columns are created, e.g. around 20, I no longer receive the aforementioned
> exception, but instead get the following error after about 5 minutes of waiting:
>
>     java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> What I want to perform is spelling/error correction. Some simple cases could be handled
> easily via a map and replacement in a UDF. Still, several other cases with multiple
> chained conditions remain.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
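A commonly suggested workaround for this class of failure (not from the report itself) is to add the columns in batches and break the query lineage between batches, so that no single whole-stage-codegen method accumulates the generated code for all of the conditions. Below is a minimal Python (PySpark-style) sketch; `add_columns_in_batches`, its `batch_size` parameter, and the use of `DataFrame.checkpoint()` (available in Spark 2.1+, and requiring a checkpoint directory) are illustrative assumptions, not part of the reported code:

```python
def make_batches(items, batch_size=5):
    """Split a list of (name, expression) pairs into fixed-size batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def add_columns_in_batches(df, column_specs, batch_size=5):
    """Hypothetical helper: apply .withColumn in small batches, breaking
    lineage between batches so the code generated for any one stage stays
    under the JVM's 64 KB bytecode-per-method limit."""
    for batch in make_batches(column_specs, batch_size):
        for name, expr in batch:
            df = df.withColumn(name, expr)  # Spark DataFrame API
        # Break lineage; assumes spark.sparkContext.setCheckpointDir(...) was called.
        df = df.checkpoint()
    return df
```

The batch size of 5 is arbitrary; a suitable value depends on how much generated code each condition contributes.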