[ https://issues.apache.org/jira/browse/SPARK-18532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685920#comment-15685920 ]
Georg Heiler commented on SPARK-18532:
--------------------------------------

Please find a minimal example here: https://gist.github.com/geoHeil/86e5401fc57351c70fd49047c88cea05

I believe most of the transformations are fairly simple. In fact, using a UDF prevents this problem for me (see the comment in the code sample). Most of the time "regular" Spark transformations are used, e.g. string matching like df.withColumn("city", when('city === "123", "Munich").otherwise('city)).

> Code generation memory issue
> ----------------------------
>
>                 Key: SPARK-18532
>                 URL: https://issues.apache.org/jira/browse/SPARK-18532
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: osx / macbook pro / local spark
>            Reporter: Georg Heiler
>
> Trying to create a Spark data frame with multiple additional columns based on conditions like this:
>
> df
>   .withColumn("name1", someCondition1)
>   .withColumn("name2", someCondition2)
>   .withColumn("name3", someCondition3)
>   .withColumn("name4", someCondition4)
>   .withColumn("name5", someCondition5)
>   .withColumn("name6", someCondition6)
>   .withColumn("name7", someCondition7)
>
> I am faced with the following exception when more than 6 .withColumn clauses are added:
>
> org.codehaus.janino.JaninoRuntimeException: Code of method "()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator" grows beyond 64 KB
>
> If even more columns are created, e.g. around 20, I no longer receive the aforementioned exception, but instead get the following error after 5 minutes of waiting:
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> What I want to perform is a spelling/error correction. Some simple cases could be handled easily via a map & replace in a UDF. Still, several other cases with multiple chained conditions remain.
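[Editor's note] The UDF workaround the commenter refers to can be sketched as follows. This is a minimal sketch, not the reporter's actual code: the `corrections` map and column name are illustrative, and `df` stands for the user's DataFrame with a string `city` column. The idea is that a single UDF call is an opaque expression to Catalyst, so the replacement logic is not expanded into the generated Java method that otherwise grows past the 64 KB limit.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().appName("udf-workaround").getOrCreate()
import spark.implicits._

// Hypothetical correction table; in the real use case this would hold
// the spelling/error corrections the reporter describes.
val corrections = Map("123" -> "Munich", "muenchen" -> "Munich")

// One UDF applies all replacements in ordinary JVM code. Unlike a long
// chain of when(...).otherwise(...) columns, its body is never inlined
// into the codegen'd iterator, keeping the generated method small.
val fixCity = udf((c: String) => corrections.getOrElse(c, c))

val df = Seq("123", "muenchen", "Berlin").toDF("city")
val cleaned = df.withColumn("city", fixCity($"city"))
```

The trade-off is that a UDF is a black box to the optimizer (no predicate pushdown through it), but it sidesteps the code-size blowup that many chained conditional columns produce.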
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org