[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762805#comment-15762805 ]
Barry Becker edited comment on SPARK-16845 at 12/20/16 9:24 PM:
----------------------------------------------------------------

I found a workaround that allows me to avoid the 64 KB error, but it still runs much slower than I expected. I switched to a single batch select statement instead of calls to withColumn in a loop. Here is an example of what I did.

Old way:
{code}
stringCols.foreach(column => {
  val qCol = col(column)
  datasetDf = datasetDf
    .withColumn(column + CLEAN_SUFFIX, when(qCol.isNull, lit(MISSING)).otherwise(qCol))
})
{code}

New way:
{code}
val replaceStringNull = udf((s: String) => if (s == null) MISSING else s)
val newCols = datasetDf.columns.map(column =>
  if (stringCols.contains(column)) replaceStringNull(col(column)).as(column + CLEAN_SUFFIX)
  else col(column))
datasetDf = datasetDf.select(newCols: _*)
{code}

This workaround only works on Spark 2.0.2. I still get the 64 KB limit error when running the same thing with 1.6.3.

> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16845
>                 URL: https://issues.apache.org/jira/browse/SPARK-16845
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: hejie
>         Attachments: error.txt.zip
>
> I have a wide table (400 columns); when I try fitting the train data on all columns, the fatal error occurs.
> ... 46 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
> 	at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
> 	at org.codehaus.janino.CodeContext.write(CodeContext.java:854)
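As a side note on the workaround above: the null replacement itself can also be written without a UDF, using the built-in DataFrameNaFunctions. This is only a minimal sketch reusing the names from the snippets above (datasetDf, stringCols, MISSING, CLEAN_SUFFIX); whether it sidesteps the 64 KB codegen limit on 1.6.x is untested.

{code}
// Minimal sketch (assumption, not from the original comment):
// rename the string columns in a single select, then replace nulls
// with the built-in na.fill instead of a UDF.
import org.apache.spark.sql.functions.col

val renamed = datasetDf.select(datasetDf.columns.map { column =>
  if (stringCols.contains(column)) col(column).as(column + CLEAN_SUFFIX)
  else col(column)
}: _*)

// na.fill(value, cols) replaces nulls only in the listed string columns.
val cleaned = renamed.na.fill(MISSING, stringCols.map(_ + CLEAN_SUFFIX))
{code}

Since na.fill is a plain expression rewrite rather than a black-box UDF, it also leaves Catalyst free to optimize the whole projection.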