[ https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Tsukanov updated SPARK-25987:
----------------------------------
    Description: 
When I execute
{code:java}
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumes an existing SparkSession named sparkSession.
val columnsCount = 100
val columns = (1 to columnsCount).map(i => s"col$i")
val initialData = (1 to columnsCount).map(i => s"val$i")

// A single-row DataFrame with 100 string columns.
val df = sparkSession.createDataFrame(
  rowRDD = sparkSession.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
  schema = StructType(columns.map(StructField(_, StringType, true)))
)

val addSuffixUDF = udf(
  (str: String) => str + "_added"
)

implicit class DFOps(df: DataFrame) {
  // Applies the UDF to every column, keeping the column names.
  def addSuffix() = {
    df.select(columns.map(col =>
      addSuffixUDF(df(col)).as(col)
    ): _*)
  }
}

df
  .addSuffix()
  .addSuffix()
  .addSuffix()
  .show()
{code}
I get
{code:java}
An exception or error caused a run to abort.
java.lang.StackOverflowError
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
 ...
{code}
If I reduce the number of columns (to 10, for example) or call `addSuffix` only once, it works fine.
> StackOverflowError when executing many operations on a table with many columns
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-25987
>                 URL: https://issues.apache.org/jira/browse/SPARK-25987
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2
>         Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>            Reporter: Ivan Tsukanov
>            Priority: Major

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)