[ https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337023#comment-17337023 ]
Hyukjin Kwon commented on SPARK-26224:
--------------------------------------

There's a discussion going on in the dev mailing list about exposing such an API. It would be great if you could find some time to give some input. cc [~yikunkero] FYI

> Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26224
>                 URL: https://issues.apache.org/jira/browse/SPARK-26224
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: On a MacBook, using the IntelliJ editor. Ran the sample code below as a unit test.
>            Reporter: Dorjee Tsering
>            Assignee: Marco Gaido
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> Reproduction steps:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a base dataframe with 1 column.
>
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.functions.lit
> import org.apache.spark.sql.types.{DataTypes, StructField}
> import spark.implicits._ // assumes a SparkSession named `spark` is in scope
>
> val newColumnsToBeAdded: Seq[StructField] = for (i <- 1 to 3000) yield new StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq(1).toDF("employee_id")
> // Each withColumn call adds one column on top of the previous DataFrame.
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => df.withColumn(newColumn.name, lit(0)))
> result.show(false)
> {code}
> This ends with the following stack trace:
> java.lang.StackOverflowError
>     at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>     at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>     at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>     at scala.collection.immutable.List.map(List.scala:296)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
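In the reproduction above, each `withColumn` call likely wraps the previous logical plan in one more projection, so 3000 calls produce a plan roughly 3000 levels deep, and recursive traversals such as the `TreeNode.transformDown` frames in the trace can exhaust the JVM stack. A commonly used workaround is to build all the new column expressions first and add them in a single `select`, which yields one projection instead of 3000 nested ones. The sketch below assumes a local `SparkSession` with Spark on the classpath; the app name is hypothetical, and `lit(0L)` is used so the new columns are `LongType`, matching the `StructField` types in the report. It is a sketch of the workaround, not the fix that shipped in 3.0.0.

```scala
// Sketch: add many columns with one select instead of chained withColumn calls.
// Assumes Spark SQL is on the classpath and a local master is acceptable.
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

val spark: SparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("add-many-columns") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val baseDataFrame: DataFrame = Seq(1).toDF("employee_id")

// Build all 3000 column expressions up front; lit(0L) gives LongType columns.
val newColumns: Seq[Column] = (1 to 3000).map(i => lit(0L).as(s"field_$i"))

// A single select keeps the existing columns (col("*")) and appends the new ones,
// producing one projection over the base plan rather than 3000 nested ones.
val result: DataFrame = baseDataFrame.select(col("*") +: newColumns: _*)

// Capture the column names before stopping the session.
val resultColumns: Array[String] = result.columns
println(resultColumns.length) // 3001

spark.stop()
```

The same list-building approach works for any number of derived columns; only the final `select` touches the plan.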