[ https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717313#comment-16717313 ]
ASF GitHub Bot commented on SPARK-26224:
----------------------------------------

mgaido91 commented on issue #23285: [SPARK-26224][SQL] Avoid creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446230611

retest this please


> Results in StackOverflowError when trying to add 3000 new columns using the withColumn function of a DataFrame.
> ---------------------------------------------------------------------------------------------------------
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: On a MacBook, using the IntelliJ editor. Ran the sample code below as a unit test.
> Reporter: Dorjee Tsering
> Priority: Minor
>
> Reproduction steps: run this sample code on your laptop. I am trying to add 3000 new columns to a base DataFrame with one column.
>
> {code:scala}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.functions.lit
> import org.apache.spark.sql.types.{DataTypes, StructField}
> import spark.implicits._
>
> val newColumnsToBeAdded: Seq[StructField] =
>   for (i <- 1 to 3000) yield StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq(1).toDF("employee_id")
> // Adding the columns one at a time nests one new Project per withColumn call.
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) =>
>   df.withColumn(newColumn.name, lit(0)))
> result.show(false)
> {code}
>
> Ends up with the following stack trace:
>
> java.lang.StackOverflowError
> at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
> at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
> at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> at scala.collection.immutable.List.map(List.scala:296)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
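For anyone hitting this before the fix in PR #23285 is available, a possible workaround is to add all of the new columns in a single select instead of folding over withColumn, so the plan contains one Project rather than thousands of nested ones. This is a minimal sketch of my own (not taken from this thread), reusing the reporter's baseDataFrame:

{code:scala}
import org.apache.spark.sql.functions.{col, lit}

// Workaround sketch (assumption, not from this thread): build every new column
// as a Column expression and add them all in one select, keeping the plan flat.
val newColumns = (1 to 3000).map(i => lit(0).as("field_" + i))
val result = baseDataFrame.select((col("*") +: newColumns): _*)
result.show(false)
{code}

A single projection keeps the logical plan shallow, which should sidestep the deep TreeNode recursion shown in the stack trace above.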