[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2021-05-04 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338809#comment-17338809
 ] 

Yikun Jiang commented on SPARK-26224:
-

FYI, as mentioned as Hyukjin, the mailing list [1] had some discussion to 
expose `withColumns` in Scala/PySpark.

[1] 
http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Multiple-columns-adding-replacing-support-in-PySpark-DataFrame-API-td31164.html

 

And I personal perfer to introduce the `withColumns` because it bring more 
friendly development experience rather than select(*). This is the PR to add 
`withColumns`: https://github.com/apache/spark/pull/32431

> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 3.0.0
>
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2021-04-29 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337023#comment-17337023
 ] 

Hyukjin Kwon commented on SPARK-26224:
--

There's a discussion going on in the dev mailing list to expose such a API. It 
would be great if you have some time to put some input, cc [~yikunkero] FYI

> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 3.0.0
>
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2021-04-29 Thread Saurabh Chawla (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335374#comment-17335374
 ] 

Saurabh Chawla commented on SPARK-26224:


[~srowen] / [~hyukjin.kwon] - Any plan for exposing the "withColumns", I have 
similar requirement  where I have to loop for withColumn. For safer side I used 
the reflection to access the withColumns. 
{code:java}
val dataSetClass = Class.forName("org.apache.spark.sql.Dataset")
val newConfigurationMethod =
  dataSetClass.getMethod("withColumns", classOf[Seq[String]], 
classOf[Seq[Column]])
newConfigurationMethod.invoke(
  baseDataFrame, columnName, columnValue).asInstanceOf[DataFrame]{code}
It will be great if we use the "withColumns" rather than using the reflection 
code here.

 

> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 3.0.0
>
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2019-03-02 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782496#comment-16782496
 ] 

Sean Owen commented on SPARK-26224:
---

The most realistic thing I can imagine is exposing `withColumns`.
But for this use case, there are pretty easy workarounds, like just mapping the 
DataFrame Rows to contain a bunch more 0s and specifying your new schema.

> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717621#comment-16717621
 ] 

ASF GitHub Bot commented on SPARK-26224:


SparkQA commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446290453
 
 
   **[Test build #99981 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99981/testReport)**
 for PR 23285 at commit 
[`a40db10`](https://github.com/apache/spark/commit/a40db106d4d26966af04afb79edb35a59efd1703).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717628#comment-16717628
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446290736
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99981/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717625#comment-16717625
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446290728
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717624#comment-16717624
 ] 

ASF GitHub Bot commented on SPARK-26224:


SparkQA removed a comment on issue #23285: [SPARK-26224][SQL] Avoid creating 
many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446230975
 
 
   **[Test build #99981 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99981/testReport)**
 for PR 23285 at commit 
[`a40db10`](https://github.com/apache/spark/commit/a40db106d4d26966af04afb79edb35a59efd1703).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717622#comment-16717622
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446290728
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717623#comment-16717623
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446290736
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99981/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717337#comment-16717337
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446233451
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5980/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717336#comment-16717336
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446233443
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717315#comment-16717315
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446230324
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99975/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717335#comment-16717335
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446233451
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5980/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717334#comment-16717334
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446233443
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717313#comment-16717313
 ] 

ASF GitHub Bot commented on SPARK-26224:


mgaido91 commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446230611
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717310#comment-16717310
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446230298
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717309#comment-16717309
 ] 

ASF GitHub Bot commented on SPARK-26224:


SparkQA removed a comment on issue #23285: [SPARK-26224][SQL] Avoid creating 
many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446200970
 
 
   **[Test build #99975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99975/testReport)**
 for PR 23285 at commit 
[`a40db10`](https://github.com/apache/spark/commit/a40db106d4d26966af04afb79edb35a59efd1703).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717308#comment-16717308
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446201096
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5974/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717307#comment-16717307
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446230324
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99975/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717306#comment-16717306
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446230298
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717304#comment-16717304
 ] 

ASF GitHub Bot commented on SPARK-26224:


SparkQA commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446230024
 
 
   **[Test build #99975 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99975/testReport)**
 for PR 23285 at commit 
[`a40db10`](https://github.com/apache/spark/commit/a40db106d4d26966af04afb79edb35a59efd1703).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717112#comment-16717112
 ] 

ASF GitHub Bot commented on SPARK-26224:


mgaido91 commented on a change in pull request #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#discussion_r240605608
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##
 @@ -2146,7 +2146,7 @@ class Dataset[T] private[sql](
* Returns a new Dataset by adding columns or replacing the existing columns 
that has
* the same names.
*/
-  private[spark] def withColumns(colNames: Seq[String], cols: Seq[Column]): 
DataFrame = {
+  private[spark] def withColumns(colNames: Seq[String], cols: Seq[Column]): 
DataFrame = withPlan {
 
 Review comment:
   yes, but I think this is a special case. I have seen many cases when 
`withColumn` is used in for loops: with this change such a pattern would be 
better supported.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717123#comment-16717123
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446201089
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717121#comment-16717121
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446201096
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5974/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717120#comment-16717120
 ] 

ASF GitHub Bot commented on SPARK-26224:


SparkQA commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446200970
 
 
   **[Test build #99975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99975/testReport)**
 for PR 23285 at commit 
[`a40db10`](https://github.com/apache/spark/commit/a40db106d4d26966af04afb79edb35a59efd1703).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717024#comment-16717024
 ] 

ASF GitHub Bot commented on SPARK-26224:


viirya commented on a change in pull request #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#discussion_r240592731
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##
 @@ -2146,7 +2146,7 @@ class Dataset[T] private[sql](
* Returns a new Dataset by adding columns or replacing the existing columns 
that has
* the same names.
*/
-  private[spark] def withColumns(colNames: Seq[String], cols: Seq[Column]): 
DataFrame = {
+  private[spark] def withColumns(colNames: Seq[String], cols: Seq[Column]): 
DataFrame = withPlan {
 
 Review comment:
   As stated on the JIRA ticket, the problem is deep query plan. I think we can 
have many ways to create such deep query plan, not only for `withColumns`. For 
example, you can call `select` many times to do that too. This change makes 
`withColumns` a special case.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716993#comment-16716993
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446181893
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99969/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716988#comment-16716988
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446181889
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716987#comment-16716987
 ] 

ASF GitHub Bot commented on SPARK-26224:


SparkQA removed a comment on issue #23285: [SPARK-26224][SQL] Avoid creating 
many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446171042
 
 
   **[Test build #99969 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99969/testReport)**
 for PR 23285 at commit 
[`da2c82e`](https://github.com/apache/spark/commit/da2c82ebef5fb2a7e9279d8ce3d9364d2da61ae4).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716984#comment-16716984
 ] 

ASF GitHub Bot commented on SPARK-26224:


SparkQA commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446181860
 
 
   **[Test build #99969 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99969/testReport)**
 for PR 23285 at commit 
[`da2c82e`](https://github.com/apache/spark/commit/da2c82ebef5fb2a7e9279d8ce3d9364d2da61ae4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716985#comment-16716985
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446181889
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716986#comment-16716986
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446181893
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99969/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716902#comment-16716902
 ] 

ASF GitHub Bot commented on SPARK-26224:


mgaido91 commented on a change in pull request #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#discussion_r240577269
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##
 @@ -2164,16 +2164,16 @@ class Dataset[T] private[sql](
   columnMap.find { case (colName, _) =>
 resolver(field.name, colName)
   } match {
-case Some((colName: String, col: Column)) => col.as(colName)
-case _ => Column(field)
+case Some((colName: String, col: Column)) => col.as(colName).named
+case _ => field
   }
 }
 
-val newColumns = columnMap.filter { case (colName, col) =>
+val newColumns = columnMap.filter { case (colName, _) =>
   !output.exists(f => resolver(f.name, colName))
-}.map { case (colName, col) => col.as(colName) }
+}.map { case (colName, col) => col.as(colName).named }
 
-select(replacedAndExistingColumns ++ newColumns : _*)
+CollapseProject(Project(replacedAndExistingColumns ++ newColumns, 
logicalPlan))
 
 Review comment:
   I don't think we can do that. Imagine the case when all the columns depend 
on the previously added one: if we would do that, we would end up with an 
invalid plan. Or am I missing something?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> 

[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716892#comment-16716892
 ] 

ASF GitHub Bot commented on SPARK-26224:


cloud-fan commented on a change in pull request #23285: [SPARK-26224][SQL] 
Avoid creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#discussion_r240575095
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##
 @@ -2164,16 +2164,16 @@ class Dataset[T] private[sql](
   columnMap.find { case (colName, _) =>
 resolver(field.name, colName)
   } match {
-case Some((colName: String, col: Column)) => col.as(colName)
-case _ => Column(field)
+case Some((colName: String, col: Column)) => col.as(colName).named
+case _ => field
   }
 }
 
-val newColumns = columnMap.filter { case (colName, col) =>
+val newColumns = columnMap.filter { case (colName, _) =>
   !output.exists(f => resolver(f.name, colName))
-}.map { case (colName, col) => col.as(colName) }
+}.map { case (colName, col) => col.as(colName).named }
 
-select(replacedAndExistingColumns ++ newColumns : _*)
+CollapseProject(Project(replacedAndExistingColumns ++ newColumns, 
logicalPlan))
 
 Review comment:
   Can we reduce the scope of this optimization? e.g. if the root node of this 
query is `Project`, update its project list to include `withColumns`, otherwise 
add a new Project.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> 

[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716875#comment-16716875
 ] 

ASF GitHub Bot commented on SPARK-26224:


mgaido91 commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446169638
 
 
   cc @cloud-fan @viirya 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716882#comment-16716882
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446170969
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716883#comment-16716883
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins removed a comment on issue #23285: [SPARK-26224][SQL] Avoid 
creating many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446170979
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5969/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716878#comment-16716878
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446170969
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716880#comment-16716880
 ] 

ASF GitHub Bot commented on SPARK-26224:


SparkQA commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446171042
 
 
   **[Test build #99969 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99969/testReport)**
 for PR 23285 at commit 
[`da2c82e`](https://github.com/apache/spark/commit/da2c82ebef5fb2a7e9279d8ce3d9364d2da61ae4).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716879#comment-16716879
 ] 

ASF GitHub Bot commented on SPARK-26224:


AmplabJenkins commented on issue #23285: [SPARK-26224][SQL] Avoid creating many 
project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285#issuecomment-446170979
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5969/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716873#comment-16716873
 ] 

ASF GitHub Bot commented on SPARK-26224:


mgaido91 opened a new pull request #23285: [SPARK-26224][SQL] Avoid creating 
many project on subsequent calls to withColumn
URL: https://github.com/apache/spark/pull/23285
 
 
   ## What changes were proposed in this pull request?
   
   We have seen many cases when users make several subsequent calls to 
`withColumn` on a Dataset. This leads now to the generation of a lott of 
`Project` nodes on the top of the plan, with serious problem which can lead 
also to `StackOverflowException`s.
   
   The PR proposes to run `CollapseProject` on the result of `withColumn` so 
that subsequent call (if possible) generate anyway a plan with a single project 
node on the top.
   
   ## How was this patch tested?
   
   added UT


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-10 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715905#comment-16715905
 ] 

Wenchen Fan commented on SPARK-26224:
-

except `withColumn`, do we have other patterns that create a deep query plan. 
We did optimize union in `df.union`

> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-10 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714991#comment-16714991
 ] 

Marco Gaido commented on SPARK-26224:
-

[~viirya] that is true, but here it comes a question: do we want our DataFrame 
API to be an imperative or a declarative one? The user here is saying: I want a 
new column. It is Spark which is adding a new projection. One thing we can do, 
for instance, is to add a {{CollapseProject}} at the end of {{withColumn}}: in 
this way, the problem would be solved.

You can argue that we are running an optimization in a very early phase in that 
case, which is not a great thing, but I noticed also that this patter is often 
used (there have been JIRA for the inefficiency of this approach in the past 
IIRC, because it caused the whole plan to be re-analyzed every time). cc 
[~cloud_fan] [~smilegator] for their opinion on this.

> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-07 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713494#comment-16713494
 ] 

Liang-Chi Hsieh commented on SPARK-26224:
-

I think it is not specified to withColumn. withColumn simply adds a projection 
on original dataframe.

I think It is because you create a very deep query plan. So the analyzer or 
optimizer encounters when traversing down the query plan.

Even it can traverse down such deep query plan, it might be not efficient to do 
that. I'd recommend not to create such deep query plan.

This should not be a bug.

> Results in stackOverFlowError when trying to add 3000 new columns using 
> withColumn function of dataframe.
> -
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample 
> code as unit test.
>Reporter: Dorjee Tsering
>Priority: Minor
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a 
> base dataframe with 1 column.
>  
>  
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new 
> StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => 
> df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>  
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
>  at 
> scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
>  at 
> scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>  at scala.collection.immutable.List.map(List.scala:296)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org