Hey, I'm seeing extreme slowness in withColumn when it's used in a loop. I'm running this code:

    for (int i = 0; i < NUM_ITERATIONS; ++i) {
        // Adds one integer literal column per iteration: col0, col1, ...
        df = df.withColumn("col" + i, new Column(new Literal(i, DataTypes.IntegerType)));
    }

where df is initially a trivial DataFrame. Here are the results of running with different values of NUM_ITERATIONS:

    iterations   time
            25     3s
            50    11s
            75    31s
           100    76s
           125   159s
           150   283s

When I instead update the DataFrame by manually copying/appending to the column array and calling DataFrame.select, it runs in about half the time, but this is still untenable at any significant number of iterations. Any insight?

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/DataFrame-withColumn-very-slow-when-used-iteratively-tp12562.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
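To put a rough number on how fast the time grows, here is a small sketch that estimates the local exponent k in time ≈ c·n^k between consecutive rows of the timing table, using k = ln(t2/t1) / ln(n2/n1). It is plain Java with no Spark dependency; the class TimingGrowth and the method localExponents are made-up names for illustration, and the inputs are just the six measurements reported above.

```java
// Hypothetical helper (not part of Spark): estimates the local growth exponent k
// in time ~ c * n^k between consecutive (iterations, seconds) measurements.
public class TimingGrowth {

    // Returns one exponent per adjacent pair: k = ln(t[j+1] / t[j]) / ln(n[j+1] / n[j]).
    static double[] localExponents(int[] n, double[] t) {
        if (n.length != t.length || n.length < 2) {
            throw new IllegalArgumentException("need at least two matching measurements");
        }
        double[] k = new double[n.length - 1];
        for (int j = 0; j + 1 < n.length; ++j) {
            k[j] = Math.log(t[j + 1] / t[j]) / Math.log((double) n[j + 1] / n[j]);
        }
        return k;
    }

    public static void main(String[] args) {
        // Timings copied from the table in the message (iterations, seconds).
        int[] iterations = {25, 50, 75, 100, 125, 150};
        double[] seconds = {3, 11, 31, 76, 159, 283};
        double[] k = localExponents(iterations, seconds);
        for (int j = 0; j < k.length; ++j) {
            System.out.printf("%d -> %d: k = %.2f%n", iterations[j], iterations[j + 1], k[j]);
        }
    }
}
```

On these numbers the exponent climbs from about 1.9 (25 to 50 iterations) to a little over 3 for the larger runs, so the total time grows closer to cubically than linearly. If each withColumn call cost a constant amount, k would stay near 1; a value near 3 suggests the per-call cost itself keeps growing with the number of columns already added. This is only a curve fit on six points, not a profile of where the time is spent.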