Re: [DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-05-04 Thread Паша
I've created my own implicit withColumnsRenamed for such a purpose, which just accepted a map of string→string and called rename multiple times. Tue, 4 May 2021 at 10:22, Yikun Jiang: > @Saurabh @Mr.Powers Thanks for the input. > > I personally prefer to introduce `withColumns`
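A minimal sketch of such an implicit helper, assuming a Map of old name to new name; the class and method names here are illustrative, not the original code:

    import org.apache.spark.sql.DataFrame

    object RenameImplicits {
      // Enrich DataFrame with a withColumnsRenamed that takes a Map of old -> new names
      // and applies the built-in withColumnRenamed once per entry.
      implicit class RichDataFrame(val df: DataFrame) extends AnyVal {
        def withColumnsRenamed(renames: Map[String, String]): DataFrame =
          renames.foldLeft(df) { case (acc, (from, to)) => acc.withColumnRenamed(from, to) }
      }
    }

    // Usage: import RenameImplicits._
    //        df.withColumnsRenamed(Map("old1" -> "new1", "old2" -> "new2"))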

Re: [DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-05-04 Thread Yikun Jiang
@Saurabh @Mr.Powers Thanks for the input. I personally prefer to introduce `withColumns` because it brings a more friendly development experience than select(*). This is the PR to add `withColumns`: https://github.com/apache/spark/pull/32431 Regards, Yikun Saurabh Chawla

Re: [DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-04-30 Thread Matthew Powers
Thanks for starting this good discussion. You can add multiple columns with select to avoid calling withColumn multiple times: val newCols = Seq(col("*"), lit("val1").as("key1"), lit("val2").as("key2")) df.select(newCols: _*).show() withColumns would be a nice interface for less technical Spark
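A self-contained version of the snippet above, assuming a local SparkSession and toy data, runnable in spark-shell:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, lit}

    val spark = SparkSession.builder().master("local[*]").appName("select-demo").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // A single select appends both literal columns instead of chaining withColumn twice.
    val newCols = Seq(col("*"), lit("val1").as("key1"), lit("val2").as("key2"))
    df.select(newCols: _*).show()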

Re: [DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-04-29 Thread Saurabh Chawla
Hi All, I also had a scenario where, at runtime, I needed to loop over a dataframe and call withColumn many times. To be on the safe side, I used reflection to access withColumns and avoid a java.lang.StackOverflowError. val dataSetClass = Class.forName("org.apache.spark.sql.Dataset")
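A hedged sketch of that reflection approach, assuming the private Dataset.withColumns(colNames: Seq[String], cols: Seq[Column]) overload present in Spark 3.x; the method name and parameter types would need to be checked against the exact Spark version in use:

    import org.apache.spark.sql.{Column, DataFrame}

    def withColumnsViaReflection(df: DataFrame, names: Seq[String], cols: Seq[Column]): DataFrame = {
      val dataSetClass = Class.forName("org.apache.spark.sql.Dataset")
      // Look up the two-argument withColumns(Seq[String], Seq[Column]) overload and bypass
      // its private[spark] visibility.
      val method = dataSetClass.getDeclaredMethod("withColumns", classOf[Seq[_]], classOf[Seq[_]])
      method.setAccessible(true)
      method.invoke(df, names, cols).asInstanceOf[DataFrame]
    }

    // Illustrative usage:
    //   withColumnsViaReflection(df, Seq("key1", "key2"), Seq(lit("v1"), lit("v2")))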

[DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-04-22 Thread Yikun Jiang
Hi all, *Background:* Currently, there is a withColumns [1] method to help users/devs add/replace multiple columns at once. But this method is private
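For context, the pattern this proposal aims to improve is chaining withColumn once per added column; a minimal, illustrative example with toy data (not taken from the original mail):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._
    val df = Seq((1, "a")).toDF("id", "name")

    // Each withColumn call creates a new projection on top of the previous plan; with many
    // columns this gets verbose and can produce very deep plans, as noted elsewhere in the thread.
    val result = df
      .withColumn("key1", lit("val1"))
      .withColumn("key2", lit("val2"))
      .withColumn("key3", lit("val3"))
    result.show()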