[ 
https://issues.apache.org/jira/browse/SPARK-36858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421935#comment-17421935
 ] 

Hyukjin Kwon commented on SPARK-36858:
--------------------------------------

Can't we simply do this in a for loop?
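
A minimal sketch of what that could look like, assuming a hypothetical `cleanFunction` that strips non-alphanumeric characters and a hypothetical helper named `cleanColumns` (neither is an existing Spark API). The loop builds one `Column` expression per column and applies them all in a single `select`, so only one new DataFrame is created regardless of how many columns are cleaned. It assumes column names contain no dots or other characters that `col` would parse specially.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, regexp_replace}

// Hypothetical cleaning function: removes every character that is not
// an ASCII letter or digit from the named column.
def cleanFunction(colName: String): Column =
  regexp_replace(col(colName), "[^a-zA-Z0-9]", "")

// Hypothetical helper: cleans the listed columns in one projection and
// leaves all other columns untouched, preserving column order.
def cleanColumns(df: DataFrame, colsToClean: Seq[String]): DataFrame = {
  val toClean = colsToClean.toSet
  val projected = for (name <- df.columns) yield {
    if (toClean.contains(name)) cleanFunction(name).as(name) else col(name)
  }
  df.select(projected: _*)
}
```

Usage would then be something like `val dfCleaned = cleanColumns(dfToClean, colListToClean)`.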

> Spark API to apply same function to multiple columns
> ----------------------------------------------------
>
>                 Key: SPARK-36858
>                 URL: https://issues.apache.org/jira/browse/SPARK-36858
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.8, 3.1.2
>            Reporter: Armand BERGES
>            Priority: Minor
>
> Hi
> My team and I regularly need to apply the same function to multiple 
> columns at once.
> For example, we want to remove all non-alphanumeric characters from each 
> column of our dataframes. 
> When we first hit this use case, some people on my team were using this kind 
> of code: 
> {code:java}
> val colListToClean = .... // Generate some list, could be very long.
> val dfToClean: DataFrame = ... // This is the dataframe we want to clean
> // Write some function to manipulate a column based on its name.
> def cleanFunction(colName: String): Column = ...
> val dfCleaned = colListToClean.foldLeft(dfToClean)((df, colName) => 
>   df.withColumn(colName, cleanFunction(colName))){code}
> When applied to a large set of columns, this kind of code overloaded our 
> driver (because a new DataFrame is generated for each column to clean).
> To work around this issue, we wrote some code that adds two functions: 
>  * One to apply the same function to multiple columns
>  * One to rename multiple columns based on a Map. 
>  
> I wonder whether anyone has ever asked your team to add this kind of API. 
> If so, did you run into any issues with the implementation? If not, is this 
> an idea you could add to Spark? 
> Best regards, 
>  
> LvffY
>  
>  
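
For the second helper mentioned in the description (renaming multiple columns based on a Map), a rough sketch under the same single-projection idea might look like the following. The name `renameColumns` is hypothetical and is not taken from the reporter's code; it assumes column names contain no dots or other characters that `col` would parse specially.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: renames columns according to `renames` in a single
// select. Columns that are not keys of the map keep their current name.
def renameColumns(df: DataFrame, renames: Map[String, String]): DataFrame = {
  val projected = df.columns.map { name =>
    col(name).as(renames.getOrElse(name, name))
  }
  df.select(projected: _*)
}
```

Because all aliases are applied in one `select`, this avoids chaining one `withColumnRenamed` call per column.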



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
