[ https://issues.apache.org/jira/browse/SPARK-36858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421935#comment-17421935 ]
Hyukjin Kwon commented on SPARK-36858:
--------------------------------------

Can't we simply do this in a for loop?

> Spark API to apply same function to multiple columns
> ----------------------------------------------------
>
>                 Key: SPARK-36858
>                 URL: https://issues.apache.org/jira/browse/SPARK-36858
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.8, 3.1.2
>            Reporter: Armand BERGES
>            Priority: Minor
>
> Hi,
>
> My team and I regularly need to apply the same function to multiple
> columns at once. For example, we want to remove all non-alphanumeric
> characters from each column of our dataframes.
>
> When we first hit this use case, some people on my team were using this
> kind of code:
> {code:java}
> val colListToClean = ... // Generate some list, could be very long.
> val dfToClean: DataFrame = ... // This is the dataframe we want to clean.
> def cleanFunction(colName: String): Column = ... // Write some function to manipulate a column based on its name.
> val dfCleaned = colListToClean.foldLeft(dfToClean)((df, colName) =>
>   df.withColumn(colName, cleanFunction(colName)))
> {code}
> This kind of code, when applied to a large set of columns, overloaded our
> driver (because a new DataFrame is generated for each column to clean).
>
> Based on this issue, we developed some code to add two functions:
> * One to apply the same function to multiple columns
> * One to rename multiple columns based on a Map.
>
> I wonder if you have ever been asked to add this kind of API. If you have,
> did you run into any issues regarding the implementation? If you haven't,
> is this an idea you could add to Spark?
>
> Best regards,
>
> LvffY
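For reference, the usual way to avoid the per-column plan growth described above is to build every target expression up front and apply them all in a single select, rather than chaining withColumn calls. Below is a minimal, self-contained sketch of that idiom in Scala; the regexp_replace body of cleanFunction and the sample data are illustrative assumptions, not the reporter's actual code:

{code:scala}
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object CleanColumnsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("clean-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative stand-ins for the values elided in the report.
    val dfToClean: DataFrame = Seq(("a#1", "keep-me"), ("b$2", "x!y")).toDF("id", "label")
    val colListToClean: Seq[String] = Seq("id", "label")

    // Hypothetical cleaning function: drop all non-alphanumeric characters.
    def cleanFunction(colName: String): Column =
      regexp_replace(col(colName), "[^a-zA-Z0-9]", "")

    // Build all output columns once, then apply them in a single select:
    // the resulting plan has one Project node instead of one per cleaned
    // column, so the driver-side plan stays small.
    val outputCols = dfToClean.columns.map { c =>
      if (colListToClean.contains(c)) cleanFunction(c).as(c) else col(c)
    }
    val dfCleaned = dfToClean.select(outputCols: _*)

    dfCleaned.show(truncate = false)
    spark.stop()
  }
}
{code}

The single select keeps the analyzed plan at one Project node regardless of how many columns are cleaned, whereas each withColumn call in a loop or foldLeft adds another projection for the analyzer to process. Note also that Spark 3.3 added a public Dataset.withColumns(colsMap: Map[String, Column]) that covers part of this request for later versions.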