Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/18732

@rxin,

`transform` takes a function: pd.Series -> pd.Series and applies the function to every column within each group:

```
df.show()

id   v1   v2   v3
a    1.0  4.0  0.0
a    2.0  5.0  1.0
a    3.0  6.0  1.0

df.groupby('id').transform(pandas_udf(lambda v: v - v.mean(), DoubleType())).show()

id   v1    v2    v3
a    -1.0  -1.0  -0.666667
a    0.0   0.0   0.333333
a    1.0   1.0   0.333333
```

This mimics `pd.DataFrame.groupby().transform`.

`apply` takes a function: pd.DataFrame -> pd.DataFrame and is similar to `flatMapGroups`.

The name `apply` originates from the R paper "The Split-Apply-Combine Strategy for Data Analysis" and is used in both pandas and R to describe this function, so the name `apply` should be pretty straightforward to pandas/Python users.
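
For comparison, here is a minimal sketch of the two pandas operations being mimicked. It uses plain pandas only, not the Spark API proposed in this PR. The names `pdf` and `center_and_trim` are made up for illustration, and the column selection before `apply` is just one way to keep the grouping column out of the function's input.

```python
import pandas as pd

# Same toy data as in the comment's example (plain pandas, not Spark).
pdf = pd.DataFrame({
    "id": ["a", "a", "a"],
    "v1": [1.0, 2.0, 3.0],
    "v2": [4.0, 5.0, 6.0],
    "v3": [0.0, 1.0, 1.0],
})

# transform: the function maps pd.Series -> pd.Series. pandas applies it to
# each value column within each group and returns a frame with the same
# number of rows as the input.
demeaned = pdf.groupby("id").transform(lambda v: v - v.mean())


# apply: the function maps pd.DataFrame -> pd.DataFrame and is called once
# per group. The output may have a different shape than the input group,
# which is what makes it comparable to flatMapGroups.
def center_and_trim(group):
    # Hypothetical example function: subtract the group mean from every
    # column, then keep only rows whose centered v1 is non-negative.
    centered = group - group.mean()
    return centered[centered["v1"] >= 0]


applied = (
    pdf.groupby("id", group_keys=False)[["v1", "v2"]]
    .apply(center_and_trim)
)

print(demeaned)
print(applied)
```

In this sketch `transform` always returns one output row per input row, while `apply` is free to drop or add rows per group.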