GitHub user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/18732#discussion_r143263694

--- Diff: python/pyspark/sql/group.py ---
@@ -192,7 +193,69 @@ def pivot(self, pivot_col, values=None):
             jgd = self._jgd.pivot(pivot_col)
         else:
             jgd = self._jgd.pivot(pivot_col, values)
-        return GroupedData(jgd, self.sql_ctx)
+        return GroupedData(jgd, self._df)
+
+    @since(2.3)
+    def apply(self, udf):
--- End diff --

@rxin, just to recap our discussion regarding naming:

You asked:

> What's the difference between this one and the transform function you also proposed? I'm trying to see if all the naming makes sense when considered together.

The answer is:

`transform` takes a function pd.Series -> pd.Series and applies it to each column (or a subset of columns). The input and output Series have the same length.

`apply` takes a function pd.DataFrame -> pd.DataFrame and applies it to each group, similar to `flatMapGroups`.

Does this make sense to you?
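To make the two function shapes concrete, here is a local sketch in plain pandas. It is not the PySpark API under review; the toy data and the helper names `demean` and `above_min` are hypothetical. The transform-style function maps each group's column Series to a Series of the same length, while the apply-style function maps each group's DataFrame to a DataFrame whose row count may differ, with the per-group results concatenated as in `flatMapGroups`.

```python
import pandas as pd

# Hypothetical toy data: two groups keyed by "id".
df = pd.DataFrame({"id": [1, 1, 2, 2, 2], "v": [1.0, 2.0, 3.0, 5.0, 10.0]})


def demean(s: pd.Series) -> pd.Series:
    """transform-style: pd.Series -> pd.Series of the same length."""
    return s - s.mean()


def above_min(pdf: pd.DataFrame) -> pd.DataFrame:
    """apply-style: pd.DataFrame -> pd.DataFrame; the row count may change."""
    return pdf[pdf["v"] > pdf["v"].min()]


# One output value per input row, aligned with df's index.
transformed = df.groupby("id")["v"].transform(demean)

# Each group's DataFrame is mapped to a new DataFrame and the results are
# concatenated (flatMapGroups-like). Selecting [["v"]] explicitly keeps the
# grouping column out of the function's input.
applied = df.groupby("id")[["v"]].apply(above_min)

print(transformed.tolist())   # [-0.5, 0.5, -3.0, -1.0, 4.0]
print(applied["v"].tolist())  # [2.0, 5.0, 10.0]
```

Note that `transformed` has exactly as many rows as `df` (5), while `applied` has 3: group 1 shrinks from 2 rows to 1 and group 2 from 3 rows to 2.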