Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21887#discussion_r205644653

--- Diff: docs/sql-programming-guide.md ---
@@ -1804,6 +1804,25 @@ The following example shows how to use `groupby().apply()` to subtract the mean
 For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf) and [`pyspark.sql.GroupedData.apply`](api/python/pyspark.sql.html#pyspark.sql.GroupedData.apply).

+### Grouped Aggregate
+
+Grouped aggregate Pandas UDFs are similar to Spark aggregate functions. Grouped aggregate Pandas UDFs are used with groupBy and
+window operations. It defines an aggregation from one or more `pandas.Series`
+to a scalar value, where the `pandas.Series` represents values for a column within the same group or window.
+
+Note that this type of UDF doesn't not support partial aggregation and all data for a group or window will be loaded into memory. Also,
+only unbounded window are supported with Grouped aggregate Pandas UDfs currently.

--- End diff --

`UDfs` -> `UDFs`
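For context, the quoted documentation describes a UDF whose core is a plain Python function mapping one or more `pandas.Series` to a single scalar. A minimal sketch of that idea is below, using pandas only so it runs without a Spark cluster; the `mean_agg` name, the example DataFrame, and the commented-out Spark registration (for Spark >= 2.4) are illustrative assumptions, not part of the patch under review.

```python
import pandas as pd

# Core of a grouped aggregate Pandas UDF: a function from a
# pandas.Series (one group's column values) to a single scalar.
def mean_agg(v: pd.Series) -> float:
    return float(v.mean())

# In PySpark (>= 2.4), this function would be wrapped roughly as
# (hypothetical `spark`/`df` objects, sketch only):
#
#   from pyspark.sql.functions import pandas_udf, PandasUDFType
#   mean_udf = pandas_udf(mean_agg, "double", PandasUDFType.GROUPED_AGG)
#   df.groupby("id").agg(mean_udf(df["v"]))

# Pandas-only demonstration: apply the aggregation per group,
# mimicking what Spark does with each group's column as a Series.
df = pd.DataFrame({"id": [1, 1, 2, 2, 2],
                   "v": [1.0, 2.0, 3.0, 5.0, 10.0]})
result = df.groupby("id")["v"].apply(mean_agg)
```

Because the whole group's values arrive as one `pandas.Series`, Spark cannot combine partial results: this is why the docs note that partial aggregation is unsupported and a full group or window is loaded into memory at once.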