Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19630#discussion_r148807539

    --- Diff: python/pyspark/sql/group.py ---
    @@ -214,11 +214,11 @@ def apply(self, udf):
             :param udf: A function object returned by :meth:`pyspark.sql.functions.pandas_udf`
    -        >>> from pyspark.sql.functions import pandas_udf
    +        >>> from pyspark.sql.functions import pandas_udf, PandasUdfType
             >>> df = spark.createDataFrame(
             ...     [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
             ...     ("id", "v"))
    -        >>> @pandas_udf(returnType=df.schema)
    +        >>> @pandas_udf(returnType=df.schema, functionType=PandasUdfType.GROUP_FLATMAP)
    --- End diff --

    I think `GROUP_MAP` is a better name here. Consider `RDD.mapPartitions`: we pass it a function that takes an `Iterator` (a group of elements) and returns another `Iterator` (another group). `GROUP_TRANSFORM` would also be fine.
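
To make the `mapPartitions` analogy concrete, here is a rough plain-Python sketch with no Spark involved. `map_partitions`, `group_map`, and `subtract_mean` are hypothetical helpers written only for illustration; they are not PySpark APIs, and the mean-centering function is an assumed example, not the UDF from the docstring. The point is only that the same "iterator in, iterator out" function shape works for both a partition and a group, which is the argument for the `GROUP_MAP` name.

```python
from itertools import groupby
from operator import itemgetter


def map_partitions(partitions, func):
    """Hypothetical stand-in for RDD.mapPartitions: func maps an iterator to an iterator."""
    return [list(func(iter(part))) for part in partitions]


def group_map(rows, key_index, func):
    """Hypothetical group-wise analogue: func sees one group of rows at a time."""
    key = itemgetter(key_index)
    out = []
    # sorted() is stable, so rows keep their original order within each group.
    for _, group in groupby(sorted(rows, key=key), key=key):
        out.extend(func(group))
    return out


def subtract_mean(group):
    """Take an iterator of (id, v) rows and return an iterator of mean-centered rows."""
    rows = list(group)
    mean = sum(v for _, v in rows) / len(rows)
    return ((i, v - mean) for i, v in rows)


# Same sample data as in the docstring under review.
rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]

# Here the partitions are chosen to coincide with the groups.
partition_result = map_partitions([rows[:2], rows[2:]], subtract_mean)
group_result = group_map(rows, 0, subtract_mean)
```

In this sketch `subtract_mean` is passed unchanged to both helpers; only the unit it is applied to (partition versus group) differs.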