BryanCutler commented on a change in pull request #26110: [SPARK-29126][PYSPARK][DOC] Pandas Cogroup udf usage guide
URL: https://github.com/apache/spark/pull/26110#discussion_r338302225

##########
File path: docs/sql-pyspark-pandas-with-arrow.md
##########
@@ -178,6 +178,41 @@ For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/p
 [`pyspark.sql.DataFrame.mapsInPandas`](api/python/pyspark.sql.html#pyspark.sql.DataFrame.mapInPandas).
 
+### Cogrouped Map
+
+CoGrouped map Pandas UDFs allow two DataFrames to be cogrouped a by a common key and then a python function applied to
+each cogroup. They are used with `groupBy().cogroup().apply()` which consists of the following steps:
+
+* Shuffle the data such that the groups of each dataframe which share a key are cogrouped together.
+* Apply a function to each cogroup. The input of of the function is two `pandas.DataFrame` (with an optional Tuple

Review comment:
duplicate "of" in `input of of`
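
For readers of the quoted hunk, the two steps it lists (bring together the groups that share a key, then apply a function to each cogroup) can be sketched in plain Python. This is only an illustration of the semantics, not Spark's implementation and not the PySpark API: `cogroup_apply` and `count_rows` are hypothetical names, rows are plain dicts rather than `pandas.DataFrame` objects, and the choice to pass the key to the function and to give a one-sided key an empty group on the other side are assumptions of this sketch.

```python
from collections import defaultdict


def cogroup_apply(left, right, key, func):
    """Group two lists of row dicts by `key`, then call
    func(key_value, left_rows, right_rows) once per cogroup.

    Hypothetical helper for illustration only.
    """
    left_groups = defaultdict(list)
    right_groups = defaultdict(list)

    # Step 1: collect the rows of each side under their key value.
    for row in left:
        left_groups[row[key]].append(row)
    for row in right:
        right_groups[row[key]].append(row)

    # Step 2: apply the function to each cogroup and concatenate the results.
    # Assumption: a key present on only one side still forms a cogroup,
    # with an empty list standing in for the missing side.
    out = []
    for k in sorted(set(left_groups) | set(right_groups)):
        out.extend(func(k, left_groups.get(k, []), right_groups.get(k, [])))
    return out


def count_rows(k, left_rows, right_rows):
    # Example per-cogroup function: report how many rows each side contributed.
    return [{"id": k, "left_n": len(left_rows), "right_n": len(right_rows)}]


left = [{"id": 1, "v": 10}, {"id": 1, "v": 11}, {"id": 2, "v": 20}]
right = [{"id": 1, "w": "a"}, {"id": 3, "w": "c"}]

result = cogroup_apply(left, right, "id", count_rows)
for row in result:
    print(row)
```

Each key yields one call to the function, which sees all rows from both inputs for that key at once; that is the difference from a plain grouped map over a single input.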