[GitHub] [spark] BryanCutler commented on a change in pull request #26110: [SPARK-29126][PYSPARK][DOC] Pandas Cogroup udf usage guide

GitBox Wed, 23 Oct 2019 15:23:34 -0700

BryanCutler commented on a change in pull request #26110: 
[SPARK-29126][PYSPARK][DOC] Pandas Cogroup udf usage guide
URL: https://github.com/apache/spark/pull/26110#discussion_r338302225


 ##########
 File path: docs/sql-pyspark-pandas-with-arrow.md
 ##########
 @@ -178,6 +178,41 @@ For detailed usage, please see 
[`pyspark.sql.functions.pandas_udf`](api/python/p
 
[`pyspark.sql.DataFrame.mapsInPandas`](api/python/pyspark.sql.html#pyspark.sql.DataFrame.mapInPandas).
 
 
+### Cogrouped Map
+
+CoGrouped map Pandas UDFs allow two DataFrames to be cogrouped a by a common 
key and then a python function applied to
+each cogroup.  They are used with `groupBy().cogroup().apply()` which consists 
of the following steps:
+
+* Shuffle the data such that the groups of each dataframe which share a key 
are cogrouped together.
+* Apply a function to each cogroup.  The input of of the function is two 
`pandas.DataFrame` (with an optional Tuple
 
 Review comment:
   Duplicate "of" in `input of of`.
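
   For context, the two steps quoted in the hunk above (bring together the groups of each DataFrame that share a key, then apply a function to each cogroup) can be roughly illustrated in plain pandas. This is only a local sketch of the semantics, not the Spark API: the helpers `cogroup_apply` and `summarize` and the sample columns `id`, `v1`, `v2` are hypothetical names chosen for illustration, and there is no shuffle here since everything runs in one process.

```python
import pandas as pd


def cogroup_apply(left, right, key, func):
    """Hypothetical helper: emulate cogroup-then-apply on two pandas DataFrames."""
    # Step 1: collect every key that appears in either DataFrame.
    keys = sorted(set(left[key]) | set(right[key]))
    results = []
    for k in keys:
        # Rows from each side that share this key form one cogroup.
        left_group = left[left[key] == k]
        right_group = right[right[key] == k]
        # Step 2: apply the user function to the pair of DataFrames.
        results.append(func(left_group, right_group))
    # Combine the per-cogroup outputs into a single DataFrame.
    return pd.concat(results, ignore_index=True)


def summarize(left_group, right_group):
    """Hypothetical user function: takes two DataFrames, returns one."""
    # Either side may be empty when a key exists in only one input.
    source = left_group if len(left_group) else right_group
    return pd.DataFrame({
        "id": [source["id"].iloc[0]],
        "left_rows": [len(left_group)],
        "right_total": [right_group["v2"].sum()],
    })


df1 = pd.DataFrame({"id": [1, 1, 2], "v1": [1.0, 2.0, 3.0]})
df2 = pd.DataFrame({"id": [1, 2, 2, 3], "v2": [10, 20, 30, 40]})

out = cogroup_apply(df1, df2, "id", summarize)
print(out)
```

   Note that the user function must cope with one side being empty, because a key present in only one input still produces a cogroup in this sketch.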
