[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830535#comment-16830535 ]
Bryan Cutler commented on SPARK-27463:
--------------------------------------

I left some comments on the doc. Overall, I think it sounds like a useful addition and won't require a huge amount of changes. Since there are some different ways we could go with the PySpark APIs, just make sure each choice is well described with examples. Thanks!

> SPIP: Support Dataframe Cogroup via Pandas UDFs
> ------------------------------------------------
>
>                 Key: SPARK-27463
>                 URL: https://issues.apache.org/jira/browse/SPARK-27463
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Chris Martin
>            Priority: Major
>              Labels: SPIP
>
> Recent work on Pandas UDFs in Spark has allowed for improved
> interoperability between Pandas and Spark. This proposal aims to extend this
> by introducing a new Pandas UDF type which would allow for a cogroup
> operation to be applied to two PySpark DataFrames.
> Full details are in the Google document linked below.
>