Re: Pandas UDF cogroup.applyInPandas with multiple dataframes

2023-02-22 Thread Santosh Pingale
I have opened two PRs: one that tries to maintain backwards compatibility (https://github.com/apache/spark/pull/39902), and one that breaks the API to make it cleaner (https://github.com/apache/spark/pull/40122).

Re: Pandas UDF cogroup.applyInPandas with multiple dataframes

2023-02-07 Thread Li Jin
I am not a Spark committer and haven't been working on Spark for a while. However, I was heavily involved in the original cogroup work, we use cogroup functionality pretty heavily, and I want to give my two cents here. I think this is a nice improvement and I hope someone from the PySpark

Re: Pandas UDF cogroup.applyInPandas with multiple dataframes

2023-02-06 Thread Santosh Pingale
Created a PR: https://github.com/apache/spark/pull/39902
> On 24 Jan 2023, at 15:04, Santosh Pingale wrote:
>
> Hey all
>
> I have an interesting problem in hand. We have cases where we want to pass
> multiple (20 to 30) data frames to

Pandas UDF cogroup.applyInPandas with multiple dataframes

2023-01-24 Thread Santosh Pingale
Hey all, I have an interesting problem in hand. We have cases where we want to pass multiple (20 to 30) data frames to the cogroup.applyInPandas function. RDD currently supports cogroup with up to 4 dataframes (ZippedPartitionsRDD4), whereas cogroup with pandas can handle only 2 dataframes (with
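
For readers unfamiliar with the two-DataFrame limit mentioned above, here is a minimal sketch of how the existing cogroup.applyInPandas call is typically used. The column names, sample rows, and the helper names summarize_pair and run_with_spark are made up for illustration and do not come from the thread or the PRs.

```python
import pandas as pd


def summarize_pair(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Receive the rows for ONE key from each of the two cogrouped DataFrames."""
    # Either side may be empty when the key exists in only one DataFrame.
    key = left["id"].iloc[0] if len(left) else right["id"].iloc[0]
    return pd.DataFrame(
        {"id": [key], "n_left": [len(left)], "n_right": [len(right)]}
    )


def run_with_spark():
    # Imported here so summarize_pair can be exercised without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # Hypothetical sample data, two DataFrames sharing an "id" key.
    df1 = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ["id", "v1"])
    df2 = spark.createDataFrame([(1, "a"), (3, "b")], ["id", "v2"])
    # The function passed to applyInPandas takes exactly two pandas
    # DataFrames, one per cogrouped side.
    return (
        df1.groupby("id")
        .cogroup(df2.groupby("id"))
        .applyInPandas(
            summarize_pair, schema="id long, n_left long, n_right long"
        )
    )
```

Because the applied function receives exactly one pandas DataFrame per side, passing 20 to 30 DataFrames is not possible with this call as it stands, which is the gap the thread is about.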