Hello Folks!

I was looking into the PySpark DataFrame cogroup + applyInPandas APIs.

As mentioned in the Spark doc, the pandas UDF to be applied by applyInPandas
takes two pandas.DataFrames and returns another pandas.DataFrame.
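
For reference, a minimal sketch of that two-DataFrame form. The column names
(time, id, v1, v2), the v2 type and the helper apply_to_two are assumptions
made for illustration; only groupby, cogroup and applyInPandas are the APIs
in question.

```python
import pandas as pd


def function_with_two_args(pdf1: pd.DataFrame, pdf2: pd.DataFrame) -> pd.DataFrame:
    # Called once per key with that key's rows from each side of the
    # cogroup; must return one pandas.DataFrame matching the schema.
    return pd.merge(pdf1, pdf2, on=["time", "id"], how="inner")


def apply_to_two(df1, df2):
    # Hypothetical helper. df1 and df2 are pyspark DataFrames with the
    # assumed columns (time, id, v1) and (time, id, v2).
    return (
        df1.groupby("id")
        .cogroup(df2.groupby("id"))
        .applyInPandas(
            function_with_two_args,
            schema="time int, id int, v1 double, v2 string",
        )
    )
```

The pandas function is kept separate from the Spark wiring so it can be
tried on plain pandas.DataFrames first.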

I was wondering whether there is a way to make the pandas UDF accept more
than two pandas.DataFrames as arguments when doing cogroup + applyInPandas, so
I posted my question on Stack Overflow:

python - Is it possible to use cogroup + applyInPandas for more than 2 pyspark
dataframes as input? - Stack Overflow

As mentioned in the spark doc, the function to be applied by applyInPandas
takes two pandas.DataFrames and returns one another pandas.DataFrame. Hence the
following can be done:

    def function_with_two_args(pdf1, pdf2):
        result_pdf = <do this and that>
        return result_pdf

    function_with_two_args, schema="time int, id int, v1 double, v2 ...

User D3V answered with a workaround. Still, both of us think it would be nice
to have a feature that allows passing more than two pandas DataFrames to the
pandas UDF given to cogroup.applyInPandas. What do you think? Can we create a
JIRA ticket for this feature?
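
To make the request concrete, here is one possible way to emulate three
inputs today. It is only a sketch and not necessarily the workaround from the
answer: the second and third DataFrames are stacked with a tag column and
split again inside the UDF. The column names, the _src tag and all function
names are hypothetical, and it assumes every id is present in all three
inputs.

```python
import pandas as pd


def function_with_three_args(pdf1, pdf2, pdf3):
    # Hypothetical three-input logic: join all inputs on (time, id).
    keys = ["time", "id"]
    out = pdf1.merge(pdf2, on=keys).merge(pdf3, on=keys)
    return out[["time", "id", "v1", "v2", "v3"]]


def two_arg_adapter(pdf1, tagged):
    # `tagged` holds the rows of the 2nd and 3rd inputs stacked together;
    # the assumed `_src` column records which input each row came from.
    pdf2 = tagged.loc[tagged["_src"] == 2, ["time", "id", "v2"]]
    pdf3 = tagged.loc[tagged["_src"] == 3, ["time", "id", "v3"]]
    return function_with_three_args(pdf1, pdf2, pdf3)


def apply_to_three(df1, df2, df3):
    # df1, df2, df3 are pyspark DataFrames with assumed columns
    # (time, id, v1), (time, id, v2) and (time, id, v3).
    from pyspark.sql import functions as F

    # allowMissingColumns fills v2/v3 with nulls where absent (Spark 3.1+).
    tagged = df2.withColumn("_src", F.lit(2)).unionByName(
        df3.withColumn("_src", F.lit(3)), allowMissingColumns=True
    )
    return (
        df1.groupby("id")
        .cogroup(tagged.groupby("id"))
        .applyInPandas(
            two_arg_adapter,
            schema="time int, id int, v1 double, v2 double, v3 double",
        )
    )
```

A native multi-DataFrame cogroup would remove the need for the tag column
and the adapter function.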

