Hi,

I am using pyspark Grouped Map pandas UDF (
https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html).
Functionality wise it works great. However, serDe causes a lot of perf
hits. To optimize this UDF, can I do either below:

1. use a java UDF to completely replace the python Grouped Map pandas UDF.
2. The Python Grouped Map pandas UDF calls a java function internally.

Which way is more promising and how? Thanks for any pointers.

Thanks
Lian

Reply via email to