HyukjinKwon commented on a change in pull request #23900: [SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF
URL: https://github.com/apache/spark/pull/23900#discussion_r268047078
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2868,6 +2869,15 @@ def pandas_udf(f=None, returnType=None, functionType=None):
        +----------+--------------+------------+
        |         8|      JOHN DOE|          22|
        +----------+--------------+------------+
 
+       >>> @pandas_udf("first string, last string")  # doctest: +SKIP

Review comment:
Yes, something has to be done. At the very least, I tried to document the casting combinations.

**Pandas UDF matrix:**
https://github.com/apache/spark/blob/b67d36957287c1fbefa1996e6a4a009a75c4c3f8/python/pyspark/sql/functions.py#L3098-L3131

The problem is that this matrix differs from the one for regular PySpark UDFs, and also from our `TypeCoercions`:

**Regular PySpark UDF matrix:**
https://github.com/apache/spark/blob/b67d36957287c1fbefa1996e6a4a009a75c4c3f8/python/pyspark/sql/functions.py#L2830-L2860

I lost track of the last discussion about whether we should allow such type coercions or not. But basically my gut says: if we allow them, I think we will again need a large amount of code to maintain (Arrow type <> Pandas type <> Python type <> SparkSQL type); if we disallow them, it will break many existing apps.
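
To make the trade-off concrete, here is a minimal sketch in plain Python. It is only an analogy, not Spark code: `coerce` and its `permissive` flag are hypothetical names invented for illustration, and the real behavior for each type pair is what the matrices linked above document.

```python
# Hypothetical sketch: `coerce` and `permissive` are illustrative names only.
# They are not part of PySpark, pandas, or Arrow.

def coerce(value, declared_type, permissive):
    """Return `value` as `declared_type`, or raise TypeError in strict mode."""
    if type(value) is declared_type:
        # Exact match: nothing to convert under either policy.
        return value
    if not permissive:
        # "Disallow" policy: a mismatch is an error, so code that relied on
        # implicit casts stops working.
        raise TypeError(
            "expected %s, got %s" % (declared_type.__name__, type(value).__name__)
        )
    # "Allow" policy: convert silently, which may lose information.
    return declared_type(value)


# Allowing coercion keeps the caller working but truncates the value.
lossy = coerce(1.9, int, permissive=True)

# Disallowing coercion surfaces the mismatch instead.
try:
    coerce(1.9, int, permissive=False)
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)
```

In the permissive branch every (declared type, actual type) pair needs a defined result, which is roughly where the maintenance cost would come from once Arrow, Pandas, Python, and SparkSQL types are all involved. The strict branch is simple to maintain but rejects inputs that used to be accepted.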