HyukjinKwon commented on a change in pull request #23900: [SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF
URL: https://github.com/apache/spark/pull/23900#discussion_r268047078
 
 

 ##########
 File path: python/pyspark/sql/functions.py
 ##########
 @@ -2868,6 +2869,15 @@ def pandas_udf(f=None, returnType=None, functionType=None):
        +----------+--------------+------------+
        |         8|      JOHN DOE|          22|
        +----------+--------------+------------+
+       >>> @pandas_udf("first string, last string")  # doctest: +SKIP
 
 Review comment:
   Yes, something has to be done. At the very least, I tried to document the
   casting combinations.
   
   **Pandas UDF matrix:**
   
   
https://github.com/apache/spark/blob/b67d36957287c1fbefa1996e6a4a009a75c4c3f8/python/pyspark/sql/functions.py#L3098-L3131
   
   The problem is that this matrix differs from the one for regular PySpark UDFs,
   and also from our `TypeCoercions`:
   
   **Regular PySpark UDF matrix:**
   
   
https://github.com/apache/spark/blob/b67d36957287c1fbefa1996e6a4a009a75c4c3f8/python/pyspark/sql/functions.py#L2830-L2860
   
   I lost track of the last discussion about whether we should allow such type
   coercions or not. But basically my gut says: if we allow them, I think we will
   need a large amount of code to maintain again (Arrow type <> Pandas type <>
   Python type <> SparkSQL type), but if we disallow them, it will break many
   existing apps.
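
To make "documenting the casting combinations" concrete, here is a minimal sketch of how such a matrix could be built by trying every pair of declared type and returned value. It does not use Spark: the `CONVERTERS` table, the `SAMPLES` list, and the `"X"` failure marker are assumptions for illustration, and the plain Python casts are stand-ins, not Spark's or Arrow's actual conversion rules.

```python
# Hypothetical stand-ins for "declared return type" converters.
# These are plain Python casts, NOT Spark's or Arrow's conversion rules.
CONVERTERS = {
    "bigint": int,
    "double": float,
    "string": str,
}

# Sample values a UDF might return (illustrative only).
SAMPLES = [True, 1, 1.5, "a", None]


def build_matrix(converters, samples):
    """Try every (declared type, returned value) pair and record the outcome."""
    matrix = {}
    for type_name, convert in converters.items():
        for value in samples:
            key = (type_name, repr(value))
            try:
                matrix[key] = repr(convert(value))
            except (TypeError, ValueError):
                # Mark combinations that raise during conversion.
                matrix[key] = "X"
    return matrix


matrix = build_matrix(CONVERTERS, SAMPLES)

# Print one row per declared type, one column per sample value.
for type_name in CONVERTERS:
    cells = [matrix[(type_name, repr(value))] for value in SAMPLES]
    print(type_name.ljust(8), " | ".join(cells))
```

A real matrix would need one such table per conversion path (Pandas UDF, regular UDF, SQL type coercion), and keeping those tables consistent is where the maintenance cost comes from.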

