Github user e-dorigatti commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21383#discussion_r191472884
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -157,7 +157,17 @@ def _create_judf(self):
             spark = SparkSession.builder.getOrCreate()
             sc = spark.sparkContext
     
    -        wrapped_func = _wrap_function(sc, self.func, self.returnType)
    +        func = fail_on_stopiteration(self.func)
    +
    +        # for pandas UDFs the worker needs to know if the function takes
    +        # one or two arguments, but the signature is lost when wrapping with
    +        # fail_on_stopiteration, so we store it here
    +        if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF,
    +                             PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF,
    +                             PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF):
    +            func._argspec = _get_argspec(self.func)
    --- End diff ---
    
    I am not sure how to do that, though; can you suggest a way? Originally 
this hack was in `fail_on_stopiteration`, but it was then decided to restrict 
its scope as much as possible.
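
    For context, a minimal sketch of the problem the diff works around: a plain 
wrapper has its own `(*args, **kwargs)` signature, so the original argspec must 
be stashed on the wrapper attribute. The `fail_on_stopiteration` body below is a 
simplified stand-in, not the actual pyspark.util implementation, and 
`my_pandas_udf` is a hypothetical UDF.

    ```python
    import inspect

    def fail_on_stopiteration(f):
        # simplified stand-in: re-raise StopIteration escaping
        # user code as a RuntimeError so it is not swallowed
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except StopIteration as e:
                raise RuntimeError("StopIteration in user code") from e
        return wrapper

    def my_pandas_udf(pdf, state):  # hypothetical two-argument UDF
        return pdf

    wrapped = fail_on_stopiteration(my_pandas_udf)

    # the wrapper's own signature is (*args, **kwargs) -- the
    # original positional arguments are gone
    print(inspect.getfullargspec(wrapped).args)   # []

    # so the original argspec is stored on the wrapper, as in the diff
    wrapped._argspec = inspect.getfullargspec(my_pandas_udf)
    print(wrapped._argspec.args)                  # ['pdf', 'state']
    ```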


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
