Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19630#discussion_r148805721

    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2279,7 +2174,36 @@ def pandas_udf(f=None, returnType=StringType()):
         .. note:: The user-defined function must be deterministic.
         """
    -    return _create_udf(f, returnType=returnType, pythonUdfType=PythonUdfType.PANDAS_UDF)
    +    # decorator @pandas_udf(dataType(), functionType)
    +    if f is None or isinstance(f, (str, DataType)):
    +        # If DataType has been passed as a positional argument
    +        # for decorator use it as a returnType
    +
    +        return_type = f or returnType
    +
    +        if return_type is None:
    +            raise ValueError("Must specify return type.")
    +
    +        if functionType is not None:
    +            # @pandas_udf(dataType, functionType=functionType)
    +            # @pandas_udf(returnType=dataType, functionType=functionType)
    +            udf_type = functionType
    +        elif returnType is not None and isinstance(returnType, int):
    --- End diff --

    Yes. When `pandas_udf` is used as a decorator, the arguments are shifted by one position, i.e., with:

    `@pandas_udf('double', SCALAR)`

    it is actually:

    `f='double'` and `returnType=SCALAR`

    Most of the complexity in the branching statement comes from `pandas_udf` serving as both a decorator and a function.
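
The argument shift described in the comment can be reproduced without Spark. The following is a minimal sketch, not the pyspark implementation: `typed_udf`, `_attach`, and the integer values of `SCALAR` and `GROUP_MAP` are hypothetical names chosen for illustration, and return types are restricted to strings to keep it self-contained.

```python
import functools

# Hypothetical integer tags for the function type; the values are arbitrary.
SCALAR = 1
GROUP_MAP = 2


def _attach(func, return_type, udf_type):
    """Record the resolved types on the function so they can be inspected."""
    func.return_type = return_type
    func.udf_type = udf_type
    return func


def typed_udf(f=None, returnType=None, functionType=None):
    """Toy dual-use helper: works as a decorator factory and as a plain function."""
    if f is None or isinstance(f, str):
        # Decorator form. With @typed_udf('double', SCALAR) the call binds
        # f='double' and returnType=SCALAR: everything is shifted by one slot.
        return_type = f or returnType
        if return_type is None:
            raise ValueError("Must specify return type.")
        if functionType is not None:
            # @typed_udf('double', functionType=...) or all-keyword form
            udf_type = functionType
        elif returnType is not None and isinstance(returnType, int):
            # The function type landed in the returnType slot.
            udf_type = returnType
        else:
            udf_type = SCALAR  # assumed default for this sketch
        return functools.partial(_attach, return_type=return_type, udf_type=udf_type)

    # Function form: typed_udf(func, 'double', SCALAR); no shift happens here.
    if returnType is None:
        raise ValueError("Must specify return type.")
    udf_type = functionType if functionType is not None else SCALAR
    return _attach(f, returnType, udf_type)


@typed_udf('double', GROUP_MAP)          # binds f='double', returnType=GROUP_MAP
def decorated(x):
    return x + 1


def _plain(x):
    return x + 2


called = typed_udf(_plain, 'double', GROUP_MAP)  # binds f=_plain, returnType='double'
```

Both spellings resolve to the same return type and function type, which is the behavior the extra branches in the diff exist to provide.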