Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19147#discussion_r138831553

    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2111,6 +2126,53 @@ def wrapper(*args):
         return wrapper

    +def _udf(f, returnType, vectorized):
    +    udf_obj = UserDefinedFunction(f, returnType, vectorized=vectorized)
    +    return udf_obj._wrapped()
    +
    +
    +if _have_pandas and _have_arrow:
    +
    +    @since(2.3)
    +    def pandas_udf(f=None, returnType=StringType()):
    +        """
    +        Creates a :class:`Column` expression representing a vectorized user defined function (UDF).
    +
    +        .. note:: The vectorized user-defined functions must be deterministic. Due to optimization,
    +            duplicate invocations may be eliminated or the function may even be invoked more times
    +            than it is present in the query.
    --- End diff --

    Should we explain more about what the vectorized UDF is and its expected input parameters and outputs?
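    To make the question concrete, here is a minimal sketch of the kind of contract the docstring could spell out. It assumes a vectorized UDF receives a whole batch of column values per call and returns a batch of the same length. Plain Python lists stand in for `pandas.Series` so the sketch runs without pandas, Arrow, or Spark; `plus_one_scalar`, `plus_one_vectorized`, and `apply_in_batches` are hypothetical names for illustration, not part of this PR or of PySpark.

```python
# Illustrative only: lists stand in for pandas.Series, and none of these
# helpers exist in PySpark. They model the assumed input/output contract.

def plus_one_scalar(x):
    """Row-at-a-time UDF: one value in, one value out."""
    return x + 1


def plus_one_vectorized(batch):
    """Vectorized-style UDF: a batch of values in, a same-length batch out."""
    return [x + 1 for x in batch]


def apply_in_batches(func, column, batch_size):
    """Hypothetical driver: feed `column` to `func` one batch at a time."""
    out = []
    for start in range(0, len(column), batch_size):
        batch = column[start:start + batch_size]
        result = func(batch)
        # Assumed contract: the output length must match the input length.
        if len(result) != len(batch):
            raise ValueError("vectorized UDF must return one value per input row")
        out.extend(result)
    return out


column = [1, 2, 3, 4, 5]
row_wise = [plus_one_scalar(x) for x in column]
batched = apply_in_batches(plus_one_vectorized, column, batch_size=2)
```

    A deterministic function gives the same result however the rows are grouped into batches and however many times it is invoked, which is what the note in the docstring relies on.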