Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19787#discussion_r152198789

    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2198,12 +2198,9 @@ def udf(f=None, returnType=StringType()):
         duplicate invocations may be eliminated or the function may even be invoked more
         times than it is present in the query.

    -    .. note:: The user-defined functions do not support conditional execution by using them with
    -        SQL conditional expressions such as `when` or `if`. The functions still apply on all rows no
    -        matter the conditions are met or not. So the output is correct if the functions can be
    -        correctly run on all rows without failure. If the functions can cause runtime failure on the
    -        rows that do not satisfy the conditions, the suggested workaround is to incorporate the
    -        condition logic into the functions.
    +    .. note:: The user-defined functions do not support conditional expressions or short
    +        circuiting in boolean expressions, and they end up being executed internally on all
    +        rows. If the functions can fail on those rows, the workaround is to incorporate the
    +        condition into the functions.
    --- End diff --

    Maybe it is also worth adding a note to pandas_udf.
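The workaround the note suggests can be sketched in plain Python, without a SparkSession, since the point is only where the guard lives. The function names and sample data below are hypothetical, chosen for illustration:

```python
# Sketch of the workaround discussed in the note above: because Spark may
# evaluate a UDF on every row regardless of a surrounding `when`/`if` guard,
# the guard must live inside the function body itself.

def unsafe_parse(s):
    # Raises on None -- unsafe as a Spark UDF even if the query wraps it in
    # a null check, because the guard is outside the function.
    return int(s)

def safe_parse(s):
    # Workaround: incorporate the condition into the function, so it is
    # safe to evaluate on every row.
    return int(s) if s is not None else None

# Simulating evaluation over all rows, nulls included:
rows = ["1", None, "3"]
print([safe_parse(s) for s in rows])  # [1, None, 3]
```

In actual PySpark this `safe_parse` would be wrapped with `udf(safe_parse, IntegerType())` and applied directly, rather than relying on `when(col("x").isNotNull(), ...)` to shield it from null rows.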