[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578709#comment-16578709 ]
Liang-Chi Hsieh commented on SPARK-22347:
-----------------------------------------

Agreed. Thanks [~rdblue]

> UDF is evaluated when 'F.when' condition is false
> -------------------------------------------------
>
>                 Key: SPARK-22347
>                 URL: https://issues.apache.org/jira/browse/SPARK-22347
>             Project: Spark
>          Issue Type: Documentation
>          Components: PySpark
>    Affects Versions: 2.2.0
>            Reporter: Nicolas Porter
>            Assignee: Liang-Chi Hsieh
>            Priority: Minor
>
> Here's a simple example on how to reproduce this:
> {code}
> from pyspark.sql import functions as F, Row, types
> def Divide10():
>     def fn(value): return 10 / int(value)
>     return F.udf(fn, types.IntegerType())
> df = sc.parallelize([Row(x=5), Row(x=0)]).toDF()
> x = F.col('x')
> df2 = df.select(F.when((x > 0), Divide10()(x)))
> df2.show(200)
> {code}
> This raises a division by zero error, even if `F.when` is trying to filter
> out all cases where `x <= 0`. I believe the correct behavior should be not to
> evaluate the UDF when the `F.when` condition is false.
> Interestingly enough, when the `F.when` condition is set to `F.lit(False)`,
> then the error is not raised and all rows resolve to `null`, which is the
> expected result.
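
For readers hitting the same error: since the UDF may be evaluated for rows where the `F.when` condition is false, one possible workaround is to move the condition into the function itself. The sketch below is only an illustration under stated assumptions. The names `safe_divide10` and `guarded_column` are hypothetical and not part of Spark, and integer division (`//`) is assumed so the result matches `IntegerType`.

```python
from typing import Optional


def safe_divide10(value) -> Optional[int]:
    """Return 10 // value, or None when value is missing or not positive."""
    if value is None:
        return None
    number = int(value)
    if number <= 0:
        # Same guard as the `x > 0` condition in the report, applied inside
        # the function so it holds even if Spark calls the UDF on every row.
        return None
    # Assumption: integer division, so the result fits IntegerType.
    return 10 // number


def guarded_column(col_name: str = "x"):
    """Hypothetical helper: wrap safe_divide10 as a UDF over the named column."""
    # Imported lazily so the plain function above can be tried without Spark.
    from pyspark.sql import functions as F, types

    return F.udf(safe_divide10, types.IntegerType())(F.col(col_name))
```

With the `df` from the reproduction above, `df.select(guarded_column("x"))` would take the place of the `F.when(...)` expression. Rows with `x <= 0` then get `None` from the function itself instead of depending on the evaluation order of `F.when`.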