[ https://issues.apache.org/jira/browse/SPARK-37752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465972#comment-17465972 ]
Ohad Raviv commented on SPARK-37752:
------------------------------------

That is what I deduced. Thanks for the answer!

> Python UDF fails when it should not get evaluated
> -------------------------------------------------
>
>                 Key: SPARK-37752
>                 URL: https://issues.apache.org/jira/browse/SPARK-37752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.4
>            Reporter: Ohad Raviv
>            Priority: Minor
>
> I haven't checked on newer versions yet.
> If I define in Python:
> {code:java}
> def udf1(col1):
>     print(col1[2])
>     return "blah"
>
> spark.udf.register("udf1", udf1)
> {code}
> and then use it in SQL:
> {code:java}
> select case when length(c) > 2 then udf1(c) end
> from (
>     select explode(array("123", "234", "12")) as c
> )
> {code}
> it fails with:
> {noformat}
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 253, in main
>     process()
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 248, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 155, in <lambda>
>     func = lambda _, it: map(mapper, it)
>   File "<string>", line 1, in <lambda>
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 76, in <lambda>
>     return lambda *a: f(*a)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/util.py", line 55, in wrapper
>     return f(*args, **kwargs)
>   File "<stdin>", line 3, in udf1
> IndexError: string index out of range
> {noformat}
> The out-of-range row should not be evaluated at all, since the case-when filters for strings longer than 2 characters.
> The same scenario works fine when we define a Scala UDF instead.
> I will now check whether this also happens on newer versions.
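A workaround that does not depend on the evaluation order is to make the UDF itself defensive about its input, rather than relying on the surrounding CASE WHEN guard being applied first. A minimal sketch of that pattern (plain Python, no Spark session needed; the name `safe_udf1` is hypothetical):

```python
def safe_udf1(col1):
    # Guard inside the UDF: the engine may evaluate the UDF on rows the
    # CASE WHEN guard would filter out, so handle short or null input here.
    if col1 is None or len(col1) <= 2:
        return None
    # Safe now: col1 has at least 3 characters, so index 2 exists.
    return "blah"

# spark.udf.register("safe_udf1", safe_udf1)  # register as before
```

Returning None maps to SQL NULL, which matches what the CASE WHEN expression would have produced for the filtered-out rows anyway.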
--
This message was sent by Atlassian Jira
(v8.20.1#820001)