[ 
https://issues.apache.org/jira/browse/SPARK-37752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465972#comment-17465972
 ] 

Ohad Raviv commented on SPARK-37752:
------------------------------------

That is what I deduced. Thanks for the answer!

> Python UDF fails when it should not get evaluated
> -------------------------------------------------
>
>                 Key: SPARK-37752
>                 URL: https://issues.apache.org/jira/browse/SPARK-37752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.4
>            Reporter: Ohad Raviv
>            Priority: Minor
>
> I haven't checked newer versions yet.
> If I define in Python:
> {code:python}
> def udf1(col1):
>     print(col1[2])
>     return "blah"
> spark.udf.register("udf1", udf1) {code}
> and then use it in SQL:
> {code:sql}
> select case when length(c)>2 then udf1(c) end
> from (
>     select explode(array("123","234","12")) as c
> ) {code}
> it fails with:
> {noformat}
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 253, in main
>     process()
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 248, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 155, in <lambda>
>     func = lambda _, it: map(mapper, it)
>   File "<string>", line 1, in <lambda>
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 76, in <lambda>
>     return lambda *a: f(*a)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/util.py", line 55, in wrapper
>     return f(*args, **kwargs)
>   File "<stdin>", line 3, in udf1
> IndexError: string index out of range{noformat}
> The failing row should not be evaluated at all, because the CASE WHEN only applies the UDF to values longer than 2 characters.
> The same scenario works as expected when a Scala UDF is defined instead.
> I will now check whether this also happens on newer versions.
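The notes in the PySpark `udf` documentation state that Python UDFs do not support conditional expressions or short-circuiting and may end up being executed for all rows; the documented workaround is to incorporate the condition into the function itself. Below is a minimal sketch of that workaround for the repro above. It assumes the intended result is what the CASE WHEN describes (NULL for strings of 2 characters or fewer), and `udf1_safe` is a hypothetical name, not part of the original report:

```python
def udf1_safe(col1):
    # Repeat the SQL guard `length(c) > 2` inside the UDF, because the
    # surrounding CASE WHEN may not stop Spark from calling the UDF on
    # every row. Returning None yields SQL NULL, which matches what the
    # CASE WHEN (with no ELSE branch) produces for excluded rows.
    if col1 is None or len(col1) <= 2:
        return None
    print(col1[2])  # safe now: col1 has at least 3 characters
    return "blah"
```

It could then be registered in place of the original with `spark.udf.register("udf1", udf1_safe)`, after which the `"12"` row returns NULL instead of raising `IndexError`. The CASE WHEN can stay in the query; it simply is no longer the only guard.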



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
