[ https://issues.apache.org/jira/browse/SPARK-38485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523363#comment-17523363 ]

Tanel Kiis commented on SPARK-38485:
------------------------------------

Is there even any point, then, in having non-deterministic methods in Spark? Some
optimizations are already disabled for them to avoid exactly this kind of situation.
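
For context, a minimal sketch of the API in question (my own illustration, not from the report; assumes a running SparkSession and the standard functions import):

{code}
import scala.util.Random
import org.apache.spark.sql.functions.udf

// asNondeterministic() signals the optimizer that each invocation may
// return a different value, so the call should not be duplicated,
// reordered past filters, or pushed down.
val randUdf = udf(() => Random.nextInt()).asNondeterministic()
{code}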

> Non-deterministic UDF executed multiple times when combined with withField
> --------------------------------------------------------------------------
>
>                 Key: SPARK-38485
>                 URL: https://issues.apache.org/jira/browse/SPARK-38485
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Tanel Kiis
>            Priority: Major
>              Labels: Correctness
>
> When adding fields to the result of a non-deterministic UDF that returns a
> struct, the UDF is executed multiple times (once per field) for each row.
> In the unit test below, df1 passes, but df2 fails with something like:
> "279751724 did not equal -1023188908"
> {code}
>   // Assumes scala.util.Random is in scope; GroupByKey is taken to be a
>   // case class with two Int fields defined in the test suite.
>   test("SPARK-XXXXX: non-deterministic UDF should be called once when adding fields") {
>     val nondeterministicUDF = udf((s: Int) => {
>       val r = Random.nextInt()
>       // Both values should be the same
>       GroupByKey(r, r)
>     }).asNondeterministic()
>     val df1 = spark.range(5).select(nondeterministicUDF($"id"))
>     df1.collect().foreach {
>       row => assert(row.getStruct(0).getInt(0) == row.getStruct(0).getInt(1))
>     }
>     val df2 = spark.range(5).select(nondeterministicUDF($"id").withField("new", lit(7)))
>     df2.collect().foreach {
>       row => assert(row.getStruct(0).getInt(0) == row.getStruct(0).getInt(1))
>     }
>   }
> {code}
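>
> A possible workaround until this is fixed (a sketch only, not verified against
> this bug; the names base and df3 are illustrative): persist the UDF output first,
> so the struct is materialized once and the per-field accesses generated by
> withField read cached data instead of re-invoking the UDF.
> {code}
>   // Workaround sketch (assumption): persist() materializes the struct once,
>   // so withField's field accesses hit the cache rather than re-running the UDF.
>   val base = spark.range(5).select(nondeterministicUDF($"id").as("s")).persist()
>   val df3 = base.select($"s".withField("new", lit(7)))
> {code}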


