[jira] [Assigned] (SPARK-53538) Update with nondeterministic assignments can fail when whole-stage codegen is off

Hyukjin Kwon (Jira) Tue, 09 Sep 2025 17:05:07 -0700


     [ https://issues.apache.org/jira/browse/SPARK-53538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hyukjin Kwon reassigned SPARK-53538:
------------------------------------

    Assignee: Bruce Robbins

> Update with nondeterministic assignments can fail when whole-stage codegen is
> off
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-53538
>                 URL: https://issues.apache.org/jira/browse/SPARK-53538
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>              Labels: pull-request-available
>
> This test will fail if the "split-updates" property is set to "true":
> {noformat}
>   test("update with nondeterministic assignments and no wholestage codegen") {
>     val extraColCount = SQLConf.get.wholeStageMaxNumFields - 4
>     val schema = "pk INT NOT NULL, id INT, value DOUBLE, dep STRING, " +
>       ((1 to extraColCount).map(i => s"col$i INT").mkString(", "))
>     val data = (1 to 3).map { i =>
>       s"""{ "pk": $i, "id": $i, "value": 2.0, "dep": "hr", """ +
>         ((1 to extraColCount).map(j => s""""col$j": $i""").mkString(", ")) +
>       "}"
>     }.mkString("\n")
>     createAndInitTable(schema, data)
>     // rand() always generates values in [0, 1) range
>     sql(s"UPDATE $tableNameAsString SET value = rand() WHERE id <= 2")
>     checkAnswer(
>       sql(s"SELECT count(*) FROM $tableNameAsString WHERE value < 2.0"),
>       Row(2) :: Nil)
>   }
> {noformat}
> The error is:
> {noformat}
> [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 11) (10.0.0.101 executor driver): java.lang.NullPointerException: Cannot invoke "java.util.Random.nextDouble()" because "<parameter1>.mutableStateArray_0[0]" is null
> [info]        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source)
> [info]        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
> [info]        at org.apache.spark.sql.execution.ExpandExec$$anon$1.next(ExpandExec.scala:75)
> ...
> {noformat}
> {{RewriteUpdateTable}} creates an {{Expand}} operator with a set of
> projections, one of which contains a nondeterministic expression.
> When whole-stage codegen is off, {{ExpandExec}} does not initialize the
> derived {{UnsafeProjections}} before using them, which results in the
> above error.
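
The failure mode described above can be sketched without Spark. This is a toy model, not the actual ExpandExec code or its fix: RandProjection, ExpandInitSketch, expandWithoutInit and expandWithInit are hypothetical names, and the seeding scheme is an assumption made only for illustration. It shows a projection whose random-number state stays null until initialize is called, so using it uninitialized raises the same kind of NullPointerException as in the report.

```scala
import java.util.Random

// Hypothetical stand-in for a generated projection with mutable state.
// It is NOT Spark's UnsafeProjection; it only mimics the init-before-use contract.
final class RandProjection(seed: Long) {
  // Stays null until initialize() runs, like the null state slot in the error above.
  private var rng: Random = null

  // Assumed seeding scheme (seed + partition index), chosen for illustration only.
  def initialize(partitionIndex: Int): Unit = {
    rng = new Random(seed + partitionIndex)
  }

  // Emits the input together with a value in [0, 1); throws NPE if uninitialized.
  def apply(input: Int): (Int, Double) = (input, rng.nextDouble())
}

object ExpandInitSketch {
  // Mirrors the failing pattern: projections are applied without being initialized.
  def expandWithoutInit(rows: Seq[Int], projections: Seq[RandProjection]): Seq[(Int, Double)] =
    rows.flatMap(row => projections.map(p => p(row)))

  // Initializes every projection once per partition before producing any row.
  def expandWithInit(
      rows: Seq[Int],
      projections: Seq[RandProjection],
      partitionIndex: Int): Seq[(Int, Double)] = {
    projections.foreach(_.initialize(partitionIndex))
    rows.flatMap(row => projections.map(p => p(row)))
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(1, 2, 3)

    val failed =
      try {
        expandWithoutInit(rows, Seq(new RandProjection(42L)))
        false
      } catch {
        case _: NullPointerException => true
      }
    println(s"uninitialized projection failed: $failed")

    val out = expandWithInit(rows, Seq(new RandProjection(42L)), partitionIndex = 0)
    println(s"initialized projection produced ${out.size} rows")
  }
}
```

In this sketch the only difference between the two paths is the single initialize call made before the first row is projected.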



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
