[
https://issues.apache.org/jira/browse/SPARK-53538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon reassigned SPARK-53538:
------------------------------------
Assignee: Bruce Robbins
> Update with nondeterministic assignments can fail when whole-stage codegen is
> off
> --------------------------------------------------------------------------------
>
> Key: SPARK-53538
> URL: https://issues.apache.org/jira/browse/SPARK-53538
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Bruce Robbins
> Assignee: Bruce Robbins
> Priority: Major
> Labels: pull-request-available
>
> This test will fail if the "split-updates" property is set to "true":
> {noformat}
> test("update with nondeterministic assignments and no wholestage codegen") {
>   val extraColCount = SQLConf.get.wholeStageMaxNumFields - 4
>   val schema = "pk INT NOT NULL, id INT, value DOUBLE, dep STRING, " +
>     ((1 to extraColCount).map(i => s"col$i INT").mkString(", "))
>   val data = (1 to 3).map { i =>
>     s"""{ "pk": $i, "id": $i, "value": 2.0, "dep": "hr", """ +
>       ((1 to extraColCount).map(j => s""""col$j": $i""").mkString(", ")) +
>       "}"
>   }.mkString("\n")
>   createAndInitTable(schema, data)
>   // rand() always generates values in [0, 1) range
>   sql(s"UPDATE $tableNameAsString SET value = rand() WHERE id <= 2")
>   checkAnswer(
>     sql(s"SELECT count(*) FROM $tableNameAsString WHERE value < 2.0"),
>     Row(2) :: Nil)
> }
> {noformat}
> The error is:
> {noformat}
> [info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 11) (10.0.0.101 executor driver): java.lang.NullPointerException: Cannot invoke "java.util.Random.nextDouble()" because "<parameter1>.mutableStateArray_0[0]" is null
> [info]   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source)
> [info]   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
> [info]   at org.apache.spark.sql.execution.ExpandExec$$anon$1.next(ExpandExec.scala:75)
> ...
> {noformat}
> {{RewriteUpdateTable}} creates an {{Expand}} operator with a set of
> projections, one of which contains a nondeterministic expression (here,
> {{rand()}}). When whole-stage codegen is off, {{ExpandExec}} uses the derived
> {{UnsafeProjection}} instances without first initializing them, so the random
> number generator behind {{rand()}} is still null, resulting in the above error.
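
The failure mode described above can be sketched in plain Scala without any Spark dependency. This is only an illustrative model, not Spark's code: ToyProjection and ToyExpand are hypothetical names, and the assumption is that a projection holding a nondeterministic expression creates its random generator in an initialize step that must run once per partition before the first row is projected.

```scala
import scala.util.Random

// Hypothetical stand-in for a generated projection that holds a
// nondeterministic expression such as rand(). Not Spark's actual class.
final class ToyProjection(seed: Long) {
  // Models the mutable state slot that stays null until initialize() runs.
  private var rng: Random = null

  // Assumed contract: call once per partition before the first apply().
  def initialize(partitionIndex: Int): Unit = {
    rng = new Random(seed + partitionIndex)
  }

  // Appends a random value in [0, 1) to the row.
  // Throws NullPointerException if initialize() was skipped.
  def apply(row: Seq[Any]): Seq[Any] = row :+ rng.nextDouble()
}

object ToyExpand {
  // Buggy shape: the projections are applied without being initialized.
  def expandUninitialized(
      rows: Iterator[Seq[Any]],
      projs: Seq[ToyProjection]): Iterator[Seq[Any]] =
    rows.flatMap(row => projs.iterator.map(_.apply(row)))

  // Fixed shape: initialize every projection with the partition index first.
  def expandInitialized(
      partitionIndex: Int,
      rows: Iterator[Seq[Any]],
      projs: Seq[ToyProjection]): Iterator[Seq[Any]] = {
    projs.foreach(_.initialize(partitionIndex))
    rows.flatMap(row => projs.iterator.map(_.apply(row)))
  }
}
```

In this model, expandUninitialized fails with a NullPointerException on the first row, which mirrors the stack trace above, while expandInitialized emits one output row per projection per input row. Whether the actual fix takes this exact shape is not stated in this message.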