Dan Wu created SPARK-56564:
------------------------------
Summary: V2 DataSource write path throws SparkException instead of
CommitDeniedException, causing spurious stage failures with speculation
Key: SPARK-56564
URL: https://issues.apache.org/jira/browse/SPARK-56564
Project: Spark
Issue Type: Bug
Components: Structured Streaming, SQL
Affects Versions: 3.5.3, 4.0.0
Reporter: Dan Wu
h2. Problem
When speculation is enabled, the OutputCommitCoordinator denies commit for
losing speculative task attempts. The V1 write path
({{SparkHadoopMapRedUtil.commitTask()}}) correctly throws
{{CommitDeniedException}}, which the executor converts to {{TaskCommitDenied}}
with {{countTowardsTaskFailures=false}}. However, the V2 write path
({{WritingSparkTask}} in {{WriteToDataSourceV2Exec.scala}}) calls
{{QueryExecutionErrors.commitDeniedError()}} which returns a *plain
SparkException*. The executor does not recognize this as a commit denial, so
each denied speculative attempt is counted as a real task failure. After
{{spark.task.maxFailures}} (default 4) such denials of the same task, the stage
is aborted even though no real failure occurred.
This affects *all V2 DataSource writes* with speculation enabled, including
Structured Streaming jobs using {{ForeachWriter}}.
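The failure accounting described above can be illustrated with a small standalone model. This is only a sketch: the classes below are simplified stand-ins defined for the example, not the real Spark classes, and {{FailureAccounting}}, {{classify}} and {{stageAborted}} are hypothetical names. It assumes all failures belong to the same task and hard-codes the default of {{spark.task.maxFailures}}.

{code:scala}
// Simplified stand-ins for Spark internals (assumptions for illustration only).
final class CommitDeniedException(msg: String) extends Exception(msg)
final class SparkException(msg: String) extends Exception(msg)

sealed trait TaskEndReason { def countTowardsTaskFailures: Boolean }
case object TaskCommitDenied extends TaskEndReason {
  val countTowardsTaskFailures = false
}
case object ExceptionFailure extends TaskEndReason {
  val countTowardsTaskFailures = true
}

object FailureAccounting {
  // Mirrors the default value of spark.task.maxFailures.
  val MaxFailures = 4

  // Models the executor's decision: only the typed exception is
  // recognized as a commit denial; everything else is a real failure.
  def classify(t: Throwable): TaskEndReason = t match {
    case _: CommitDeniedException => TaskCommitDenied
    case _                        => ExceptionFailure
  }

  // True if the counted failures of one task reach the limit.
  def stageAborted(failuresOfOneTask: Seq[Throwable]): Boolean =
    failuresOfOneTask.map(classify).count(_.countTowardsTaskFailures) >= MaxFailures
}
{code}

In this model, four {{CommitDeniedException}} instances leave the stage running, while four plain {{SparkException}} instances abort it, which is the difference between the V1 and V2 paths.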
h2. V1 path (correct)
{code:scala}
// SparkHadoopMapRedUtil.scala:85
throw new CommitDeniedException(message, ctx.stageId(), splitId,
  ctx.attemptNumber())
// -> Executor catches CommitDeniedException (Executor.scala:777)
// -> Converts to TaskCommitDenied (countTowardsTaskFailures=false)
{code}
h2. V2 path (buggy)
{code:scala}
// WriteToDataSourceV2Exec.scala:590
throw QueryExecutionErrors.commitDeniedError(partId, taskId, attemptId,
  stageId, stageAttempt)
// -> QueryExecutionErrors returns plain SparkException
// -> Executor generic Throwable handler (countTowardsTaskFailures=true)
{code}
h2. Proposed Fix
Change {{WritingSparkTask.run()}} in {{WriteToDataSourceV2Exec.scala}} to throw
{{CommitDeniedException}} directly, matching V1 behavior.
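A minimal sketch of the shape such a change might take, written as standalone code rather than as a patch. The {{CommitDeniedException}} class here is a local stand-in that copies the four-argument shape used in the V1 snippet; {{V2CommitSketch}}, {{commitIfAuthorized}} and the {{authorized}} flag are hypothetical and stand in for the coordinator check in {{WritingSparkTask}}. Mapping {{stageId}}, {{partId}} and {{attemptId}} onto the exception arguments is an assumption modeled on the V1 call.

{code:scala}
// Local stand-in with the same argument shape as the V1 call site (assumption).
final class CommitDeniedException(
    msg: String,
    val jobID: Int,
    val splitID: Int,
    val attemptNumber: Int) extends Exception(msg)

object V2CommitSketch {
  // Hypothetical helper: runs `commit` only when the coordinator has
  // authorized this attempt; otherwise throws the typed exception so the
  // executor can report a commit denial instead of a generic failure.
  def commitIfAuthorized[T](authorized: Boolean, stageId: Int, partId: Int, attemptId: Int)(
      commit: => T): T = {
    if (authorized) {
      commit
    } else {
      val message =
        s"Commit denied for partition $partId (attempt $attemptId, stage $stageId)"
      throw new CommitDeniedException(message, stageId, partId, attemptId)
    }
  }
}
{code}

The point of the sketch is only that the denial is signaled by exception type rather than by message text, so the executor-side handling that already exists for V1 would apply unchanged.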