Dan Wu created SPARK-56564:
------------------------------

             Summary: V2 DataSource write path throws SparkException instead of 
CommitDeniedException, causing spurious stage failures with speculation
                 Key: SPARK-56564
                 URL: https://issues.apache.org/jira/browse/SPARK-56564
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming, SQL
    Affects Versions: 3.5.3, 4.0.0
            Reporter: Dan Wu


h2. Problem

When speculation is enabled, the {{OutputCommitCoordinator}} denies the commit 
for speculative task attempts that lose the race. The V1 write path 
({{SparkHadoopMapRedUtil.commitTask()}}) correctly throws 
{{CommitDeniedException}}, which the executor converts to {{TaskCommitDenied}} 
with {{countTowardsTaskFailures=false}}. The V2 write path 
({{WritingSparkTask}} in {{WriteToDataSourceV2Exec.scala}}) instead throws the 
result of {{QueryExecutionErrors.commitDeniedError()}}, which returns a *plain 
SparkException*. The executor does not recognize this as a commit denial, so 
each denied speculative attempt is counted as a real task failure. Once a task 
accumulates {{spark.task.maxFailures}} (default 4) such denials, the stage is 
incorrectly aborted.

This affects *all V2 DataSource writes* with speculation enabled, including 
Structured Streaming jobs using {{ForeachWriter}}.
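
The accounting difference can be illustrated with a small self-contained model. 
This is a simplified sketch, not Spark source: {{FailureAccounting}}, 
{{classify}}, {{stageAborted}}, and the stand-in exception and reason types are 
hypothetical names that only mirror the behavior described above.

{code:scala}
// Simplified model of the executor's failure accounting (not Spark source).
sealed trait TaskEndReason { def countTowardsTaskFailures: Boolean }

case object TaskCommitDenied extends TaskEndReason {
  val countTowardsTaskFailures = false
}

final case class ExceptionFailure(description: String) extends TaskEndReason {
  val countTowardsTaskFailures = true
}

// Stand-in for Spark's CommitDeniedException (assumption: simplified signature).
final class CommitDeniedException(msg: String) extends Exception(msg)

object FailureAccounting {
  // Only the dedicated exception type is recognized as a commit denial;
  // everything else is treated as an ordinary, counted failure.
  def classify(t: Throwable): TaskEndReason = t match {
    case _: CommitDeniedException => TaskCommitDenied
    case other                    => ExceptionFailure(String.valueOf(other.getMessage))
  }

  // True once the counted failures of one task reach maxFailures.
  def stageAborted(errors: Seq[Throwable], maxFailures: Int = 4): Boolean =
    errors.map(classify).count(_.countTowardsTaskFailures) >= maxFailures
}

object FailureAccountingDemo {
  def main(args: Array[String]): Unit = {
    // V1-style denials: typed exception, never counted.
    val v1 = Seq.fill(4)(new CommitDeniedException("Commit denied"))
    // V2-style denials: generic exception, each one counted.
    val v2 = Seq.fill(4)(new RuntimeException("Commit denied"))
    println(FailureAccounting.stageAborted(v1)) // false
    println(FailureAccounting.stageAborted(v2)) // true
  }
}
{code}

In this model, four denials raised as the typed exception leave the stage 
running, while the same four denials raised as a generic exception reach the 
threshold and abort it.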

h2. V1 path (correct)

{code:scala}
// SparkHadoopMapRedUtil.scala:85
throw new CommitDeniedException(
  message, ctx.stageId(), splitId, ctx.attemptNumber())
// -> Executor catches CommitDeniedException (Executor.scala:777)
// -> Converts to TaskCommitDenied (countTowardsTaskFailures=false)
{code}

h2. V2 path (buggy)

{code:scala}
// WriteToDataSourceV2Exec.scala:590
throw QueryExecutionErrors.commitDeniedError(
  partId, taskId, attemptId, stageId, stageAttempt)
// -> commitDeniedError returns a plain SparkException
// -> Executor falls through to its generic Throwable handler
//    (countTowardsTaskFailures=true)
{code}

h2. Proposed Fix

Change {{WritingSparkTask.run()}} in {{WriteToDataSourceV2Exec.scala}} to throw 
{{CommitDeniedException}} directly, matching V1 behavior.
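
A rough sketch of the shape such a change might take is shown below. It is 
self-contained and uses stand-ins rather than Spark's own classes: 
{{CommitCoordinator}}, {{V2CommitSketch}}, and {{commitOrDeny}} are 
hypothetical, and the stand-in {{CommitDeniedException}} only assumes the 
argument order used in the V1 snippet above (message, stage id, partition id, 
attempt number).

{code:scala}
// Stand-in for Spark's CommitDeniedException (assumption: simplified).
final class CommitDeniedException(
    msg: String,
    val stageId: Int,
    val partitionId: Int,
    val attemptNumber: Int)
  extends Exception(msg)

// Hypothetical stand-in for the commit coordinator's authorization check.
trait CommitCoordinator {
  def canCommit(stageId: Int, stageAttempt: Int, partId: Int, attemptId: Int): Boolean
}

object V2CommitSketch {
  // Runs `commit` if the coordinator authorizes this attempt; otherwise
  // throws the typed exception so the caller can recognize a commit denial.
  def commitOrDeny(
      coordinator: CommitCoordinator,
      stageId: Int,
      stageAttempt: Int,
      partId: Int,
      taskId: Long,
      attemptId: Int)(commit: () => Unit): Unit = {
    if (coordinator.canCommit(stageId, stageAttempt, partId, attemptId)) {
      commit()
    } else {
      val message =
        s"Commit denied for partition $partId (task $taskId, " +
          s"attempt $attemptId, stage $stageId.$stageAttempt)"
      throw new CommitDeniedException(message, stageId, partId, attemptId)
    }
  }
}
{code}

Throwing the typed exception at the denial site keeps the denial 
distinguishable from genuine write failures, which is what allows it to be 
excluded from the task failure count.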



