[ https://issues.apache.org/jira/browse/SPARK-40455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605295#comment-17605295 ]

Apache Spark commented on SPARK-40455:
--------------------------------------

User 'caican00' has created a pull request for this issue:
https://github.com/apache/spark/pull/37899

> Abort result stage directly when it failed caused by FetchFailed
> ----------------------------------------------------------------
>
>                 Key: SPARK-40455
>                 URL: https://issues.apache.org/jira/browse/SPARK-40455
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0, 3.1.2, 3.2.1, 3.3.0
>            Reporter: caican
>            Assignee: Apache Spark
>            Priority: Major
>
> Here's a very serious bug:
> When a result stage fails because of a FetchFailedException, the existing 
> condition for deciding whether the result stage may be retried is 
> {color:#ff0000}numMissingPartitions < resultStage.numTasks{color}. 
>  
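> For illustration, here is a minimal, self-contained sketch (hypothetical 
> names, not the actual DAGScheduler source) of the gate this condition 
> implements:
> {code:java}
> // Toy sketch (not Spark source) of the retry gate described above:
> // the result stage is treated as retryable while the number of missing
> // partitions is below the stage's task count.
> object RetryGateSketch {
>   def retryAllowed(numMissingPartitions: Int, numTasks: Int): Boolean =
>     numMissingPartitions < numTasks
>
>   def main(args: Array[String]): Unit = {
>     // 2 of 4 result partitions missing after a FetchFailed:
>     println(retryAllowed(numMissingPartitions = 2, numTasks = 4)) // true
>     // all 4 partitions missing:
>     println(retryAllowed(numMissingPartitions = 4, numTasks = 4)) // false
>   }
> }
> {code}
>  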
> If this condition holds on retry but the still-running tasks of the 
> current result stage are not killed first, then when the result stage is 
> resubmitted it computes the wrong set of partitions to recalculate.
> {code:java}
> // DAGScheduler#submitMissingTasks
>
> // Figure out the indexes of partition ids to compute.
> val partitionsToCompute: Seq[Int] = stage.findMissingPartitions()
> {code}
> As a result, the number of partitions to be recalculated can be smaller 
> than the actual number of partitions in the result stage.
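> To make the undercount concrete, here is a self-contained toy model 
> (hypothetical names; only findMissingPartitions mirrors the real method) 
> of how the recomputed set shrinks when still-running tasks finish between 
> the FetchFailed and the resubmission:
> {code:java}
> // Toy model (not Spark internals): a result stage with 4 partitions.
> // Tasks 0 and 1 finish between the FetchFailed and the resubmission,
> // because the still-running tasks were never killed.
> object MissingPartitionsSketch {
>   final case class ResultStageModel(numPartitions: Int,
>                                     finished: Vector[Boolean]) {
>     // Same shape as Stage#findMissingPartitions: every partition with
>     // no recorded result is considered missing.
>     def findMissingPartitions(): Seq[Int] =
>       (0 until numPartitions).filterNot(finished)
>   }
>
>   def main(args: Array[String]): Unit = {
>     val stage = ResultStageModel(4, Vector(true, true, false, false))
>     // Only partitions 2 and 3 are recomputed, even though the shuffle
>     // data behind partitions 0 and 1 was lost with the FetchFailed.
>     println(stage.findMissingPartitions()) // Vector(2, 3)
>   }
> }
> {code}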


