[ 
https://issues.apache.org/jira/browse/SPARK-40455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caican updated SPARK-40455:
---------------------------
    Description: 
Here's a very serious bug:

When a result stage fails because of a FetchFailedException, the existing 
condition used to decide whether the result stage may be retried is 
{color:#FF0000}numMissingPartitions < resultStage.numTasks{color}. 
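
For illustration only, a minimal self-contained sketch of that condition follows. ResultStageRetryConditionSketch and its ResultStageState type are hypothetical stand-ins, not the actual ResultStage/DAGScheduler classes; only the condition itself is taken from the description above.
{code:scala}
// Self-contained sketch (hypothetical types, NOT the real DAGScheduler source):
// models the retry condition this issue refers to.
object ResultStageRetryConditionSketch {

  // Stand-in for the per-partition completion state tracked for a result stage.
  final case class ResultStageState(numTasks: Int, finished: Array[Boolean]) {
    // Partitions that have not produced a result yet.
    def findMissingPartitions(): Seq[Int] =
      (0 until numTasks).filter(i => !finished(i))
  }

  // The condition in question: retry is allowed while the number of missing
  // partitions is smaller than the total number of tasks in the stage.
  def retryAllowed(stage: ResultStageState): Boolean =
    stage.findMissingPartitions().length < stage.numTasks

  def main(args: Array[String]): Unit = {
    // 4 partitions; partition 0 finished before the FetchFailedException hit.
    val stage = ResultStageState(numTasks = 4, finished = Array(true, false, false, false))
    println(s"missing partitions = ${stage.findMissingPartitions()}") // e.g. Vector(1, 2, 3)
    println(s"retry allowed      = ${retryAllowed(stage)}")           // true
  }
}
{code}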

 

If this condition holds and the stage is retried, but the still-running tasks of 
the failed attempt are not killed, then when the result stage is resubmitted it 
ends up with the wrong set of partitions to recompute.
{code:java}
// DAGScheduler#submitMissingTasks
 
// Figure out the indexes of partition ids to compute.
val partitionsToCompute: Seq[Int] = stage.findMissingPartitions() {code}
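
As a hedged illustration of why that snapshot can go stale when the failed attempt's tasks keep running, here is a standalone sketch; StageState and the object name are hypothetical, and only the findMissingPartitions call mirrors the line quoted above.
{code:scala}
// Self-contained sketch of the race described above (hypothetical names, NOT Spark code).
object StaleMissingPartitionsSketch {

  // Simplified stand-in for a result stage's per-partition completion state.
  final class StageState(val numTasks: Int) {
    private val finished = Array.fill(numTasks)(false)
    def markFinished(partition: Int): Unit = finished(partition) = true
    def findMissingPartitions(): Seq[Int] =
      (0 until numTasks).filter(i => !finished(i))
  }

  def main(args: Array[String]): Unit = {
    val stage = new StageState(numTasks = 4)
    stage.markFinished(0) // partition 0 completed before the FetchFailedException

    // On resubmit, the scheduler snapshots the missing partitions, just like
    // `stage.findMissingPartitions()` in DAGScheduler#submitMissingTasks.
    val partitionsToCompute: Seq[Int] = stage.findMissingPartitions() // 1, 2, 3

    // But tasks from the failed attempt were never killed; one of them finishes
    // after the snapshot was taken.
    stage.markFinished(2)

    // The new attempt still works from the stale snapshot, which no longer
    // matches the partitions that are actually missing.
    println(s"snapshot at resubmit time : $partitionsToCompute")
    println(s"actually missing now      : ${stage.findMissingPartitions()}")
  }
}
{code}
This race between the old attempt and the resubmitted attempt is why this issue proposes aborting the result stage directly instead of retrying it.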
 

  was:
Here's a very serious bug:

When a result stage fails because of a `FetchFailedException`, the existing 
condition used to decide whether the result stage may be retried is 
`numMissingPartitions < resultStage.numTasks`. 

 

If this condition holds and the stage is retried, but the still-running tasks of 
the failed attempt are not killed, then when the result stage is resubmitted it 
ends up with the wrong set of partitions to recompute.

{code:java}
// DAGScheduler#submitMissingTasks
 
// Figure out the indexes of partition ids to compute.
val partitionsToCompute: Seq[Int] = stage.findMissingPartitions() {code}
 


> Abort result stage directly when it fails due to FetchFailed
> ------------------------------------------------------------
>
>                 Key: SPARK-40455
>                 URL: https://issues.apache.org/jira/browse/SPARK-40455
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0, 3.1.2, 3.2.1, 3.3.0
>            Reporter: caican
>            Priority: Major
>
> Here's a very serious bug:
> When a result stage fails because of a FetchFailedException, the existing 
> condition used to decide whether the result stage may be retried is 
> {color:#FF0000}numMissingPartitions < resultStage.numTasks{color}. 
>  
> If this condition holds and the stage is retried, but the still-running tasks 
> of the failed attempt are not killed, then when the result stage is resubmitted 
> it ends up with the wrong set of partitions to recompute.
> {code:java}
> // DAGScheduler#submitMissingTasks
>  
> // Figure out the indexes of partition ids to compute.
> val partitionsToCompute: Seq[Int] = stage.findMissingPartitions() {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
