[ https://issues.apache.org/jira/browse/SPARK-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-10796:
------------------------------
    Component/s: Scheduler

> A stage's TaskSets may all be removed while the stage still has pending 
> partitions, after losing some executors
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10796
>                 URL: https://issues.apache.org/jira/browse/SPARK-10796
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.3.0
>            Reporter: SuYan
>            Priority: Minor
>
> We hit this problem in Spark 1.3.0, and I also checked the latest Spark 
> code; I believe the problem still exists.
> 1. When a stage hits a FetchFailed, the scheduler resubmits the running 
> stage as a new TaskSet and marks the previous TaskSet as a zombie.
> 2. If an executor is then lost, the zombie TaskSet may lose the results of 
> tasks that had already succeeded. The current code resubmits those tasks, 
> but that is useless: the TaskSet is a zombie, so they will never be 
> scheduled again.
> So once the active TaskSet and the zombie TaskSet have both drained their 
> `runningTasks`, Spark considers them finished, yet the running stage still 
> has pending partitions. The job then hangs, because there is no logic to 
> re-run those pending partitions (see the sketch below).
> The driver logic is complicated; it would be helpful if someone could 
> verify this.
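> Below is a minimal, self-contained sketch of the sequence above, in Scala. 
> The names here (Stage, SimpleTaskSet, pendingPartitions) are hypothetical 
> simplifications for illustration only, not Spark's real scheduler classes 
> (TaskSetManager, DAGScheduler):
>
>   import scala.collection.mutable
>
>   object ZombieTaskSetHang {
>     // A stage tracks which output partitions still lack a usable result.
>     class Stage(numPartitions: Int) {
>       val pendingPartitions: mutable.Set[Int] =
>         mutable.Set(0 until numPartitions: _*)
>     }
>
>     // One TaskSet attempt for a stage; a zombie accepts results for tasks
>     // already in flight but never schedules anything new.
>     class SimpleTaskSet(val stage: Stage, var isZombie: Boolean = false) {
>       val running: mutable.Set[Int] = mutable.Set.empty // partitions in flight
>
>       def taskSucceeded(partition: Int): Unit = {
>         running -= partition
>         stage.pendingPartitions -= partition
>       }
>
>       // Executor lost: a result produced on it is gone, so its partition
>       // becomes pending again; a zombie, however, never reschedules it.
>       def executorLost(lostPartition: Int): Unit = {
>         stage.pendingPartitions += lostPartition
>         if (!isZombie) running += lostPartition
>       }
>
>       def isFinished: Boolean = running.isEmpty
>     }
>
>     def main(args: Array[String]): Unit = {
>       val stage = new Stage(numPartitions = 2)
>
>       // Attempt 1 runs both partitions.
>       val zombie = new SimpleTaskSet(stage)
>       zombie.running ++= Seq(0, 1)
>       zombie.taskSucceeded(0) // partition 0 succeeds on executor E
>       zombie.running -= 1     // partition 1 hits a FetchFailed...
>       zombie.isZombie = true  // ...stage is resubmitted; attempt 1 is a zombie
>
>       // Attempt 2 covers only the partition still pending at resubmit time.
>       val active = new SimpleTaskSet(stage)
>       active.running += 1
>       active.taskSucceeded(1)
>
>       // Executor E is lost: partition 0's result is gone. The zombie marks
>       // it pending again but schedules nothing, so it is never re-run.
>       zombie.executorLost(0)
>
>       println(s"zombie finished: ${zombie.isFinished}") // true
>       println(s"active finished: ${active.isFinished}") // true
>       println(s"pending, never re-run: ${stage.pendingPartitions}") // Set(0)
>     }
>   }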



