[
https://issues.apache.org/jira/browse/SPARK-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-10796:
------------------------------
Priority: Minor (was: Major)
Can you clarify with a simple example? I'm not clear on the situation you are
describing.
> A stage's TaskSets may all be removed while the stage still has pending
> partitions after losing some executors
> -----------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-10796
> URL: https://issues.apache.org/jira/browse/SPARK-10796
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.3.0
> Reporter: SuYan
> Priority: Minor
>
> We hit this problem in Spark 1.3.0, and I have also checked the latest Spark
> code; I believe the problem still exists.
> 1. When a stage hits a FetchFailed, the stage is resubmitted as a new
> attempt, and the previous TaskSet is marked as a zombie.
> 2. If an executor is then lost, the zombie TaskSet may lose the results of
> its already-successful tasks. The current code resubmits those tasks, but
> that is useless: a zombie TaskSet will never be scheduled again.
> So once both the active TaskSet and the zombie TaskSet have finished all the
> tasks in `runningTasks`, Spark considers them finished, yet the running stage
> still has pending partitions. The job then hangs, because there is no logic
> to re-run those pending partitions.
> The driver logic is complicated; it would be helpful if anyone could check this.
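A minimal sketch of the hang described above (this is a simplified simulation for illustration, not Spark's actual scheduler code; the class and method names are assumptions, not real Spark APIs):

```python
# Simulate two attempts of one stage: a zombie TaskSet (old attempt, marked
# zombie after a FetchFailed) and an active resubmitted TaskSet.

class TaskSet:
    def __init__(self, partitions, zombie=False):
        self.pending = set(partitions)   # partitions this attempt still owes
        self.zombie = zombie             # zombie TaskSets are never scheduled

    def task_succeeded(self, partition):
        self.pending.discard(partition)

    def executor_lost(self, lost_partitions):
        # Results held on the lost executor are gone; the tasks are
        # "resubmitted" by re-adding them as pending in this TaskSet.
        self.pending |= lost_partitions

    def schedulable_tasks(self):
        # A zombie TaskSet is never offered resources again, so its
        # re-added pending tasks can never actually run.
        return set() if self.zombie else set(self.pending)

def stage_hangs(tasksets, stage_pending):
    # The stage can only finish if every pending partition can still be
    # produced by some schedulable (non-zombie) TaskSet.
    runnable = set().union(*(ts.schedulable_tasks() for ts in tasksets))
    return bool(stage_pending - runnable)

# Stage needs partitions {0, 1}; both attempts run them to completion.
zombie = TaskSet({0, 1}, zombie=True)
active = TaskSet({0, 1})
for p in (0, 1):
    zombie.task_succeeded(p)
    active.task_succeeded(p)

# An executor holding partition 1's result is lost. The loss is charged to
# the zombie attempt, which re-adds the task -- but it will never run.
zombie.executor_lost({1})

# Both TaskSets have empty runningTasks, yet the stage still needs partition 1
# and no attempt will ever re-run it: the job hangs.
print(stage_hangs([zombie, active], {1}))  # → True
```

With a non-zombie attempt the re-added task would be schedulable again and `stage_hangs` would return `False`, which matches the missing piece the report asks for: some logic to re-run the pending partitions.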
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]