[ https://issues.apache.org/jira/browse/SPARK-23888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Imran Rashid updated SPARK-23888:
---------------------------------
    Labels: speculation  (was: )

> speculative task should not run on a given host where another attempt is already running on
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23888
>                 URL: https://issues.apache.org/jira/browse/SPARK-23888
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.3.0
>            Reporter: wuyi
>            Priority: Major
>              Labels: speculation
>             Fix For: 2.3.0
>
>
> There's a bug in:
> {code:java}
> /** Check whether a task is currently running an attempt on a given host */
> private def hasAttemptOnHost(taskIndex: Int, host: String): Boolean = {
>   taskAttempts(taskIndex).exists(_.host == host)
> }
> {code}
> This check matches any attempt of the task on the host, including finished
> ones, so hosts that only have finished attempts are ruled out as well. We
> should instead check whether an attempt is currently running on the given host.
> It should also be possible for a speculative task to run on a host where
> another attempt of the same task failed earlier.
> Assume we have only two machines: host1 and host2. We first run task0.0 on
> host1. Then, because task0.0 has been running for a long time, we launch a
> speculative task0.1 on host2. task0.0 eventually fails on host1, but it cannot
> be re-run since there is already a copy running on host2. After another long
> wait, we launch a new speculative task0.2. Now task0.2 can run on host1
> again, since there is no longer a running attempt on host1.
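
The difference between the quoted check and the one the reporter asks for can be sketched in isolation. This is a minimal, self-contained illustration under stated assumptions, not Spark's actual code: `Attempt` is a hypothetical stand-in for the scheduler's per-attempt record, its `running` flag is an assumed way of telling a live attempt from a finished one, and `hasRunningAttemptOnHost` is a name invented here for the proposed check.

```scala
// Hypothetical stand-in for the scheduler's per-attempt record (not Spark's TaskInfo).
final case class Attempt(attemptNumber: Int, host: String, running: Boolean)

object SpeculationSketch {
  // Check as quoted in the report: matches any attempt on the host, finished or not.
  def hasAttemptOnHost(attempts: Seq[Attempt], host: String): Boolean =
    attempts.exists(_.host == host)

  // Check as proposed in the report (hypothetical name): only running attempts count.
  def hasRunningAttemptOnHost(attempts: Seq[Attempt], host: String): Boolean =
    attempts.exists(a => a.running && a.host == host)

  // State from the two-host example: task0.0 has failed on host1,
  // while speculative task0.1 is still running on host2.
  val attempts: Seq[Attempt] = Seq(
    Attempt(0, "host1", running = false),
    Attempt(1, "host2", running = true)
  )

  def main(args: Array[String]): Unit = {
    println(hasAttemptOnHost(attempts, "host1"))        // true: host1 is still ruled out
    println(hasRunningAttemptOnHost(attempts, "host1")) // false: task0.2 may run on host1
    println(hasRunningAttemptOnHost(attempts, "host2")) // true: host2 stays excluded
  }
}
```

With the quoted check, host1 remains excluded for task0.2 even though nothing is running there; with the running-only check, only host2 is excluded, which matches the outcome described in the example.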