Jianfu Li created SPARK-49485:
---------------------------------

             Summary: When dynamic allocation is enabled and the remaining executors all reside on the same host, speculative tasks are not triggered
                 Key: SPARK-49485
                 URL: https://issues.apache.org/jira/browse/SPARK-49485
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.5.2
         Environment: Run this job on a YARN cluster with only one host.
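For reference, a submit command matching this environment might look roughly like the following. Only the minExecutors value (set below) comes from this report; the jar name is hypothetical, and the remaining flags are the standard switches this scenario needs, listed here as assumptions:

{code:none}
# Assumed submit command; only minExecutors=3 is stated in the report.
# shuffleTracking (or an external shuffle service) is required for dynamic allocation.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=3 \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.speculation=true \
  --class SpecHangJob spec-hang-job.jar
{code}

The job referenced by --class SpecHangJob is the one shown below.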
Set spark.dynamicAllocation.minExecutors = 3 and run:

{code:java}
import org.apache.spark.sql.SparkSession

object SpecHangJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("test").enableHiveSupport().getOrCreate
    val sc = spark.sparkContext
    val conf = sc.getConf
    val taskNum = conf.getInt("spark.test.parallelism", 3)
    val seq = (0 until taskNum).toList
    sc.parallelize(seq, taskNum).map { i =>
      if (i == 0) {
        // Task 0 is the deliberate straggler: it sleeps for 30 minutes.
        try {
          Thread.sleep(1000 * 60 * 30)
        } catch {
          case exception: Exception => println(exception.getMessage)
        }
      } else {
        // All other tasks finish after 30 seconds.
        try {
          Thread.sleep(1000 * 30)
        } catch {
          case e: Exception => println(e.getMessage)
        }
      }
      "haha"
    }.collect()
  }
}
{code}

You will find that task 0 triggers speculative execution, but the speculative task never starts.
            Reporter: Jianfu Li


We have seen cases where, with dynamic allocation enabled, tasks launched on the remaining executors of a single slow host run for a long time and keep the Spark job in the RUNNING state. Speculative execution cannot fix this: if there is one slow task and three executors (spark.dynamicAllocation.minExecutors = 3), maxNeeded will be 2, so ExecutorAllocationManager considers the current number of executors sufficient and does not request new ones. Alternatively, a new executor may be allocated but land on the same host. In both cases the job does not finish for a long time. We should detect this case, exclude the old host, and request new executors.
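To make the failure mode concrete, here is a small self-contained sketch of the arithmetic described above. It only models the behaviour; it is not ExecutorAllocationManager's actual code, and the helper names are made up for illustration:

{code:java}
// Illustrative model of the executor-demand arithmetic described above.
// The names here are hypothetical, not Spark's internal API.
object SpecHangArithmetic {

  // Roughly: executors needed = ceil(all unfinished work / task slots per executor)
  def maxExecutorsNeeded(runningTasks: Int,
                         pendingTasks: Int,
                         pendingSpeculativeTasks: Int,
                         tasksPerExecutor: Int): Int =
    math.ceil(
      (runningTasks + pendingTasks + pendingSpeculativeTasks).toDouble / tasksPerExecutor
    ).toInt

  def main(args: Array[String]): Unit = {
    // State from the reproduction: tasks 1 and 2 have finished, task 0 is still
    // running, and one speculative copy of task 0 is pending.
    val needed = maxExecutorsNeeded(
      runningTasks = 1,
      pendingTasks = 0,
      pendingSpeculativeTasks = 1,
      tasksPerExecutor = 1)
    val currentExecutors = 3 // spark.dynamicAllocation.minExecutors = 3

    println(s"maxNeeded = $needed, current executors = $currentExecutors")
    // maxNeeded (2) <= current (3), so no additional executors are requested.
    // Meanwhile the speculative copy is never scheduled, because a speculative
    // attempt is not placed on the host that already runs the original attempt,
    // and every current executor lives on that single host.
    assert(needed <= currentExecutors)
  }
}
{code}

The fix proposed in the description, detecting this state, excluding the old host, and requesting fresh executors, would break exactly this stalemate.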