wuyi created SPARK-41848:
----------------------------

             Summary: Tasks are over-scheduled with TaskResourceProfile
                 Key: SPARK-41848
                 URL: https://issues.apache.org/jira/browse/SPARK-41848
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: wuyi
{code:java}
test("SPARK-XXX") {
  val conf = new SparkConf().setAppName("test").setMaster("local-cluster[1,4,1024]")
  sc = new SparkContext(conf)
  val req = new TaskResourceRequests().cpus(3)
  val rp = new ResourceProfileBuilder().require(req).build()
  val res = sc.parallelize(Seq(0, 1), 2).withResources(rp).map { x =>
    Thread.sleep(5000)
    x * 2
  }.collect()
  assert(res === Array(0, 2))
}
{code}

In this test, the tasks are expected to be scheduled sequentially, since each task requires 3 cores and the executor has only 4. However, the logs show that the 2 tasks are launched concurrently. It turns out that task scheduling uses the TaskResourceProfile of the task set (taskCpus = 3):

{code:java}
val rpId = taskSet.taskSet.resourceProfileId
val taskSetProf = sc.resourceProfileManager.resourceProfileFromId(rpId)
val taskCpus = ResourceProfile.getTaskCpusOrDefaultForProfile(taskSetProf, conf)
{code}

while updating the free cores in ExecutorData uses the ResourceProfile of the executor (taskCpus = 1):

{code:java}
val rpId = executorData.resourceProfileId
val prof = scheduler.sc.resourceProfileManager.resourceProfileFromId(rpId)
val taskCpus = ResourceProfile.getTaskCpusOrDefaultForProfile(prof, conf)
executorData.freeCores -= taskCpus
{code}

which results in an inconsistent view of the available cores.
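The accounting mismatch can be illustrated with a toy model (hypothetical names, not Spark's actual API): the availability check uses the task set's taskCpus while the deduction uses the executor profile's taskCpus, so a 4-core executor launches both 3-core tasks instead of one.

```java
// Toy reproduction of the described inconsistency (illustrative only).
public class OverScheduleDemo {
    static final int SCHEDULING_TASK_CPUS = 3; // from the task set's TaskResourceProfile
    static final int ACCOUNTING_TASK_CPUS = 1; // from the executor's default ResourceProfile

    // Returns how many tasks the scheduler launches on one executor.
    public static int launchTasks(int numTasks, int executorCores) {
        int freeCores = executorCores;
        int launched = 0;
        for (int i = 0; i < numTasks; i++) {
            if (freeCores >= SCHEDULING_TASK_CPUS) {  // availability checked with taskCpus = 3
                freeCores -= ACCOUNTING_TASK_CPUS;    // but only taskCpus = 1 is deducted
                launched++;
            }
        }
        return launched;
    }

    public static void main(String[] args) {
        // With consistent taskCpus = 3 only one task fits on 4 cores,
        // but the mismatched deduction lets both launch concurrently.
        System.out.println(launchTasks(2, 4)); // prints 2 (over-scheduled)
    }
}
```

Deducting the same taskCpus value that was used for the availability check (3 here) would leave 1 free core after the first launch and correctly defer the second task.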