[jira] [Commented] (SPARK-34707) Code-gen broadcast nested loop join (left outer/right outer)
[ https://issues.apache.org/jira/browse/SPARK-34707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306519#comment-17306519 ] Zebing Lin commented on SPARK-34707: Created PR https://github.com/apache/spark/pull/31931 > Code-gen broadcast nested loop join (left outer/right outer) > > > Key: SPARK-34707 > URL: https://issues.apache.org/jira/browse/SPARK-34707 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Cheng Su > Priority: Minor > > We saw a 1x run-time improvement from code-gen for the broadcast nested loop inner join > (https://issues.apache.org/jira/browse/SPARK-34620). Similarly, let's add > code-gen here for left outer (build right side) and right outer (build left side) > joins as well.
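As context for the sub-task, below is a minimal, self-contained sketch of a query shape that typically plans as a BroadcastNestedLoopJoin with join type LeftOuter and the broadcast (build) side on the right. The table and column names are made up for illustration, and the exact physical plan, as well as whether it is whole-stage code-generated, depends on the Spark version in use.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BnljLeftOuterExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bnlj-left-outer-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical inputs: a larger probe side and a small side to broadcast.
    val left  = spark.range(0L, 100000L).toDF("id")
    val right = spark.range(0L, 100L).toDF("bound")

    // A non-equi join condition rules out hash and sort-merge joins, and the
    // broadcast hint on the right side steers the planner towards
    // BroadcastNestedLoopJoin with BuildRight, LeftOuter.
    val joined = left.join(broadcast(right), $"id" % 1000 < $"bound", "left_outer")

    joined.explain() // expect BroadcastNestedLoopJoin BuildRight, LeftOuter in the plan
    println(joined.count())

    spark.stop()
  }
}
{code}
Comparing the output of joined.explain() on builds with and without the SPARK-34707 change is one way to see whether the join is picked up by whole-stage code generation.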
[jira] [Updated] (SPARK-30511) Spark marks intentionally killed speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- Description:
*TL;DR* When speculative tasks fail/get killed, they are still considered pending and count towards the calculation of the number of needed executors.
h3. Symptom
In one of our production jobs (which runs 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
while the job only had 1 speculative task running and 16 speculative tasks that had been intentionally killed because their corresponding original tasks had finished.
An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index < 300 && index >= 150) {
    Thread.sleep(index * 1000) // Fake running tasks
  } else if (index == 300) {
    Thread.sleep(1000 * 1000) // Fake long running tasks
  }
  it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see that while the last task is running, we hold 38 executors (see attachment), which is exactly (152 + 3) / 4 = 38.
h3. The Bug
Upon examining the code of _pendingSpeculativeTasks_:
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
we see that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_ but never decremented; _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads Spark to hold more executors than it actually needs!*
I will have a PR ready to fix this issue, along with SPARK-28403.

was: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 38 executors (see attachment), which is exactly (152 + 3) / 4 = 38. h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-28403 too
> Spark marks intentionally killed speculative tasks as pending leads to > holding idle executors > - > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler > Affects Versions: 2.3.0 > Reporter: Zebing Lin > Priority: Major > Attachments: Screen Shot 2020-01-15 at 11.13.17.png > > > *TL;DR* > When speculative tasks fail/get killed, they are still considered as pending > and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > foun
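To make the (152 + 3) / 4 = 38 figure above concrete: the dynamic-allocation target is, roughly, a ceiling division of all tasks the listener still counts (pending + pending speculative + running) by the number of tasks per executor. The sketch below only illustrates that arithmetic; it is not the exact ExecutorAllocationManager code, and the 0/150/2 breakdown used in the example is illustrative.
{code:scala}
// Illustrative sketch of the executor-target arithmetic discussed above.
// Parameter names are descriptive, not the real ExecutorAllocationManager fields.
object ExecutorTargetSketch {
  def maxNumExecutorsNeeded(
      pendingTasks: Int,
      pendingSpeculativeTasks: Int,
      runningTasks: Int,
      tasksPerExecutor: Int): Int = {
    val totalTasks = pendingTasks + pendingSpeculativeTasks + runningTasks
    // Ceiling division written as (total + perExecutor - 1) / perExecutor,
    // which is where the "+ 3" with 4 tasks per executor comes from.
    (totalTasks + tasksPerExecutor - 1) / tasksPerExecutor
  }

  def main(args: Array[String]): Unit = {
    // If the leaked "pending" speculative tasks plus the genuinely running
    // tasks add up to 152, with 4 tasks per executor:
    println(maxNumExecutorsNeeded(0, 150, 2, 4)) // (152 + 3) / 4 = 38
  }
}
{code}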
[jira] [Updated] (SPARK-30511) Spark marks intentionally killed speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- Summary: Spark marks intentionally killed speculative tasks as pending leads to holding idle executors (was: Spark marks ended speculative tasks as pending leads to holding idle executors) > Spark marks intentionally killed speculative tasks as pending leads to > holding idle executors > - > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > Attachments: Screen Shot 2020-01-15 at 11.13.17.png > > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it was holding 6 executors at the end with only 2 tasks running (1 > speculative). With more logging enabled, we found the job printed: > {code:java} > pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 > {code} > while the job only had 1 speculative task running and 16 speculative tasks > intentionally killed because of corresponding original tasks had finished. > An easy repro of the issue (`--conf spark.speculation=true --conf > spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in > cluster mode): > {code:java} > val n = 4000 > val someRDD = sc.parallelize(1 to n, n) > someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { > if (index < 300 && index >= 150) { > Thread.sleep(index * 1000) // Fake running tasks > } else if (index == 300) { > Thread.sleep(1000 * 1000) // Fake long running tasks > } > it.toList.map(x => index + ", " + x).iterator > }).collect > {code} > You will see when running the last task, we would be hold 38 executors (see > attachment), which is exactly (152 + 3) / 4 = 38. > h3. The Bug > Upon examining the code of _pendingSpeculativeTasks_: > {code:java} > stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => > numTasks - > stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) > }.sum > {code} > where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on > _onSpeculativeTaskSubmitted_, but never decremented. > _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage > completion. *This means Spark is marking ended speculative tasks as pending, > which leads to Spark to hold more executors that it actually needs!* > I will have a PR ready to fix this issue, along with SPARK-28403 too > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
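One possible direction for the fix described above, assuming the allocation manager's listener also receives an end event for speculative task attempts: decrement the per-stage speculative counter (and drop the running index) when a speculative task ends, so a killed or finished speculative attempt no longer counts as pending. The sketch below is only an illustration of that bookkeeping; the actual PR may structure the change differently.
{code:scala}
import scala.collection.mutable

// Illustrative-only sketch of the bookkeeping change described above; the real
// ExecutorAllocationManager listener holds more state and synchronization.
class SpeculativeTaskBookkeeping {
  // Keyed by (stageId, stageAttemptId).
  private val stageAttemptToNumSpeculativeTasks = mutable.Map[(Int, Int), Int]()
  private val stageAttemptToSpeculativeTaskIndices = mutable.Map[(Int, Int), mutable.Set[Int]]()

  def onSpeculativeTaskSubmitted(stageAttempt: (Int, Int)): Unit = {
    stageAttemptToNumSpeculativeTasks(stageAttempt) =
      stageAttemptToNumSpeculativeTasks.getOrElse(stageAttempt, 0) + 1
  }

  def onSpeculativeTaskStart(stageAttempt: (Int, Int), taskIndex: Int): Unit = {
    stageAttemptToSpeculativeTaskIndices.getOrElseUpdate(stageAttempt, mutable.Set[Int]()) += taskIndex
  }

  // The missing piece in the buggy version: when a speculative task ends
  // (finished, failed, or intentionally killed), stop counting it as pending.
  def onSpeculativeTaskEnd(stageAttempt: (Int, Int), taskIndex: Int): Unit = {
    stageAttemptToSpeculativeTaskIndices.get(stageAttempt).foreach(_ -= taskIndex)
    stageAttemptToNumSpeculativeTasks(stageAttempt) =
      math.max(stageAttemptToNumSpeculativeTasks.getOrElse(stageAttempt, 0) - 1, 0)
  }

  // Same shape as the expression quoted in the issue: submitted minus running.
  def pendingSpeculativeTasks: Int = stageAttemptToNumSpeculativeTasks.map {
    case (stageAttempt, numTasks) =>
      numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
  }.sum
}
{code}
With this bookkeeping, an ended speculative attempt is removed from both the submitted count and the running indices, so it contributes neither to pending nor to running tasks.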
[jira] [Comment Edited] (SPARK-28403) Executor Allocation Manager can add an extra executor when speculative tasks
[ https://issues.apache.org/jira/browse/SPARK-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016316#comment-17016316 ] Zebing Lin edited comment on SPARK-28403 at 1/15/20 8:53 PM: - In our production, this just caused a fluctuation of the number of requested executors:
{code:java}
Total executors: Running = 6, Needed = 6, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
{code}
I think this logic can be deleted.
was (Author: zebingl): In our production, this just caused a fluctuation of requested executors: {code:java} Total executors: Running = 6, Needed = 6, Requested = 6 Lowering target number of executors to 5 (previously 6) because not all requested executors are actually needed Total executors: Running = 6, Needed = 5, Requested = 6 Total executors: Running = 6, Needed = 6, Requested = 6 Lowering target number of executors to 5 (previously 6) because not all requested executors are actually needed Total executors: Running = 6, Needed = 5, Requested = 6 {code} I think this logic can be deleted.
> Executor Allocation Manager can add an extra executor when speculative tasks > > > Key: SPARK-28403 > URL: https://issues.apache.org/jira/browse/SPARK-28403 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.3.0 > Reporter: Thomas Graves > Priority: Major > > It looks like SPARK-19326 added a bug in the executor allocation manager > where it adds an extra executor when it shouldn't, when we have pending > speculative tasks but the target number didn't change. > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L377] > It doesn't look like this is necessary since the pending speculative tasks are > already added in. > See the questioning of this on the PR at: > https://github.com/apache/spark/pull/18492/files#diff-b096353602813e47074ace09a3890d56R379
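For readers without the source open, the logic being questioned is, paraphrased, a special case in the target-update path: when the computed target would not change but speculative tasks are pending, one extra executor is added on top of what the needed-executor calculation already covers. The sketch below paraphrases that condition from the issue description; the names and surrounding structure are approximations, not the exact code at the linked line.
{code:scala}
// Paraphrased sketch of the questioned special case in ExecutorAllocationManager;
// parameter and method names approximate the description above and are not exact.
def nextExecutorTarget(
    delta: Int,                    // change in the target computed this round
    pendingSpeculativeTasks: Int,
    maxNumExecutorsNeeded: Int,
    minNumExecutors: Int,
    maxNumExecutors: Int,
    currentTarget: Int): Int = {
  if (delta == 0 && pendingSpeculativeTasks > 0) {
    // The questioned branch: the target did not change, but because speculative
    // tasks are pending, one executor is requested beyond maxNumExecutorsNeeded,
    // even though that count already includes pending speculative tasks.
    math.max(math.min(maxNumExecutorsNeeded + 1, maxNumExecutors), minNumExecutors)
  } else {
    currentTarget
  }
}
{code}
The extra executor pushing the requested count one above what is needed, followed by the next evaluation lowering it again, would be consistent with the fluctuation shown in the log above.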
[jira] [Commented] (SPARK-28403) Executor Allocation Manager can add an extra executor when speculative tasks
[ https://issues.apache.org/jira/browse/SPARK-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016316#comment-17016316 ] Zebing Lin commented on SPARK-28403: In our production, this just caused a fluctuation of the number of requested executors:
{code:java}
Total executors: Running = 6, Needed = 6, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
Total executors: Running = 6, Needed = 6, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
{code}
I think this logic can be deleted.
> Executor Allocation Manager can add an extra executor when speculative tasks > > > Key: SPARK-28403 > URL: https://issues.apache.org/jira/browse/SPARK-28403 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.3.0 > Reporter: Thomas Graves > Priority: Major > > It looks like SPARK-19326 added a bug in the executor allocation manager > where it adds an extra executor when it shouldn't, when we have pending > speculative tasks but the target number didn't change. > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L377] > It doesn't look like this is necessary since the pending speculative tasks are > already added in. > See the questioning of this on the PR at: > https://github.com/apache/spark/pull/18492/files#diff-b096353602813e47074ace09a3890d56R379
[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- Description: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 38 executors (see attachment), which is exactly (152 + 3) / 4 = 38. h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-28403 too was: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 38 executors (see attachment), which is exactly (152 + 3) / 4 = 38. h3. 
The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too > Spark marks ended speculative tasks as pending leads to holding idle executors > -- > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > Attachments: Screen Shot 2020-01-15 at 11.13.17.png > > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it wa
[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- Description: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 38 executors (see attachment), which is exactly (152 + 3) / 4 = 38. h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too was: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 39 executors (see attachment). h3. 
The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too > Spark marks ended speculative tasks as pending leads to holding idle executors > -- > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > Attachments: Screen Shot 2020-01-15 at 11.13.17.png > > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it was holding 6 executors at the end with
[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- Attachment: Screen Shot 2020-01-15 at 11.13.17.png > Spark marks ended speculative tasks as pending leads to holding idle executors > -- > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > Attachments: Screen Shot 2020-01-15 at 11.13.17.png > > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it was holding 6 executors at the end with only 2 tasks running (1 > speculative). With more logging enabled, we found the job printed: > {code:java} > pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 > {code} > while the job only had 1 speculative task running and 16 speculative tasks > intentionally killed because of corresponding original tasks had finished. > An easy repro of the issue (`--conf spark.speculation=true --conf > spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in > cluster mode): > {code:java} > val n = 4000 > val someRDD = sc.parallelize(1 to n, n) > someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { > if (index < 300 && index >= 150) { > Thread.sleep(index * 1000) // Fake running tasks > } else if (index == 300) { > Thread.sleep(1000 * 1000) // Fake long running tasks > } > it.toList.map(x => index + ", " + x).iterator > }).collect > {code} > You will see when running the last task, we would be hold 39 executors (see > attachment). > h3. The Bug > Upon examining the code of _pendingSpeculativeTasks_: > {code:java} > stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => > numTasks - > stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) > }.sum > {code} > where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on > _onSpeculativeTaskSubmitted_, but never decremented. > _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage > completion. *This means Spark is marking ended speculative tasks as pending, > which leads to Spark to hold more executors that it actually needs!* > I will have a PR ready to fix this issue, along with SPARK-2840 too > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- Description: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 39 executors: h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too was: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 39 executors: !image-2020-01-15-11-09-29-215.png! h3. 
The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too > Spark marks ended speculative tasks as pending leads to holding idle executors > -- > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it was holding 6 executors at the end with only 2 tasks running (1 > speculative). With more logging enabled, we found the job printed: > {cod
[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- Description: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 39 executors (see attachment). h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too was: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 39 executors: h3. 
The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too > Spark marks ended speculative tasks as pending leads to holding idle executors > -- > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it was holding 6 executors at the end with only 2 tasks running (1 > speculative). With more logging enabled, we found the job printed: > {code:java} > pendingTa
[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- Description: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode): {code:java} val n = 4000 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index < 300 && index >= 150) { Thread.sleep(index * 1000) // Fake running tasks } else if (index == 300) { Thread.sleep(1000 * 1000) // Fake long running tasks } it.toList.map(x => index + ", " + x).iterator }).collect {code} You will see when running the last task, we would be hold 39 executors: !image-2020-01-15-11-09-29-215.png! h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too was: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. 
*This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too > Spark marks ended speculative tasks as pending leads to holding idle executors > -- > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it was holding 6 executors at the end with only 2 tasks running (1 > speculative). With more logging enabled, we found the job printed: > {code:java} > pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 > {code} > while the job only had 1 speculative task running and 16 speculative tasks > intentionally killed because of corresponding original tasks had finished. > An easy repro of the issue (`--conf spark.speculation=true --conf > spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in > cluster mode): > {code:java} > val n = 4000 > val someRDD = sc.parallelize(1 to n, n) > someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { > if (index < 300 && index >= 150) { > Thread.s
[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- External issue ID: (was: SPARK-2840) > Spark marks ended speculative tasks as pending leads to holding idle executors > -- > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it was holding 6 executors at the end with only 2 tasks running (1 > speculative). With more logging enabled, we found the job printed: > {code:java} > pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 > {code} > while the job only had 1 speculative task running and 16 speculative tasks > intentionally killed because of corresponding original tasks had finished. > h3. The Bug > Upon examining the code of _pendingSpeculativeTasks_: > {code:java} > stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => > numTasks - > stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) > }.sum > {code} > where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on > _onSpeculativeTaskSubmitted_, but never decremented. > _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage > completion. *This means Spark is marking ended speculative tasks as pending, > which leads to Spark to hold more executors that it actually needs!* > I will have a PR ready to fix this issue, along with SPARK-2840 too > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- Description: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too was: *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too > Spark marks ended speculative tasks as pending leads to holding idle executors > -- > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it was holding 6 executors at the end with only 2 tasks running (1 > speculative). 
With more logging enabled, we found the job printed: > {code:java} > pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 > {code} > while the job only had 1 speculative task running and 16 speculative tasks > intentionally killed because of corresponding original tasks had finished. > h3. The Bug > Upon examining the code of _pendingSpeculativeTasks_: > {code:java} > stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => > numTasks - > stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) > }.sum > {code} > where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on > _onSpeculativeTaskSubmitted_, but never decremented. > _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage > completion. *This means Spark is marking ended speculative tasks as pending, > which leads to Spark to hold more executors that it actually needs!* > I will have a PR ready to fix this issue, along with SPARK-2840 too > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zebing Lin updated SPARK-30511: --- External issue ID: SPARK-2840 > Spark marks ended speculative tasks as pending leads to holding idle executors > -- > > Key: SPARK-30511 > URL: https://issues.apache.org/jira/browse/SPARK-30511 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Zebing Lin >Priority: Major > > *TL;DR* > When speculative tasks finished/failed/got killed, they are still considered > as pending and count towards the calculation of number of needed executors. > h3. Symptom > In one of our production job (where it's running 4 tasks per executor), we > found that it was holding 6 executors at the end with only 2 tasks running (1 > speculative). With more logging enabled, we found the job printed: > {code:java} > pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 > {code} > while the job only had 1 speculative task running and 16 speculative tasks > intentionally killed because of corresponding original tasks had finished. > h3. The Bug > Upon examining the code of _pendingSpeculativeTasks_: > {code:java} > stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => > numTasks - > stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) > }.sum > {code} > where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on > _onSpeculativeTaskSubmitted_, but never decremented. > _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage > completion. *This means Spark is marking ended speculative tasks as pending, > which leads to Spark to hold more executors that it actually needs!* > I will have a PR ready to fix this issue, along with SPARK-2840 too > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors
Zebing Lin created SPARK-30511: -- Summary: Spark marks ended speculative tasks as pending leads to holding idle executors Key: SPARK-30511 URL: https://issues.apache.org/jira/browse/SPARK-30511 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 2.3.0 Reporter: Zebing Lin *TL;DR* When speculative tasks finished/failed/got killed, they are still considered as pending and count towards the calculation of number of needed executors. h3. Symptom In one of our production job (where it's running 4 tasks per executor), we found that it was holding 6 executors at the end with only 2 tasks running (1 speculative). With more logging enabled, we found the job printed: {code:java} pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2 {code} while the job only had 1 speculative task running and 16 speculative tasks intentionally killed because of corresponding original tasks had finished. h3. The Bug Upon examining the code of _pendingSpeculativeTasks_: {code:java} stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) => numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0) }.sum {code} where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on _onSpeculativeTaskSubmitted_, but never decremented. _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage completion. *This means Spark is marking ended speculative tasks as pending, which leads to Spark to hold more executors that it actually needs!* I will have a PR ready to fix this issue, along with SPARK-2840 too -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org