[jira] [Commented] (SPARK-34707) Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread Zebing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306519#comment-17306519
 ] 

Zebing Lin commented on SPARK-34707:


Created PR https://github.com/apache/spark/pull/31931

> Code-gen broadcast nested loop join (left outer/right outer)
> 
>
> Key: SPARK-34707
> URL: https://issues.apache.org/jira/browse/SPARK-34707
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Priority: Minor
>
> We saw a 1x run-time improvement for code-gen broadcast nested loop inner 
> join (https://issues.apache.org/jira/browse/SPARK-34620). Similarly, let's 
> add code-gen here for left outer (build right side) and right outer (build 
> left side) joins as well.
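
For reference, here is a minimal sketch (assuming a SparkSession named 
`spark`) of a query shape that ends up as BroadcastNestedLoopJoin: an outer 
join whose condition has no equality keys, with a small right side. This is 
the left outer (build right side) case targeted here:
{code:java}
// Non-equi left outer join: no equality keys, so the planner falls back to
// BroadcastNestedLoopJoin, broadcasting the (smaller) right side.
val left = spark.range(0, 1000).toDF("a")
val right = spark.range(0, 10).toDF("b")
val joined = left.join(right, left("a") > right("b"), "left_outer")
joined.explain()  // expected to show BroadcastNestedLoopJoin BuildRight, LeftOuter
{code}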






[jira] [Updated] (SPARK-30511) Spark marks intentionally killed speculative tasks as pending leads to holding idle executors

2020-01-16 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
Description: 
*TL;DR*
 When speculative tasks fail or get killed, they are still considered pending 
and count towards the calculation of the number of needed executors.
h3. Symptom

In one of our production jobs (running 4 tasks per executor), we found that it 
was holding 6 executors at the end with only 2 tasks running (1 speculative). 
With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because their corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index < 300 && index >= 150) {
    Thread.sleep(index * 1000) // Fake running tasks
  } else if (index == 300) {
    Thread.sleep(1000 * 1000) // Fake long-running task
  }
  it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
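The same flags can also be set when building the session in code. This is only 
a sketch; the `spark.dynamicAllocation.enabled` key is an assumption here, 
since the cluster may already enable dynamic allocation by default:
{code:java}
import org.apache.spark.sql.SparkSession

// Hypothetical session setup mirroring the --conf flags above.
val spark = SparkSession.builder()
  .appName("SPARK-30511-repro")
  .config("spark.speculation", "true")
  .config("spark.executor.cores", "4")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.maxExecutors", "1000")
  .getOrCreate()
val sc = spark.sparkContext  // `sc` as used by the repro snippet above
{code}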
You will see that while the last task is running, Spark holds 38 executors 
(see attachment), which is exactly (152 + 3) / 4 = 38.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks -
    stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
we see that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented in 
_onSpeculativeTaskSubmitted_ but never decremented; 
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads Spark to hold more executors than it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-28403, too.
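
One possible shape of a fix, as a hedged sketch only (not the actual PR): 
decrement the per-stage-attempt speculative counter when a speculative task 
ends, so finished/killed speculative tasks stop counting as pending. The 
`StageAttempt` key and the map names below are illustrative stand-ins 
mirroring the snippet above; the event type is Spark's `SparkListenerTaskEnd`:
{code:java}
import scala.collection.mutable
import org.apache.spark.scheduler.SparkListenerTaskEnd

// Illustrative stand-ins for the allocation manager's bookkeeping.
case class StageAttempt(stageId: Int, stageAttemptId: Int)
val stageAttemptToNumSpeculativeTasks = mutable.Map[StageAttempt, Int]()
val stageAttemptToSpeculativeTaskIndices =
  mutable.Map[StageAttempt, mutable.HashSet[Int]]()

def onSpeculativeTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
  if (taskEnd.taskInfo.speculative) {
    val attempt = StageAttempt(taskEnd.stageId, taskEnd.stageAttemptId)
    // An ended (finished/failed/killed) speculative task should no longer
    // count towards pendingSpeculativeTasks.
    stageAttemptToNumSpeculativeTasks.get(attempt).foreach { n =>
      stageAttemptToNumSpeculativeTasks(attempt) = math.max(0, n - 1)
    }
    stageAttemptToSpeculativeTaskIndices.get(attempt)
      .foreach(_.remove(taskEnd.taskInfo.index))
  }
}
{code}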

 

 

 

  was:
*TL;DR*
 When speculative tasks finished/failed/got killed, they are still considered 
as pending and count towards the calculation of number of needed executors.
h3. Symptom

In one of our production job (where it's running 4 tasks per executor), we 
found that it was holding 6 executors at the end with only 2 tasks running (1 
speculative). With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because of corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
if (index < 300 && index >= 150) {
Thread.sleep(index * 1000) // Fake running tasks
} else if (index == 300) {
Thread.sleep(1000 * 1000) // Fake long running tasks
}
it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see when running the last task, we would be hold 38 executors (see 
attachment), which is exactly (152 + 3) / 4 = 38.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_: 
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks - 
stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
_onSpeculativeTaskSubmitted_, but never decremented.  
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads to Spark to hold more executors that it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-28403 too

 

 

 


> Spark marks intentionally killed speculative tasks as pending leads to 
> holding idle executors
> -
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
> Attachments: Screen Shot 2020-01-15 at 11.13.17.png
>
>
> *TL;DR*
>  When speculative tasks fail/get killed, they are still considered as pending 
> and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> foun

[jira] [Updated] (SPARK-30511) Spark marks intentionally killed speculative tasks as pending leads to holding idle executors

2020-01-16 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
Summary: Spark marks intentionally killed speculative tasks as pending 
leads to holding idle executors  (was: Spark marks ended speculative tasks as 
pending leads to holding idle executors)

> Spark marks intentionally killed speculative tasks as pending leads to 
> holding idle executors
> -
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
> Attachments: Screen Shot 2020-01-15 at 11.13.17.png
>
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it was holding 6 executors at the end with only 2 tasks running (1 
> speculative). With more logging enabled, we found the job printed:
> {code:java}
> pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
> {code}
>  while the job only had 1 speculative task running and 16 speculative tasks 
> intentionally killed because of corresponding original tasks had finished.
> An easy repro of the issue (`--conf spark.speculation=true --conf 
> spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
> cluster mode):
> {code:java}
> val n = 4000
> val someRDD = sc.parallelize(1 to n, n)
> someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
> if (index < 300 && index >= 150) {
> Thread.sleep(index * 1000) // Fake running tasks
> } else if (index == 300) {
> Thread.sleep(1000 * 1000) // Fake long running tasks
> }
> it.toList.map(x => index + ", " + x).iterator
> }).collect
> {code}
> You will see when running the last task, we would be hold 38 executors (see 
> attachment), which is exactly (152 + 3) / 4 = 38.
> h3. The Bug
> Upon examining the code of _pendingSpeculativeTasks_: 
> {code:java}
> stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
>   numTasks - 
> stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
> }.sum
> {code}
> where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
> _onSpeculativeTaskSubmitted_, but never decremented.  
> _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
> completion. *This means Spark is marking ended speculative tasks as pending, 
> which leads to Spark to hold more executors that it actually needs!*
> I will have a PR ready to fix this issue, along with SPARK-28403 too
>  
>  
>  






[jira] [Comment Edited] (SPARK-28403) Executor Allocation Manager can add an extra executor when speculative tasks

2020-01-15 Thread Zebing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016316#comment-17016316
 ] 

Zebing Lin edited comment on SPARK-28403 at 1/15/20 8:53 PM:
-

In our production, this just caused the number of requested executors to 
fluctuate:
{code:java}
Total executors: Running = 6, Needed = 6, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all 
requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all 
requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
{code}
I think this logic can be deleted.


was (Author: zebingl):
In our production, this just caused a fluctuation of requested executors:
{code:java}
Total executors: Running = 6, Needed = 6, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all 
requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
Total executors: Running = 6, Needed = 6, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all 
requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
{code}
I think this logic can be deleted.

> Executor Allocation Manager can add an extra executor when speculative tasks
> 
>
> Key: SPARK-28403
> URL: https://issues.apache.org/jira/browse/SPARK-28403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Priority: Major
>
> It looks like SPARK-19326 added a bug in the executor allocation manager 
> where it adds an extra executor when it shouldn't, when there are pending 
> speculative tasks but the target number didn't change. 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L377]
> It doesn't look like this is necessary, since the pendingSpeculative tasks 
> are already added in.
> See the questioning of this on the PR at:
> https://github.com/apache/spark/pull/18492/files#diff-b096353602813e47074ace09a3890d56R379






[jira] [Commented] (SPARK-28403) Executor Allocation Manager can add an extra executor when speculative tasks

2020-01-15 Thread Zebing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016316#comment-17016316
 ] 

Zebing Lin commented on SPARK-28403:


In our production, this just caused the number of requested executors to 
fluctuate:
{code:java}
Total executors: Running = 6, Needed = 6, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all 
requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
Total executors: Running = 6, Needed = 6, Requested = 6
Lowering target number of executors to 5 (previously 6) because not all 
requested executors are actually needed
Total executors: Running = 6, Needed = 5, Requested = 6
{code}
I think this logic can be deleted.

> Executor Allocation Manager can add an extra executor when speculative tasks
> 
>
> Key: SPARK-28403
> URL: https://issues.apache.org/jira/browse/SPARK-28403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Priority: Major
>
> It looks like SPARK-19326 added a bug in the executor allocation manager 
> where it adds an extra executor when it shouldn't, when there are pending 
> speculative tasks but the target number didn't change. 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L377]
> It doesn't look like this is necessary, since the pendingSpeculative tasks 
> are already added in.
> See the questioning of this on the PR at:
> https://github.com/apache/spark/pull/18492/files#diff-b096353602813e47074ace09a3890d56R379






[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-15 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
Description: 
*TL;DR*
 When speculative tasks finish, fail, or get killed, they are still considered 
pending and count towards the calculation of the number of needed executors.
h3. Symptom

In one of our production jobs (running 4 tasks per executor), we found that it 
was holding 6 executors at the end with only 2 tasks running (1 speculative). 
With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because their corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index < 300 && index >= 150) {
    Thread.sleep(index * 1000) // Fake running tasks
  } else if (index == 300) {
    Thread.sleep(1000 * 1000) // Fake long-running task
  }
  it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see that while the last task is running, Spark holds 38 executors 
(see attachment), which is exactly (152 + 3) / 4 = 38.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks -
    stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
we see that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented in 
_onSpeculativeTaskSubmitted_ but never decremented; 
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads Spark to hold more executors than it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-28403, too.

 

 

 

  was:
*TL;DR*
 When speculative tasks finished/failed/got killed, they are still considered 
as pending and count towards the calculation of number of needed executors.
h3. Symptom

In one of our production job (where it's running 4 tasks per executor), we 
found that it was holding 6 executors at the end with only 2 tasks running (1 
speculative). With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because of corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
if (index < 300 && index >= 150) {
Thread.sleep(index * 1000) // Fake running tasks
} else if (index == 300) {
Thread.sleep(1000 * 1000) // Fake long running tasks
}
it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see when running the last task, we would be hold 38 executors (see 
attachment), which is exactly (152 + 3) / 4 = 38.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_: 
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks - 
stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
_onSpeculativeTaskSubmitted_, but never decremented.  
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads to Spark to hold more executors that it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-2840 too

 

 

 


> Spark marks ended speculative tasks as pending leads to holding idle executors
> --
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
> Attachments: Screen Shot 2020-01-15 at 11.13.17.png
>
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it wa

[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-15 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
Description: 
*TL;DR*
 When speculative tasks finish, fail, or get killed, they are still considered 
pending and count towards the calculation of the number of needed executors.
h3. Symptom

In one of our production jobs (running 4 tasks per executor), we found that it 
was holding 6 executors at the end with only 2 tasks running (1 speculative). 
With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because their corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index < 300 && index >= 150) {
    Thread.sleep(index * 1000) // Fake running tasks
  } else if (index == 300) {
    Thread.sleep(1000 * 1000) // Fake long-running task
  }
  it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see that while the last task is running, Spark holds 38 executors 
(see attachment), which is exactly (152 + 3) / 4 = 38.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks -
    stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
we see that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented in 
_onSpeculativeTaskSubmitted_ but never decremented; 
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads Spark to hold more executors than it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-28403, too.

 

 

 

  was:
*TL;DR*
 When speculative tasks finished/failed/got killed, they are still considered 
as pending and count towards the calculation of number of needed executors.
h3. Symptom

In one of our production job (where it's running 4 tasks per executor), we 
found that it was holding 6 executors at the end with only 2 tasks running (1 
speculative). With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because of corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
if (index < 300 && index >= 150) {
Thread.sleep(index * 1000) // Fake running tasks
} else if (index == 300) {
Thread.sleep(1000 * 1000) // Fake long running tasks
}
it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see when running the last task, we would be hold 39 executors (see 
attachment).
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_: 
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks - 
stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
_onSpeculativeTaskSubmitted_, but never decremented.  
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads to Spark to hold more executors that it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-2840 too

 

 

 


> Spark marks ended speculative tasks as pending leads to holding idle executors
> --
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
> Attachments: Screen Shot 2020-01-15 at 11.13.17.png
>
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it was holding 6 executors at the end with 

[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-15 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
Attachment: Screen Shot 2020-01-15 at 11.13.17.png

> Spark marks ended speculative tasks as pending leads to holding idle executors
> --
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
> Attachments: Screen Shot 2020-01-15 at 11.13.17.png
>
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it was holding 6 executors at the end with only 2 tasks running (1 
> speculative). With more logging enabled, we found the job printed:
> {code:java}
> pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
> {code}
>  while the job only had 1 speculative task running and 16 speculative tasks 
> intentionally killed because of corresponding original tasks had finished.
> An easy repro of the issue (`--conf spark.speculation=true --conf 
> spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
> cluster mode):
> {code:java}
> val n = 4000
> val someRDD = sc.parallelize(1 to n, n)
> someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
> if (index < 300 && index >= 150) {
> Thread.sleep(index * 1000) // Fake running tasks
> } else if (index == 300) {
> Thread.sleep(1000 * 1000) // Fake long running tasks
> }
> it.toList.map(x => index + ", " + x).iterator
> }).collect
> {code}
> You will see when running the last task, we would be hold 39 executors (see 
> attachment).
> h3. The Bug
> Upon examining the code of _pendingSpeculativeTasks_: 
> {code:java}
> stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
>   numTasks - 
> stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
> }.sum
> {code}
> where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
> _onSpeculativeTaskSubmitted_, but never decremented.  
> _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
> completion. *This means Spark is marking ended speculative tasks as pending, 
> which leads to Spark to hold more executors that it actually needs!*
> I will have a PR ready to fix this issue, along with SPARK-2840 too
>  
>  
>  






[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-15 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
Description: 
*TL;DR*
 When speculative tasks finish, fail, or get killed, they are still considered 
pending and count towards the calculation of the number of needed executors.
h3. Symptom

In one of our production jobs (running 4 tasks per executor), we found that it 
was holding 6 executors at the end with only 2 tasks running (1 speculative). 
With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because their corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index < 300 && index >= 150) {
    Thread.sleep(index * 1000) // Fake running tasks
  } else if (index == 300) {
    Thread.sleep(1000 * 1000) // Fake long-running task
  }
  it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see that while the last task is running, Spark holds 39 executors.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks -
    stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
we see that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented in 
_onSpeculativeTaskSubmitted_ but never decremented; 
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads Spark to hold more executors than it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-28403, too.

 

 

 

  was:
*TL;DR*
 When speculative tasks finished/failed/got killed, they are still considered 
as pending and count towards the calculation of number of needed executors.
h3. Symptom

In one of our production job (where it's running 4 tasks per executor), we 
found that it was holding 6 executors at the end with only 2 tasks running (1 
speculative). With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because of corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
if (index < 300 && index >= 150) {
Thread.sleep(index * 1000) // Fake running tasks
} else if (index == 300) {
Thread.sleep(1000 * 1000) // Fake long running tasks
}
it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see when running the last task, we would be hold 39 executors:

!image-2020-01-15-11-09-29-215.png!
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_: 
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks - 
stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
_onSpeculativeTaskSubmitted_, but never decremented.  
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads to Spark to hold more executors that it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-2840 too

 

 

 


> Spark marks ended speculative tasks as pending leads to holding idle executors
> --
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it was holding 6 executors at the end with only 2 tasks running (1 
> speculative). With more logging enabled, we found the job printed:
> {cod

[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-15 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
Description: 
*TL;DR*
 When speculative tasks finish, fail, or get killed, they are still considered 
pending and count towards the calculation of the number of needed executors.
h3. Symptom

In one of our production jobs (running 4 tasks per executor), we found that it 
was holding 6 executors at the end with only 2 tasks running (1 speculative). 
With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because their corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index < 300 && index >= 150) {
    Thread.sleep(index * 1000) // Fake running tasks
  } else if (index == 300) {
    Thread.sleep(1000 * 1000) // Fake long-running task
  }
  it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see that while the last task is running, Spark holds 39 executors 
(see attachment).
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks -
    stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
we see that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented in 
_onSpeculativeTaskSubmitted_ but never decremented; 
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads Spark to hold more executors than it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-28403, too.

 

 

 

  was:
*TL;DR*
 When speculative tasks finished/failed/got killed, they are still considered 
as pending and count towards the calculation of number of needed executors.
h3. Symptom

In one of our production job (where it's running 4 tasks per executor), we 
found that it was holding 6 executors at the end with only 2 tasks running (1 
speculative). With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because of corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
if (index < 300 && index >= 150) {
Thread.sleep(index * 1000) // Fake running tasks
} else if (index == 300) {
Thread.sleep(1000 * 1000) // Fake long running tasks
}
it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see when running the last task, we would be hold 39 executors:
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_: 
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks - 
stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
_onSpeculativeTaskSubmitted_, but never decremented.  
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads to Spark to hold more executors that it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-2840 too

 

 

 


> Spark marks ended speculative tasks as pending leads to holding idle executors
> --
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it was holding 6 executors at the end with only 2 tasks running (1 
> speculative). With more logging enabled, we found the job printed:
> {code:java}
> pendingTa

[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-15 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
Description: 
*TL;DR*
 When speculative tasks finish, fail, or get killed, they are still considered 
pending and count towards the calculation of the number of needed executors.
h3. Symptom

In one of our production jobs (running 4 tasks per executor), we found that it 
was holding 6 executors at the end with only 2 tasks running (1 speculative). 
With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because their corresponding original tasks had finished.

An easy repro of the issue (`--conf spark.speculation=true --conf 
spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
cluster mode):
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index < 300 && index >= 150) {
    Thread.sleep(index * 1000) // Fake running tasks
  } else if (index == 300) {
    Thread.sleep(1000 * 1000) // Fake long-running task
  }
  it.toList.map(x => index + ", " + x).iterator
}).collect
{code}
You will see that while the last task is running, Spark holds 39 executors:

!image-2020-01-15-11-09-29-215.png!
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks -
    stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
we see that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented in 
_onSpeculativeTaskSubmitted_ but never decremented; 
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads Spark to hold more executors than it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-28403, too.

 

 

 

  was:
*TL;DR*
 When speculative tasks finished/failed/got killed, they are still considered 
as pending and count towards the calculation of number of needed executors.
h3. Symptom

In one of our production job (where it's running 4 tasks per executor), we 
found that it was holding 6 executors at the end with only 2 tasks running (1 
speculative). With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because of corresponding original tasks had finished.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_: 
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks - 
stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
_onSpeculativeTaskSubmitted_, but never decremented.  
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads to Spark to hold more executors that it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-2840 too

 

 

 


> Spark marks ended speculative tasks as pending leads to holding idle executors
> --
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it was holding 6 executors at the end with only 2 tasks running (1 
> speculative). With more logging enabled, we found the job printed:
> {code:java}
> pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
> {code}
>  while the job only had 1 speculative task running and 16 speculative tasks 
> intentionally killed because of corresponding original tasks had finished.
> An easy repro of the issue (`--conf spark.speculation=true --conf 
> spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in 
> cluster mode):
> {code:java}
> val n = 4000
> val someRDD = sc.parallelize(1 to n, n)
> someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
> if (index < 300 && index >= 150) {
> Thread.s

[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-14 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
External issue ID:   (was: SPARK-2840)

> Spark marks ended speculative tasks as pending leads to holding idle executors
> --
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it was holding 6 executors at the end with only 2 tasks running (1 
> speculative). With more logging enabled, we found the job printed:
> {code:java}
> pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
> {code}
>  while the job only had 1 speculative task running and 16 speculative tasks 
> intentionally killed because of corresponding original tasks had finished.
> h3. The Bug
> Upon examining the code of _pendingSpeculativeTasks_: 
> {code:java}
> stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
>   numTasks - 
> stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
> }.sum
> {code}
> where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
> _onSpeculativeTaskSubmitted_, but never decremented.  
> _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
> completion. *This means Spark is marking ended speculative tasks as pending, 
> which leads to Spark to hold more executors that it actually needs!*
> I will have a PR ready to fix this issue, along with SPARK-2840 too
>  
>  
>  






[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-14 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
Description: 
*TL;DR*
 When speculative tasks finish, fail, or get killed, they are still considered 
pending and count towards the calculation of the number of needed executors.
h3. Symptom

In one of our production jobs (running 4 tasks per executor), we found that it 
was holding 6 executors at the end with only 2 tasks running (1 speculative). 
With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because their corresponding original tasks had finished.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks -
    stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
we see that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented in 
_onSpeculativeTaskSubmitted_ but never decremented; 
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads Spark to hold more executors than it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-28403, too.

 

 

 

  was:
*TL;DR*
When speculative tasks finished/failed/got killed, they are still considered as 
pending and count towards the calculation of number of needed executors.
h3. Symptom

In one of our production job (where it's running 4 tasks per executor), we 
found that it was holding 6 executors at the end with only 2 tasks running (1 
speculative). With more logging enabled, we found the job printed:

 
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2

{code}
 

while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because of corresponding original tasks had finished.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:

 
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks - 
stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
_onSpeculativeTaskSubmitted_, but never decremented.  
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads to Spark to hold more executors that it actually needs!*

 

I will have a PR ready to fix this issue, along with SPARK-2840 too

 

 

 


> Spark marks ended speculative tasks as pending leads to holding idle executors
> --
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it was holding 6 executors at the end with only 2 tasks running (1 
> speculative). With more logging enabled, we found the job printed:
> {code:java}
> pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
> {code}
>  while the job only had 1 speculative task running and 16 speculative tasks 
> intentionally killed because of corresponding original tasks had finished.
> h3. The Bug
> Upon examining the code of _pendingSpeculativeTasks_: 
> {code:java}
> stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
>   numTasks - 
> stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
> }.sum
> {code}
> where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
> _onSpeculativeTaskSubmitted_, but never decremented.  
> _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
> completion. *This means Spark is marking ended speculative tasks as pending, 
> which leads to Spark to hold more executors that it actually needs!*
> I will have a PR ready to fix this issue, along with SPARK-2840 too
>  
>  
>  






[jira] [Updated] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-14 Thread Zebing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zebing Lin updated SPARK-30511:
---
External issue ID: SPARK-2840

> Spark marks ended speculative tasks as pending leads to holding idle executors
> --
>
> Key: SPARK-30511
> URL: https://issues.apache.org/jira/browse/SPARK-30511
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Zebing Lin
>Priority: Major
>
> *TL;DR*
>  When speculative tasks finished/failed/got killed, they are still considered 
> as pending and count towards the calculation of number of needed executors.
> h3. Symptom
> In one of our production job (where it's running 4 tasks per executor), we 
> found that it was holding 6 executors at the end with only 2 tasks running (1 
> speculative). With more logging enabled, we found the job printed:
> {code:java}
> pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
> {code}
>  while the job only had 1 speculative task running and 16 speculative tasks 
> intentionally killed because of corresponding original tasks had finished.
> h3. The Bug
> Upon examining the code of _pendingSpeculativeTasks_: 
> {code:java}
> stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
>   numTasks - 
> stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
> }.sum
> {code}
> where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented on 
> _onSpeculativeTaskSubmitted_, but never decremented.  
> _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is performed on stage 
> completion. *This means Spark is marking ended speculative tasks as pending, 
> which leads to Spark to hold more executors that it actually needs!*
> I will have a PR ready to fix this issue, along with SPARK-2840 too
>  
>  
>  






[jira] [Created] (SPARK-30511) Spark marks ended speculative tasks as pending leads to holding idle executors

2020-01-14 Thread Zebing Lin (Jira)
Zebing Lin created SPARK-30511:
--

 Summary: Spark marks ended speculative tasks as pending leads to 
holding idle executors
 Key: SPARK-30511
 URL: https://issues.apache.org/jira/browse/SPARK-30511
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 2.3.0
Reporter: Zebing Lin


*TL;DR*
 When speculative tasks finish, fail, or get killed, they are still considered 
pending and count towards the calculation of the number of needed executors.
h3. Symptom

In one of our production jobs (running 4 tasks per executor), we found that it 
was holding 6 executors at the end with only 2 tasks running (1 speculative). 
With more logging enabled, we found the job printed:
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
{code}
 while the job only had 1 speculative task running and 16 speculative tasks 
intentionally killed because their corresponding original tasks had finished.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks -
    stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
we see that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented in 
_onSpeculativeTaskSubmitted_ but never decremented; 
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage 
completion. *This means Spark is marking ended speculative tasks as pending, 
which leads Spark to hold more executors than it actually needs!*

I will have a PR ready to fix this issue, along with SPARK-28403, too.

 

 

 


