[ https://issues.apache.org/jira/browse/SPARK-33418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dingbei updated SPARK-33418:
----------------------------
    Description: 
It began with the need to start a large number of Spark Streaming receivers. *The launch time becomes very long once there are more than 300 receivers.* I will show the test data I collected and how I improved this.

*Test preparation*

Every executor has two cores (one for the receiver and the other to process each batch of data). I measured the launch time of all receivers through the Spark web UI (Total Uptime when the last receiver started).
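For reference, a minimal sketch of this kind of test job is shown below (the receiver class, batch interval, and configuration values are illustrative assumptions, not the code we actually ran):

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.receiver.Receiver

// Placeholder receiver: the real tests used our own receivers, this one does nothing.
class DummyReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  override def onStart(): Unit = { /* start a thread that calls store(...) */ }
  override def onStop(): Unit = {}
}

object ManyReceiversTest {
  def main(args: Array[String]): Unit = {
    val numReceivers = 500 // 200 in the first test, 500 in the second
    val conf = new SparkConf()
      .setAppName("many-receivers-test")
      .set("spark.executor.cores", "2")                        // one core for the receiver, one for the batches
      .set("spark.executor.instances", numReceivers.toString)  // YARN-style setting; illustrative

    val ssc = new StreamingContext(conf, Seconds(10)) // batch interval is illustrative

    // One receiver per executor: numReceivers executors -> numReceivers input streams.
    val streams = (1 to numReceivers).map(_ => ssc.receiverStream(new DummyReceiver))
    ssc.union(streams).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}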

*Tests and data*

At first, we set the number of executors to 200, which means starting 200 receivers, and everything went well: it took about 50s to launch all receivers. ({color:#ff0000}pic 1{color})

Then we set the number of executors to 500, which means starting 500 receivers. The launch time grew to around 5 minutes. ({color:#ff0000}pic 2{color})

*Digging into the source code*

I then started looking for the cause in the source code. I used thread dumps to check which methods take a relatively long time ({color:#ff0000}pic 3{color}), then added logs around these methods. In the end I found that the loop in {color:#00875a}TaskSchedulerImpl.resourceOffers{color} executes more than 600,000 times. ({color:#ff0000}pic 4{color})

*Explanation and solution*

The loop in TaskSchedulerImpl.resourceOffers iterates over all non-zombie TaskSetManagers in the Pool's queue. Normally this queue stays small, because a TaskSetManager is removed once all of its tasks are done. But for Spark Streaming jobs, receivers are wrapped as non-stop jobs, which means their TaskSetManagers stay in the queue until the application finishes. For example, when the 10th receiver is being launched the queue size is 10, so the loop iterates 10 times; when the 500th receiver is being launched, it iterates 500 times. However, 499 of those iterations are unnecessary, because their tasks are already running. A rough model of the cost is sketched below.
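To make the scale of the waste concrete, here is a small self-contained model (plain Scala, not the scheduler code itself; the names and exact counts are illustrative) of what happens when long-lived receiver task sets never leave the pool:

{code:scala}
// Toy model of the pool: every receiver job stays in the queue as a task set
// for the lifetime of the application. Launching the k-th receiver therefore
// means one scheduling pass that visits all k task sets, even though k - 1 of
// them already have their single task running.
object ReceiverLaunchCost {
  def main(args: Array[String]): Unit = {
    val receivers = 500
    var taskSetVisits = 0L
    for (k <- 1 to receivers) {   // receivers are launched one after another
      taskSetVisits += k          // one pass over the k task sets currently in the pool
    }
    // n(n+1)/2 = 125,250 visits just for the launch passes; resourceOffers is
    // also called on every revive/heartbeat, which is plausibly how the
    // observed loop count climbs past 600,000.
    println(s"task-set visits across launch passes: $taskSetVisits")
  }
}
{code}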

Digging deeper into the code, I found that whether a TaskSetManager still has pending tasks is only determined in {color:#00875a}TaskSetManager.dequeueTaskFromList{color} ({color:#ff0000}pic 5{color}), which is far away from the loop in {color:#00875a}TaskSchedulerImpl.resourceOffers{color}. So I moved the pending-task check ahead, into the loop in {color:#00875a}TaskSchedulerImpl.resourceOffers{color} ({color:#ff0000}pic 6{color}), and I also took speculative execution into account.
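The shape of the change is roughly the following (a self-contained toy illustration of the idea only, not the actual patch; the `pendingTasks`/`pendingSpeculatable` fields are made-up stand-ins for the TaskSetManager state the check looks at):

{code:scala}
// Toy illustration: check for pending (and, with speculation on, speculatable)
// tasks before doing any expensive per-locality offer work, so task sets whose
// single receiver task is already running cost one cheap check each.
case class ToyTaskSet(name: String, pendingTasks: Int, pendingSpeculatable: Int)

object PendingCheckFirst {
  def main(args: Array[String]): Unit = {
    val speculationEnabled = true
    // 499 receivers already running, one still waiting to be launched.
    val pool = (1 to 499).map(i => ToyTaskSet(s"receiver-$i", 0, 0)) :+
      ToyTaskSet("receiver-500", pendingTasks = 1, pendingSpeculatable = 0)

    var expensiveOfferRounds = 0
    for (ts <- pool) {
      val hasWork = ts.pendingTasks > 0 ||
        (speculationEnabled && ts.pendingSpeculatable > 0)
      if (hasWork) {
        expensiveOfferRounds += 1 // only here would the locality/offer loop run
      }
    }
    println(s"offer rounds actually needed: $expensiveOfferRounds") // 1 instead of 500
  }
}
{code}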

*Conclusion*

I think the Spark contributors had not considered a scenario where a large number of jobs run at the same time; I know it is unusual, but this is still a useful improvement. We managed to reduce the launch time of all receivers to a stable ~50s (500 receivers).



> TaskSchedulerImpl: Check pending tasks in advance when resource offers
> ----------------------------------------------------------------------
>
>                 Key: SPARK-33418
>                 URL: https://issues.apache.org/jira/browse/SPARK-33418
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: dingbei
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
