[ https://issues.apache.org/jira/browse/SPARK-33418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dingbei updated SPARK-33418:
----------------------------
    Attachment:     (was: 4.jpg)

> TaskSchedulerImpl: Check pending tasks in advance when resource offers
> ----------------------------------------------------------------------
>
>                 Key: SPARK-33418
>                 URL: https://issues.apache.org/jira/browse/SPARK-33418
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: dingbei
>            Priority: Major
>
> This started with the need to launch a large number of Spark Streaming receivers. *The 
> launch time becomes very long once there are more than 300 receivers.* Below I show 
> the test data I collected and how I improved this.
> *Test preparation*
> Every executor has two cores (one for the receiver and one to process each batch of 
> data). I observed the launch time of all receivers through the Spark web UI (Total 
> Uptime when the last receiver started).
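> For context, a minimal sketch of such a test application is shown below; the batch 
> interval, the socket source, and all host and class names are assumptions for 
> illustration, not details taken from the actual tests.
> {code:scala}
> import org.apache.spark.SparkConf
> import org.apache.spark.streaming.{Seconds, StreamingContext}
> 
> object ReceiverLaunchTest {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf()
>       .setAppName("receiver-launch-test")
>       .set("spark.executor.cores", "2")        // one core for the receiver, one for batch processing
>       .set("spark.executor.instances", "200")  // 200 executors => 200 receivers in the first test
>     val ssc = new StreamingContext(conf, Seconds(10))
> 
>     // One receiver-based input stream per executor (socketTextStream is just an example source).
>     val streams = (1 to 200).map(i => ssc.socketTextStream(s"host-$i", 9999))
>     ssc.union(streams).count().print()
> 
>     ssc.start()
>     ssc.awaitTermination()
>   }
> }
> {code}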
> *Tests and data*
> At first we set the number of executors to 200, which means starting 200 receivers, 
> and everything went well: it took about 50s to launch all 
> receivers ({color:#ff0000}pic 1{color}). Then we set the number of executors to 500, 
> which means starting 500 receivers, and the launch time grew to around 
> 5 minutes ({color:#ff0000}pic 2{color}).
> *Digging into the source code*
> I then looked for the cause in the source code. I used thread dumps to see which 
> methods take a relatively long time ({color:#ff0000}pic 3{color}) and added log 
> statements around them. In the end I found that the loop in 
> {color:#00875a}TaskSchedulerImpl.resourceOffers{color} executes more than 
> 600,000 times ({color:#ff0000}pic 4{color}).
> *Explanation and solution*
> The loop in {color:#00875a}TaskSchedulerImpl.resourceOffers{color} iterates over all 
> non-zombie TaskSetManagers in the Pool's scheduling queue. Normally this queue stays 
> small, because a TaskSetManager is removed once all of its tasks are done. But for 
> Spark Streaming jobs, each receiver is wrapped as a never-ending job, which means its 
> TaskSetManager stays in the queue until the application finishes. For example, when 
> the 10th receiver is launched the queue holds 10 TaskSetManagers, so the loop iterates 
> 10 times; when the 500th receiver is launched it iterates 500 times. However, 499 of 
> those iterations are unnecessary, because their only task is already running.
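> Some back-of-the-envelope arithmetic makes the growth clear; the factor of several 
> offer rounds per launch below is an assumption used only to show the order of 
> magnitude, not a measured value.
> {code:scala}
> // Toy arithmetic, not Spark code: if launching the k-th receiver walks a queue that
> // already holds k long-lived TaskSetManagers, one offer round per launch already gives
> // 1 + 2 + ... + 500 iterations.
> object IterationCount {
>   def main(args: Array[String]): Unit = {
>     val receivers = 500
>     val oneRoundPerLaunch = (1 to receivers).sum      // 125,250 iterations
>     println(s"one offer round per launch: $oneRoundPerLaunch iterations")
>     // Assuming several resourceOffers rounds per launch, the total easily passes the
>     // ~600,000 iterations observed in pic 4.
>     println(s"five rounds per launch: ${oneRoundPerLaunch * 5} iterations")
>   }
> }
> {code}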
> Digging deeper into the code, I found that whether a TaskSetManager still has pending 
> tasks left is only decided in 
> {color:#00875a}TaskSetManager.dequeueTaskFromList{color} ({color:#ff0000}pic 5{color}), 
> which is far away from the loop in 
> {color:#00875a}TaskSchedulerImpl.resourceOffers{color}. So I moved the pending-task 
> check forward, into the loop in 
> {color:#00875a}TaskSchedulerImpl.resourceOffers{color} ({color:#ff0000}pic 6{color}), 
> and I also took speculation mode into account. A toy sketch of the idea is shown below.
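> The following is only a self-contained toy model of the idea, not the actual patch: 
> the class and field names are made up, and the real change belongs inside 
> {color:#00875a}TaskSchedulerImpl.resourceOffers{color}.
> {code:scala}
> // Toy model: skip task sets that have nothing left to schedule before doing any
> // per-task-set offer work, instead of discovering this deep inside dequeueTaskFromList.
> object ResourceOffersSketch {
>   // Illustrative stand-in for a TaskSetManager; field names are not the real API.
>   final case class ToyTaskSetManager(name: String, pendingTasks: Int, speculatableTasks: Int)
> 
>   def offerRound(taskSets: Seq[ToyTaskSetManager], speculationEnabled: Boolean): Int = {
>     var offered = 0
>     for (ts <- taskSets) {
>       val nothingToSchedule =
>         ts.pendingTasks == 0 && !(speculationEnabled && ts.speculatableTasks > 0)
>       if (!nothingToSchedule) {
>         // Only here would the comparatively expensive locality/offer logic run.
>         offered += 1
>       }
>       // Receiver task sets whose single task is already running are skipped cheaply.
>     }
>     offered
>   }
> 
>   def main(args: Array[String]): Unit = {
>     // 499 receivers already running, 1 receiver still waiting to be launched.
>     val running = (1 to 499).map(i => ToyTaskSetManager(s"receiver-$i", 0, 0))
>     val waiting = ToyTaskSetManager("receiver-500", pendingTasks = 1, speculatableTasks = 0)
>     println(offerRound(running :+ waiting, speculationEnabled = false)) // prints 1
>   }
> }
> {code}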
> *Conclusion*
> I think the scenario where many jobs stay active at the same time has not really been 
> considered so far; I know it is unusual, but this is still a good complement. With 
> this change we managed to reduce the launch time of all 500 receivers to a stable ~50s.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
