[ https://issues.apache.org/jira/browse/SPARK-33418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dingbei updated SPARK-33418:
----------------------------
    Attachment: (was: 1.png)

> TaskSchedulerImpl: Check pending tasks in advance when resource offers
> ----------------------------------------------------------------------
>
>                 Key: SPARK-33418
>                 URL: https://issues.apache.org/jira/browse/SPARK-33418
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: dingbei
>            Priority: Major
>         Attachments: 2.png, 3.jpg, 4.jpg, 5.png, 6.jpg
>
>
> It began with the need to start a large number of Spark Streaming receivers. *The launch time becomes very long once there are more than 300 receivers.* Below are the test data I collected and how I improved this.
> *Test preparation*
> Every executor has two cores (one for the receiver and one to process each batch of data). I observed the launch time of all receivers through the Spark web UI (Total Uptime when the last receiver started).
> *Tests and data*
> First, we set the number of executors to 200, which means starting 200 receivers, and everything went well: it took about 50s to launch all receivers. ({color:#ff0000}pic 1{color})
> Then we set the number of executors to 500, which means starting 500 receivers. The launch time grew to around 5 minutes. ({color:#ff0000}pic 2{color})
> *Digging into the source code*
> I then started looking for the cause in the source code. I used a thread dump to check which methods take a relatively long time. ({color:#ff0000}pic 3{color})
> Next I added logs around these methods. Eventually I found that the loop in {color:#00875a}TaskSchedulerImpl.resourceOffers{color} executes more than 600,000 times. ({color:#ff0000}pic 4{color})
> *Explanation and solution*
> The loop in TaskSchedulerImpl.resourceOffers iterates over all non-zombie TaskSetManagers in the queue of a Pool. Normally the size of this queue is small, because a TaskSetManager is removed once all of its tasks are done.
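> The proposed change can be sketched roughly as below. This is a minimal, hypothetical model of the scheduling loop, not Spark's actual internals: the `TaskSet` class and `resourceOffers` signature here are simplified stand-ins, and the point is only to show the pending-tasks check happening up front rather than deep inside the per-offer dequeue path.
>
> {code:scala}
> // Simplified stand-in for a TaskSetManager: id plus a count of
> // tasks that still need to be launched (hypothetical model).
> case class TaskSet(id: Int, pendingTasks: Int)
>
> // Returns (taskSetId, launchedTaskCount) pairs for one round of offers.
> def resourceOffers(sortedTaskSets: Seq[TaskSet], freeCores: Int): Seq[(Int, Int)] = {
>   // Key idea of the patch: skip TaskSets with no pending tasks up front,
>   // instead of discovering this for every offer inside dequeueTaskFromList.
>   val schedulable = sortedTaskSets.filter(_.pendingTasks > 0)
>   var cores = freeCores
>   val launched = scala.collection.mutable.ArrayBuffer[(Int, Int)]()
>   for (ts <- schedulable if cores > 0) {
>     val n = math.min(ts.pendingTasks, cores)
>     cores -= n
>     launched += ((ts.id, n))
>   }
>   launched.toSeq
> }
> {code}
>
> With 500 long-running receiver TaskSets of which only one has a pending task, the filter reduces the inner loop from 500 iterations to 1 per offer round, which matches the improvement observed above.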
> But for Spark Streaming jobs, we all know a receiver is wrapped as a non-stop job, which means its TaskSetManager stays in the queue until the application finishes. For example, when the 10th receiver starts to launch, the queue size is 10, so the loop iterates 10 times; when the 500th receiver starts to launch, it iterates 500 times. However, 499 of those iterations are unnecessary, because their tasks are already running.
> Digging deeper into the code, I found that whether a TaskSetManager still has pending tasks is decided in {color:#00875a}TaskSetManager.dequeueTaskFromList{color} ({color:#ff0000}pic 5{color}), which is far away from the loop in {color:#00875a}TaskSchedulerImpl.resourceOffers{color}. So I moved the pending-tasks check ahead, into the loop in {color:#00875a}TaskSchedulerImpl.resourceOffers{color} ({color:#ff0000}pic 6{color}), and I also took speculation mode into account.
> *Conclusion*
> I think the Spark contributors had not considered a scenario where many jobs run at the same time; I know this is unusual, but handling it is still a good complement. We managed to stably reduce the launch time of all receivers (500 receivers) to around 50s.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org