Well, it is a little difficult to say, because several things are getting mixed 
up here. What function is being calculated? Does it need a lot of memory? Could 
it be that you run out of memory, spill to disk, and end up with a lot of 
blocking disk I/O? 

Related to that is the single executor with 40 cores: how much memory does it 
have, and how much does it need?
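
For example, these are the settings I would check first and compare against what 
the executors actually run with (the values below are placeholders, not 
recommendations):

  // Placeholder values - the point is to compare them with the real job configuration.
  val conf = new org.apache.spark.SparkConf()
    .set("spark.executor.cores", "40")    // 40 tasks run concurrently in one executor JVM
    .set("spark.executor.memory", "64g")  // heap shared by all 40 concurrent tasks
    .set("spark.memory.fraction", "0.6")  // default share of the heap for execution + storage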

I would not put Kafka+ZK on the server where the driver is running.

Whether a different Spark version would help depends on the answers to the 
questions above.

> Am 11.03.2019 um 07:40 schrieb Hough, Stephen C 
> <stephenc.ho...@sc.com.invalid>:
> 
> Spark Version: 2.0.2
>  
> I am running a cluster with 400 workers where each worker has 1 executor 
> configured with 40 cores for a total capacity of 16,000 cores.
> I run about 10,000 jobs totalling 1.5M tasks, where each job is a simple 
> spark.parallelize(list, list.size()).map(func).collectAsync().  Each job's list 
> contains anywhere from 1 to 2,000 tasks.
> ‘func’ runs our code, which does its computations, posts events over Kafka and 
> returns a Boolean; we ignore the result.  The duration of func ranges from a few 
> seconds up to 20 minutes.
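>  
> As a minimal sketch of that job shape (the element type, names and the ‘func’ 
> signature below are placeholders):
>  
>   import org.apache.spark.SparkContext
>  
>   // One job per list; each element becomes its own partition/task (1 to 2,000 of them).
>   def runJob(sc: SparkContext, list: Seq[Long], func: Long => Boolean): Unit = {
>     val future = sc.parallelize(list, list.size)
>       .map(func)         // runs our computation, posts events over Kafka, returns a Boolean
>       .collectAsync()    // FutureAction; the collected Booleans are ignored
>   }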
>  
> The application context is launched on a driver server with 32 cores; the only 
> other services running on that box are a Kafka broker and a ZooKeeper service.
>  
> In the development environment this particular batch took around 2 hours to run, 
> which met our SLA; in production, however, it took 3 hours to complete. We 
> thought it might have been due to another cluster we were running with around 
> 300 workers, but AWS informed us that the networks were isolated.  I moved the 
> job to run later, after the other cluster's batch had finished, and the runtime 
> came back down to 2 hours.
> I analyzed our logs, and they show that a yet-to-be-determined incident at 
> 22:02:42 caused Spark to ‘go slow’.
>  
> By capturing the timestamp of the executor thread's ‘Finished task’ message and 
> tracking the same TID through to the task result getter, I measured the duration 
> until the result is processed on the driver and a core is freed for the scheduler.
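>  
> For illustration, that matching can be done offline over the combined 
> executor/driver logs roughly as below; the file name, regexes and timestamp 
> pattern are guesses at the log layout, not the exact format:
>  
>   import scala.io.Source
>   import java.time.LocalDateTime
>   import java.time.format.DateTimeFormatter
>  
>   val fmt      = DateTimeFormatter.ofPattern("dd-MM-yy HH:mm:ss:SSS")
>   val finished = """(\S+ \S+) .*Finished task .*TID (\d+)""".r.unanchored
>   val gotten   = """(\S+ \S+) .*task-result-getter.*TID (\d+)""".r.unanchored
>   val sentAt   = scala.collection.mutable.Map[Long, LocalDateTime]()
>  
>   for (line <- Source.fromFile("combined.log").getLines()) line match {
>     case gotten(ts, tid)   =>                  // driver side: result processed
>       sentAt.remove(tid.toLong).foreach { t0 =>
>         val ms = java.time.Duration.between(t0, LocalDateTime.parse(ts, fmt)).toMillis
>         if (ms > 1000) println(s"TID $tid: $ms ms from 'Finished task' to result processing")
>       }
>     case finished(ts, tid) =>                  // executor side: task finished, result sent
>       sentAt(tid.toLong) = LocalDateTime.parse(ts, fmt)
>     case _ =>
>   }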
>  
> For the most part this duration stays within a reasonable range of about 10 ms, 
> but at the given incident time it suddenly rises to 5 s, 20 s, 587 s, peaking at 
> 32 minutes only 8 minutes after the incident.  So it takes 32 minutes from the 
> time the result was sent back to the Spark driver until it is processed, which 
> explains the performance hit: during this time the freed cores on the worker sit 
> idle waiting for a new task.  Note that I also tracked the Kafka event sent by 
> the task and saw it roughly 2 ms later on the driver, so the results are 
> reaching the server over the network fine.
>  
> Looking at the RPC code, it became apparent to me that if a backlog of messages 
> builds up, the dispatcher effectively becomes single-threaded while it works 
> through that backlog, so I did another scan of the driver logs for long-running 
> dispatcher threads, i.e. a dispatcher thread that processes more than one 
> consecutive message.  A very obvious issue showed up.
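>  
> For illustration, that scan amounts to counting maximal runs of consecutive log 
> lines handled by the same dispatcher thread, roughly like this (the thread-name 
> and timestamp patterns are assumptions based on the excerpts below):
>  
>   import scala.io.Source
>  
>   val dispatcherLine = """(\S+ \S+) .*\[(dispatcher-event-loop-\d+)\]""".r.unanchored
>   var current  = ""    // dispatcher thread owning the current run
>   var runStart = ""    // timestamp of the first message in the run
>   var runLen   = 0
>  
>   def report(): Unit =
>     if (runLen > 1) println(s"$current: $runLen consecutive messages starting at $runStart")
>  
>   for (line <- Source.fromFile("driver.log").getLines()) line match {
>     case dispatcherLine(ts, thread) =>
>       if (thread == current) runLen += 1
>       else { report(); current = thread; runStart = ts; runLen = 1 }
>     case _ =>
>   }
>   report()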
>  
> Dispatcher thread 23: started at 22:02:42:647 and processed 80,386 consecutive 
> messages over a duration of 53 minutes.
>  
> Looking at the beginning of these messages, the slowdown is obvious: the first 
> three are within milliseconds of each other, then a suspicious ~100 ms delay 
> starts appearing.
>  
> 04-03-19 22:02:43:032 [INFO] [dispatcher-event-loop-23] 
> o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 
> 1418419 on executor id: 1048 hostname: 10.9.141.180
> 04-03-19 22:02:43:034 [INFO] [dispatcher-event-loop-23] 
> o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 
> 1418420 on executor id: 967 hostname: 10.9.134.69
> 04-03-19 22:02:43:037 [INFO] [dispatcher-event-loop-23] 
> o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 
> 1418421 on executor id: 791 hostname: 10.9.139.73
> 04-03-19 22:02:43:136 [INFO] [dispatcher-event-loop-23] 
> o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 
> 1418422 on executor id: 941 hostname: 10.9.142.127
> 04-03-19 22:02:43:234 [INFO] [dispatcher-event-loop-23] 
> o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 
> 1418423 on executor id: 1085 hostname: 10.9.142.23
> 04-03-19 22:02:43:348 [INFO] [dispatcher-event-loop-23] 
> o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 
> 1418424 on executor id: 944 hostname: 10.9.141.65
>  
> Unfortunately I can’t turn on any more logging for the DriverEndpoint 
> ‘StatusUpdate’ handler; however, at a guess I would say the launchTasks call, 
> specifically the executorData.executorEndpoint.send operation, is introducing 
> some sort of blocking, which causes a backlog that takes time to process.
>  
> When the system is running okay we don’t see this behaviour.
>  
> Q: Have you guys seen this behaviour before, and if so, would an upgrade to 
> Spark 2.4 do the trick?
>  
> If not, is there any extra logging or debugging I can do to track down what 
> external event may be introducing the delay?  Given the volume of tasks, I can 
> only really analyze logs.
>  
