Well, it is a little difficult to say, because a lot of things are mixed together here. What function is being computed? Does it need a lot of memory? Could it be that you run out of memory, data spills to disk, and the resulting blocking I/O slows everything down?

Related to that: 1 executor with 40 cores. How much memory does that executor have, and how much does it need? I would also not put Kafka and ZooKeeper on the server where the driver is running. Whether a different Spark version would help depends on the answers to the questions above.
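As a quick sanity check on the sizing, a back-of-the-envelope calculation along these lines is worth doing (the numbers are purely illustrative, and I am assuming the Spark 2.x default spark.memory.fraction of 0.6):

// With 1 executor per worker and 40 cores, up to 40 concurrent tasks
// share a single executor heap.
val executorMemoryGb = 64.0            // hypothetical executor heap size
val usableGb = executorMemoryGb * 0.6  // spark.memory.fraction default in 2.x
val perTaskGb = usableGb / 40          // ~0.96 GB per concurrent task
println(f"$perTaskGb%.2f GB per concurrent task")

If 'func' needs more than that at its peak, spilling or GC pressure would be a plausible explanation for sudden stalls.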
On 11.03.2019 at 07:40, Hough, Stephen C <stephenc.ho...@sc.com.invalid> wrote:
>
> Spark Version: 2.0.2
>
> I am running a cluster with 400 workers, where each worker has 1 executor
> configured with 40 cores, for a total capacity of 16,000 cores.
> I run about 10,000 jobs with 1.5M tasks in total, where each job is a simple
> spark.parallelize(list, list.size()).map(func).collectAsync(). A job's list
> contains anywhere from 1 to 2,000 tasks.
> 'func' runs our code, which does some computation, posts events to Kafka,
> and returns a Boolean; we ignore the result. The duration of func can range
> from seconds up to 20 minutes.
>
> The application context is launched on a driver server with 32 cores; the
> only other services running on that box are a Kafka broker and a ZooKeeper
> service.
>
> In our development environment this particular batch took around 2 hours to
> run, which met our SLA, but when we ran it in production it took 3 hours to
> complete. We thought it might be interference from another cluster we were
> running with around 300 workers, but AWS informed us that the networks were
> isolated. I moved the job to run later, after that other cluster's batch had
> finished, and the time came back down to 2 hours.
> Analyzing our logs shows that a yet-to-be-determined incident at 22:02:42
> caused Spark to 'go slow'.
>
> By capturing timestamps from the executor threads' 'Finished task' messages
> and tracking each TID until it is seen by the task result getter, I measured
> how long it takes until a result is processed on the driver and a core is
> freed up for the scheduler.
>
> For the most part this is within a reasonable range of about 10ms, but at
> the incident time it suddenly rises to 5s, 20s, 587s, peaking at 32 minutes
> only 8 minutes after the incident. So it takes 32 minutes from the time a
> result is sent back to the Spark driver until it is processed, which explains
> the performance hit: during that time the freed cores on the worker sit idle,
> waiting for a new task. Note that I also tracked the Kafka event sent by such
> a task, and we saw it on the driver roughly 2ms later, so results are getting
> to the server over the network okay.
>
> Looking at the RPC code, it became apparent to me that once messages start to
> back up, the dispatcher effectively turns single-threaded while it processes
> the backlog. So I did another scan of the driver logs, looking for
> long-running dispatcher threads, i.e. a dispatcher thread that processes more
> than one consecutive message. A very obvious issue became apparent:
>
> Dispatcher 23 started at 22:02:42:647 and processed 80386 consecutive
> messages over a duration of 53 minutes.
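(As an aside, for anyone who wants to reproduce this kind of scan, something along these lines should do it in a few lines of Scala. The [dispatcher-event-loop-N] thread-name pattern is taken from the log excerpt below; everything else is a sketch, not standard tooling.)

import scala.io.Source

// Count "runs" of consecutive log lines handled by the same
// dispatcher-event-loop thread; long runs point at a backlog being
// drained single-threaded. Adjust the regex to your log4j layout.
object DispatcherRuns {
  def main(args: Array[String]): Unit = {
    val threadRe = """\[(dispatcher-event-loop-\d+)\]""".r
    var current = ""
    var count = 0
    def report(): Unit =
      if (count > 1) println(s"$current handled $count consecutive messages")
    for (line <- Source.fromFile(args(0)).getLines();
         t <- threadRe.findFirstMatchIn(line).map(_.group(1))) {
      if (t == current) count += 1
      else { report(); current = t; count = 1 }
    }
    report()
  }
}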
> If one looks at the beginning of these messages, the slowdown is obvious:
> the first three are within milliseconds of each other, and then a suspicious
> ~100ms delay starts appearing.
>
> 04-03-19 22:02:43:032 [INFO] [dispatcher-event-loop-23] o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 1418419 on executor id: 1048 hostname: 10.9.141.180
> 04-03-19 22:02:43:034 [INFO] [dispatcher-event-loop-23] o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 1418420 on executor id: 967 hostname: 10.9.134.69
> 04-03-19 22:02:43:037 [INFO] [dispatcher-event-loop-23] o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 1418421 on executor id: 791 hostname: 10.9.139.73
> 04-03-19 22:02:43:136 [INFO] [dispatcher-event-loop-23] o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 1418422 on executor id: 941 hostname: 10.9.142.127
> 04-03-19 22:02:43:234 [INFO] [dispatcher-event-loop-23] o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 1418423 on executor id: 1085 hostname: 10.9.142.23
> 04-03-19 22:02:43:348 [INFO] [dispatcher-event-loop-23] o.a.s.s.c.CoarseGrainedSchedulerBackend$DriverEndpoint - Launching task 1418424 on executor id: 944 hostname: 10.9.141.65
>
> Unfortunately I can't turn on any more extra logging for the DriverEndpoint
> 'StatusUpdate' handler, but at a guess I would say the launchTasks /
> executorData.executorEndpoint.send operation is introducing some sort of
> blocking, which causes a backlog that then takes time to process.
>
> When the system is running okay, we don't see this behaviour.
>
> Q: Have you seen this behaviour before, and if so, would an update to
> Spark 2.4 do the trick?
>
> If not, is there any extra logging or debugging I can do to track down
> the external event that is introducing the delay? Given the volume of tasks,
> I can only really analyze logs.
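On the logging question: given that you can only really analyze logs, extending your TID tracking into a systematic join between the executor-side and driver-side 'Finished task' messages would be my next step. A rough sketch, assuming the timestamp layout from your excerpt and that worker and driver clocks are in sync; the regex is a placeholder you would adapt to your log4j pattern:

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit
import scala.collection.mutable
import scala.io.Source

// Sketch of the TID join: first file = executor logs (the "Finished task
// ... TID n" line as the result leaves), second file = driver log (the task
// result getter's "Finished task ... TID n" line). Report every TID whose
// gap exceeds a second.
object TidLatency {
  private val fmt   = DateTimeFormatter.ofPattern("dd-MM-yy HH:mm:ss:SSS")
  private val entry = """^(\S+ \S+) .*Finished task .*TID (\d+)""".r

  private def scan(path: String): Iterator[(Long, LocalDateTime)] =
    Source.fromFile(path).getLines().flatMap { l =>
      entry.findFirstMatchIn(l)
        .map(m => (m.group(2).toLong, LocalDateTime.parse(m.group(1), fmt)))
    }

  def main(args: Array[String]): Unit = {
    val finishedOnExecutor = mutable.Map[Long, LocalDateTime]()
    scan(args(0)).foreach { case (tid, ts) => finishedOnExecutor(tid) = ts }
    for ((tid, ts) <- scan(args(1)); t0 <- finishedOnExecutor.get(tid)) {
      val ms = ChronoUnit.MILLIS.between(t0, ts)
      if (ms > 1000) println(s"TID $tid: $ms ms from executor finish to driver processing")
    }
  }
}

If that consistently shows the gap opening at the moment the long dispatcher run starts, it would at least confirm the backlog is on the driver-side dispatcher rather than in the network or on the executors.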