I was recently doing some research into Spark on YARN's startup time and observed slow, synchronous allocation of containers/executors. I am testing on a 4-node bare-metal cluster with 48 cores and 128 GB of memory per node. YARN was only allocating about 3 containers per second. Moreover, when starting 3 Spark applications at the same time, each requesting 44 containers, the first application would get all 44 requested containers before the next application started getting any, and so on.
From looking at the code, it appears this is by design. There is an undocumented configuration variable that enables asynchronous allocation of containers. I'm sure I'm missing something, but why is this not the default? Is there a bug or race condition in this code path? I've done some testing with it, and it's been working and is significantly faster. Here's the config: `yarn.scheduler.capacity.schedule-asynchronously.enable`

I created a JIRA ticket in YARN's project, but I am curious whether anyone else has experienced similar issues or has tested this configuration extensively. YARN-7327 <https://issues.apache.org/jira/browse/YARN-7327>
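For reference, here is a minimal sketch of how that flag could be set, assuming the CapacityScheduler is in use and the property goes in the standard `capacity-scheduler.xml` under `$HADOOP_CONF_DIR` (the property name is taken from the post above; the file location is the usual Hadoop convention):

```xml
<!-- capacity-scheduler.xml -->
<configuration>
  <!-- Enable asynchronous container scheduling in the CapacityScheduler
       (off by default; undocumented at the time of this post). -->
  <property>
    <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
    <value>true</value>
  </property>
</configuration>
```

A ResourceManager restart would typically be needed for this scheduler setting to take effect.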