Hi Tobias,

What version are you using? In some recent versions, we had a couple of large hardcoded sleeps on the Spark side.

-Sandy

On Fri, Dec 5, 2014 at 11:15 AM, Andrew Or <and...@databricks.com> wrote:

> Hey Tobias,
>
> As you suspect, the reason why it's slow is because the resource manager
> in YARN takes a while to grant resources. This is because YARN needs to
> first set up the application master container, and then this AM needs to
> request more containers for Spark executors. I think this accounts for most
> of the overhead. The remaining overhead probably comes from how our own YARN
> integration code polls the application state (every second) and cluster
> resource states (every 5 seconds IIRC). I haven't explored in detail whether
> there are optimizations there that can speed this up, but I believe most of
> the overhead comes from YARN itself.
>
> In other words, no, I don't know of any quick fix on your end that you can
> do to speed this up.
>
> -Andrew
>
>
> 2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer <t...@preferred.jp>:
>
>> Hi,
>>
>> I am using spark-submit to submit my application to YARN in
>> "yarn-cluster" mode. I have both the Spark assembly jar file and my
>> application jar file in HDFS and can see from the logging output that
>> both files are used from there. However, it still takes about 10 seconds
>> for my application's yarnAppState to switch from ACCEPTED to RUNNING.
>>
>> I am aware that this is probably not a Spark issue, but rather some YARN
>> configuration setting (or YARN-inherent slowness). I was just wondering if
>> anyone has any advice on how to speed this up.
>>
>> Thanks
>> Tobias