Hey Tobias,

As you suspect, it's slow because the YARN ResourceManager takes a while to grant resources. YARN first has to set up the application master (AM) container, and the AM then has to request more containers for the Spark executors. I think this accounts for most of the overhead. The rest probably comes from how our own YARN integration code polls the application state (every second) and the cluster resource state (every 5 seconds, IIRC). I haven't explored in detail whether there are optimizations there that could speed this up, but I believe most of the overhead comes from YARN itself.
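To put a rough bound on the polling part: a fixed-interval poll can delay noticing a state change by at most one interval, and by half an interval on average if the change happens at a random point in the period. The sketch below is a simplified model, not Spark's actual code. The function and constant names are made up for illustration, and the intervals are the ones quoted above (the 5-second value is from memory).

```python
import math

# Intervals as quoted in this message; the 5 s figure is "IIRC", so treat it as an assumption.
APP_STATE_POLL_S = 1.0      # how often the application state is polled
CLUSTER_STATE_POLL_S = 5.0  # how often the cluster resource state is polled


def detection_delay(event_time_s: float, poll_interval_s: float) -> float:
    """Seconds from an event until the first poll at or after it.

    Assumes polls happen at 0, T, 2T, ... (a hypothetical, idealized schedule).
    """
    next_poll = math.ceil(event_time_s / poll_interval_s) * poll_interval_s
    return next_poll - event_time_s


def average_delay(poll_interval_s: float) -> float:
    """Expected delay if the event time is uniform within one poll period."""
    return poll_interval_s / 2.0


if __name__ == "__main__":
    for name, interval in [("application state", APP_STATE_POLL_S),
                           ("cluster resource state", CLUSTER_STATE_POLL_S)]:
        print(f"{name}: worst case {interval:.1f} s, "
              f"average {average_delay(interval):.1f} s per poll")
```

Under this model, polling alone cannot explain the whole delay, which is consistent with most of it coming from YARN itself.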
In other words, no, I don't know of any quick fix on your end that would speed this up.

-Andrew

2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer <t...@preferred.jp>:

> Hi,
>
> I am using spark-submit to submit my application to YARN in "yarn-cluster"
> mode. I have both the Spark assembly jar file as well as my application jar
> file put in HDFS and can see from the logging output that both files are
> used from there. However, it still takes about 10 seconds for my
> application's yarnAppState to switch from ACCEPTED to RUNNING.
>
> I am aware that this is probably not a Spark issue, but some YARN
> configuration setting (or YARN-inherent slowness), I was just wondering if
> anyone has an advice for how to speed this up.
>
> Thanks
> Tobias