We also launch jobs programmatically, in both standalone mode and yarn-client mode. In standalone mode it has always worked; in yarn-client mode we ran into some issues and were forced to use spark-submit, but it is still on my todo list to move back to a normal Java launch without spark-submit at some point. To me, Spark is a library that I use to do distributed computations within my app, and ideally a library should not tell me how to launch my app. If every library I used had its own launch script, I would get stuck very quickly: hadoop jar vs. spark-submit vs. kiji launch vs. hbase jar. Bad idea, I think!
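(For context, the "normal Java launch" I mean is just constructing the context inside the application itself. A minimal sketch, assuming Spark is on the application's classpath; the class name and local master URL are my own illustration, not from the thread:)

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class EmbeddedSparkExample {
    public static void main(String[] args) {
        // Build the configuration in code instead of relying on spark-submit.
        // "local[2]" runs in-process; a standalone master URL such as
        // "spark://host:7077" would be used against a real cluster.
        SparkConf conf = new SparkConf()
                .setAppName("embedded-example")
                .setMaster("local[2]");

        JavaSparkContext sc = new JavaSparkContext(conf);

        // A trivial distributed computation driven entirely from the app.
        long evens = sc.parallelize(Arrays.asList(1, 2, 3, 4))
                       .filter(x -> x % 2 == 0)
                       .count();
        System.out.println(evens);

        sc.stop();
    }
}
```

Launched with a plain `java -cp ...` invocation, this is exactly the "library, not launcher" usage described above; the yarn-client complications mentioned are about getting the right Hadoop/YARN configuration and jars visible to such a process.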
However, I do understand the practical reasons why spark-submit came about...

On Thu, May 21, 2015 at 10:30 PM, Nathan Kronenfeld <nkronenfeld@uncharted.software> wrote:

>> In researching and discussing these issues with Cloudera and others,
>> we've been told that only one mechanism is supported for starting Spark
>> jobs: the *spark-submit* scripts.
>
> Is this new? We've been submitting jobs directly from a programmatically
> created Spark context (instead of through spark-submit) from the beginning
> (from 0.7.x to 1.2) - to a local cluster.
>
> In moving to 1.3 on a Yarn cluster recently, we've had no end of problems
> trying to switch this over (though I think we're almost there).
>
> Why would one want to eliminate this possibility?
>
> -Nathan