Funny, someone from my team talked to me about that idea yesterday. We use SparkLauncher, but it just calls spark-submit, which calls other scripts that start a new Java program that tries to submit (in our case in cluster mode - the driver is started in the Spark cluster) and exit. That makes it a challenge to troubleshoot cases where submit fails, especially when users try our app in their own Spark environment. He hoped to get a more decent / specific exception if submit failed, or to be able to debug it in an IDE (the actual call to the master, its response, etc.).
Ofir Manor
Co-Founder & CTO | Equalum
Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Mon, Oct 10, 2016 at 9:13 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:

> Just folks who don't want to use spark-submit, no real use cases I've seen
> yet.
>
> I didn't know about SparkLauncher myself, and I don't think there are any
> official docs on that or on launching Spark as an embedded library for tests.
>
> On Mon, Oct 10, 2016 at 11:09 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
>> What are the main use cases you've seen for this? Maybe we can add a page
>> to the docs about how to launch Spark as an embedded library.
>>
>> Matei
>>
>> On Oct 10, 2016, at 10:21 AM, Russell Spitzer <russell.spit...@gmail.com> wrote:
>>
>> I actually had not seen SparkLauncher before, that looks pretty great :)
>>
>> On Mon, Oct 10, 2016 at 10:17 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>
>>> I'm definitely only talking about non-embedded uses here, as I also use
>>> embedded Spark (Cassandra, and Kafka) to run tests. This is almost always
>>> safe since everything is in the same JVM. It's only once we get to
>>> launching against a real distributed environment that we end up with issues.
>>>
>>> Since PySpark uses spark-submit in the Java gateway, I'm not sure if that
>>> matters :)
>>>
>>> The cases I see usually involve going through main directly and adding
>>> jars programmatically.
>>>
>>> That usually ends up with classpath errors (Spark not on the CP, their jar
>>> not on the CP, dependencies not on the CP),
>>> conf errors (executors have the incorrect environment, executor
>>> classpath broken, not understanding that spark-defaults won't do anything),
>>> jar version mismatches,
>>> etc.
>>>
>>> On Mon, Oct 10, 2016 at 10:05 AM Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> I have also 'embedded' a Spark driver without much trouble. It isn't
>>>> that it can't work.
>>>>
>>>> The Launcher API is probably the recommended way to do that, though.
>>>> spark-submit is the way to go for non-programmatic access.
>>>>
>>>> If you're not doing one of those things and it is not working, yeah, I
>>>> think people would tell you you're on your own. I think that's consistent
>>>> with all the JIRA discussions I have seen over time.
>>>>
>>>> On Mon, Oct 10, 2016, 17:33 Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>
>>>>> I've seen a variety of users attempting to work around using Spark
>>>>> Submit, with at best middling levels of success. I think it would be helpful
>>>>> if the project had a clear statement that submitting an application without
>>>>> using Spark Submit is truly for experts only or is unsupported entirely.
>>>>>
>>>>> I know this is a pretty strong stance, and other people have had
>>>>> different experiences than me, so please let me know what you think :)
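For readers who, like Ofir, want at least a programmatic failure signal out of a submit, here is a minimal sketch of the SparkLauncher API with a state listener. It is an illustration only, not something from this thread: the Spark home, master URL, jar path, and main class are all hypothetical placeholders, and it requires spark-launcher on the classpath plus a running cluster. Note it still cannot surface a rich exception from the cluster side; it only tells you, in the launching JVM, that the application reached a final state other than FINISHED.

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;
import java.util.concurrent.CountDownLatch;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);

        SparkAppHandle handle = new SparkLauncher()
            .setSparkHome("/opt/spark")               // hypothetical path
            .setMaster("spark://master:7077")         // hypothetical master URL
            .setDeployMode("cluster")                 // driver runs in the cluster
            .setAppResource("/path/to/app.jar")       // hypothetical application jar
            .setMainClass("com.example.Main")         // hypothetical main class
            .startApplication(new SparkAppHandle.Listener() {
                @Override
                public void stateChanged(SparkAppHandle h) {
                    // Final states include FINISHED, FAILED, KILLED, LOST.
                    if (h.getState().isFinal()) {
                        done.countDown();
                    }
                }
                @Override
                public void infoChanged(SparkAppHandle h) { }
            });

        done.await();
        if (handle.getState() != SparkAppHandle.State.FINISHED) {
            // Not a detailed cluster-side exception, but at least a
            // programmatic, in-JVM signal that the submit did not succeed.
            throw new RuntimeException("Submit failed with state: " + handle.getState());
        }
    }
}
```

Unlike shelling out to spark-submit directly, startApplication keeps a handle in the calling JVM, so failure handling can live in your own code rather than in exit-code parsing.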