I guess I need a bit more clarification: what kind of assumptions was the dispatcher making?
Tim

On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite <a...@cloudflare.com> wrote:

> Hi Tim,
>
> Thanks for the follow up. It's not so much that I expect the executor to
> inherit the configuration of the dispatcher as I *don't* expect the
> dispatcher to make assumptions about the system environment of the executor
> (since it lives in a docker). I could potentially see a case where you
> might want to explicitly forbid the defaults, but I can't think of any
> right now.
>
> Otherwise, I'm confused as to why the defaults in the docker image for the
> executor are just ignored. I suppose that it's the dispatcher's job to
> ensure the *exact* configuration of the executor, regardless of the
> defaults set on the executor's machine? Is that the assumption being made?
> I can understand that in contexts which aren't docker driven since jobs
> could be rolling out in the middle of a config update. Trying to think of
> this outside the terms of just mesos/docker (since I'm fully aware that
> docker doesn't rule the world yet).
>
> So I can see this from both perspectives now and passing in the properties
> file will probably work just fine for me, but for my better understanding:
> When the executor starts, will it read any of the environment that it's
> executing in or will it just take only the properties given to it by the
> dispatcher and nothing more?
>
> Lemme know if anything needs more clarification and thanks for your mesos
> contribution to spark!
>
> - Alan
>
> On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:
>
>> Hi Alan,
>>
>> If I understand correctly, you are setting executor home when you launch
>> the dispatcher and not in the configuration when you submit the job, and
>> expect it to inherit that configuration?
>>
>> When I worked on the dispatcher I was assuming all configuration is
>> passed to the dispatcher to launch the job exactly how you will need to
>> launch it with client mode.
>>
>> But indeed it shouldn't crash the dispatcher; I'll take a closer look when I
>> get a chance.
>>
>> Can you recommend changes on the documentation, either in email or a PR?
>>
>> Thanks!
>>
>> Tim
>>
>> Sent from my iPhone
>>
>> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com>
>> wrote:
>>
>> Hey All,
>>
>> To bump this thread once again, I'm having some trouble using the
>> dispatcher as well.
>>
>> I'm using Mesos Cluster Manager with Docker Executors. I've deployed the
>> dispatcher as a Marathon job. When I submit a job using spark-submit, the
>> dispatcher writes back that the submission was successful and then promptly
>> dies in Marathon. Looking at the logs reveals it was hitting the following
>> line:
>>
>> 398: throw new SparkException("Executor Spark home
>> `spark.mesos.executor.home` is not set!")
>>
>> Which is odd because it's set in multiple places (SPARK_HOME,
>> spark.mesos.executor.home, spark.home, etc). Reading the code, it
>> appears that the driver desc pulls only from the request and disregards any
>> other properties that may be configured. Testing by passing --conf
>> spark.mesos.executor.home=/usr/local/spark on the command line to
>> spark-submit confirms this. We're trying to reduce the number of places
>> where we have to set properties within spark and were hoping that it will
>> be possible to have this pull in the spark-defaults.conf from somewhere, or
>> at least allow the user to inform the dispatcher through spark-submit that
>> those properties will be available once the job starts.
>>
>> Finally, I don't think the dispatcher should crash in this event. It
>> seems not exceptional that a job is misconfigured when submitted.
>>
>> Please direct me on the right path if I'm headed in the wrong direction.
>> Also let me know if I should open some tickets for these issues.
>>
>> Thanks,
>> - Alan
>>
>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>>
>>> Yes you can create an issue, or actually contribute a patch to update it
>>> :)
>>>
>>> Sorry the docs are a bit light, I'm going to make them more complete along
>>> the way.
>>>
>>> Tim
>>>
>>>
>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <
>>> tomwa...@cisco.com> wrote:
>>>
>>>> Tim,
>>>>
>>>> Thank you for the explanation. You are correct, my Mesos experience is
>>>> very light, and I haven’t deployed anything via Marathon yet. What you
>>>> have stated here makes sense, I will look into doing this.
>>>>
>>>> Adding this info to the docs would be great. Is the appropriate action
>>>> to create an issue regarding improvement of the docs? For those of us who
>>>> are gaining the experience, having such a pointer is very helpful.
>>>>
>>>> Tom
>>>>
>>>> From: Tim Chen <t...@mesosphere.io>
>>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>>
>>>> Hi Tom,
>>>>
>>>> Sorry the documentation isn't really rich, since it's probably assuming
>>>> users understand how Mesos and frameworks work.
>>>>
>>>> First I need to explain the rationale for creating the dispatcher. If
>>>> you're not familiar with Mesos yet, each node in your datacenter has a
>>>> Mesos slave installed, which is responsible for publishing resources and
>>>> running/watching tasks, and the Mesos master is responsible for taking the
>>>> aggregated resources and scheduling them among frameworks.
>>>>
>>>> Frameworks are not managed by Mesos, as the Mesos master/slave doesn't
>>>> launch and maintain frameworks but assumes they're launched and kept
>>>> running on their own.
>>>> All the existing frameworks in the ecosystem therefore have
>>>> their own ways to deploy, provide HA and persist state (e.g. Aurora,
>>>> Marathon, etc).
>>>>
>>>> Therefore, to introduce cluster mode with Mesos, we must create a
>>>> long-running framework that can run in your datacenter, and
>>>> can handle launching spark drivers on demand and handle HA, etc. This is
>>>> what the dispatcher is all about.
>>>>
>>>> So the idea is that you should launch the dispatcher not on the client,
>>>> but on a machine in your datacenter. In Mesosphere's DCOS we launch all
>>>> frameworks and long running services with Marathon, and you can use
>>>> Marathon to launch the Spark dispatcher.
>>>>
>>>> Then all clients, instead of specifying the Mesos master URL (e.g.
>>>> mesos://mesos.master:2181), just talk to the dispatcher
>>>> (mesos://spark-dispatcher.mesos:7077), and the dispatcher will then start
>>>> and watch the driver for you.
>>>>
>>>> Tim
>>>>
>>>>
>>>>
>>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <
>>>> tomwa...@cisco.com> wrote:
>>>>
>>>>> After spending most of yesterday scouring the Internet for sources of
>>>>> documentation for submitting Spark jobs in cluster mode to a Spark cluster
>>>>> managed by Mesos I was able to do just that, but I am not convinced that
>>>>> how I have things set up is correct.
>>>>>
>>>>> I used the Mesos published
>>>>> <https://open.mesosphere.com/getting-started/datacenter/install/>
>>>>> instructions for setting up my Mesos cluster. I have three Zookeeper
>>>>> instances, three Mesos master instances, and three Mesos slave instances.
>>>>> This is all running in Openstack.
>>>>>
>>>>> The documentation on the Spark documentation site states that “To use
>>>>> cluster mode, you must start the MesosClusterDispatcher in your cluster
>>>>> via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos
>>>>> master url (e.g: mesos://host:5050).” That is it, no more
>>>>> information than that. So that is what I did: I have one machine that I
>>>>> use as the Spark client for submitting jobs. I started the Mesos
>>>>> dispatcher with the script as described, and submitted the job using the
>>>>> client machine’s IP address and port as the target.
>>>>>
>>>>> The job is currently running in Mesos as expected. This is not
>>>>> however how I would have expected to configure the system. As it stands,
>>>>> there is one instance of the Spark Mesos dispatcher running outside of
>>>>> Mesos, so it is not a part of the sphere of Mesos resource management.
>>>>>
>>>>> I used the following Stack Overflow posts as guidelines:
>>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>>
>>>>> There must be better documentation on how to deploy Spark in Mesos
>>>>> with jobs able to be deployed in cluster mode.
>>>>>
>>>>> I can follow up with more specific information regarding my
>>>>> deployment if necessary.
>>>>>
>>>>> Tom
>>>>>
>>>>
>>>>
>>>
>>
>
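
For anyone landing on this thread later, here is a minimal sketch of the submission described above: the client points spark-submit at the dispatcher rather than at the Mesos master, and passes spark.mesos.executor.home explicitly with --conf because the dispatcher only sees what arrives in the request. The dispatcher URL and the executor home path are the ones quoted in the thread. The helper name build_cluster_submit_command, the jar URL and the class name are hypothetical and only there for illustration. The sketch assembles the command line and prints it; it does not run spark-submit.

```python
from typing import Dict, List, Optional


def build_cluster_submit_command(
    dispatcher_url: str,
    app_jar: str,
    main_class: str,
    conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble a spark-submit argv that targets the dispatcher in cluster mode.

    Hypothetical helper: it only builds the argument list, it runs nothing.
    """
    command = [
        "spark-submit",
        "--deploy-mode", "cluster",
        # Clients talk to the dispatcher, not to the Mesos master directly.
        "--master", dispatcher_url,
        "--class", main_class,
    ]
    # Every property the job needs is passed explicitly with the request,
    # since defaults set elsewhere were not picked up in the case reported above.
    for key, value in sorted((conf or {}).items()):
        command += ["--conf", f"{key}={value}"]
    command.append(app_jar)
    return command


if __name__ == "__main__":
    command = build_cluster_submit_command(
        dispatcher_url="mesos://spark-dispatcher.mesos:7077",
        app_jar="http://example.com/jars/my-app.jar",  # hypothetical jar location
        main_class="com.example.MyApp",  # hypothetical class name
        conf={"spark.mesos.executor.home": "/usr/local/spark"},
    )
    print(" ".join(command))
```

Keeping the properties in a dict makes it easy to see, in one place, everything the dispatcher will be told about the job. In cluster mode the jar is assumed to live at a location the Mesos slaves can fetch, which is why the example uses a URL rather than a local path.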