Re: Spark on Mesos with Jobs in Cluster Mode Documentation

Alan Braithwaite Thu, 17 Sep 2015 22:20:07 -0700

Hi Tim,

Thanks for the follow up.  It's not so much that I expect the executor to
inherit the configuration of the dispatcher as that I *don't* expect the
dispatcher to make assumptions about the system environment of the executor
(since it lives in a Docker container).  I could potentially see a case where you
might want to explicitly forbid the defaults, but I can't think of any
right now.


Otherwise, I'm confused as to why the defaults in the Docker image for the
executor are just ignored.  I suppose it's the dispatcher's job to
ensure the *exact* configuration of the executor, regardless of the
defaults set on the executor's machine?  Is that the assumption being made?
I can understand that in contexts which aren't Docker-driven, since jobs
could be rolling out in the middle of a config update.  I'm trying to think about
this outside of just Mesos/Docker (since I'm fully aware that
Docker doesn't rule the world yet).

So I can see this from both perspectives now, and passing in the properties
file will probably work just fine for me.  But for my own understanding:
when the executor starts, will it read any of the environment that it's
executing in, or will it take only the properties given to it by the
dispatcher and nothing more?
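A minimal sketch of the properties-file approach, assuming the dispatcher URL Tim gives further down the thread (`mesos://spark-dispatcher.mesos:7077`) plus a hypothetical properties-file path and a hypothetical application jar URL (in cluster mode the jar has to be reachable from the cluster, not just from the client). It only assembles the `spark-submit` argument list using the standard `--master`, `--deploy-mode`, `--properties-file` and `--conf` options, so that `spark.mesos.executor.home` travels with the submission rather than being expected from the dispatcher's environment. Nothing is executed.

```python
import shlex


def build_submit_argv(dispatcher_url, properties_file, app_jar, extra_conf=None):
    """Assemble (but do not run) a spark-submit command for Mesos cluster mode.

    The properties file and jar location passed in are hypothetical placeholders.
    """
    argv = [
        "spark-submit",
        "--master", dispatcher_url,            # talk to the dispatcher, not the Mesos master
        "--deploy-mode", "cluster",            # the dispatcher launches the driver in the cluster
        "--properties-file", properties_file,  # executor-side settings ride along with the request
    ]
    # Explicit --conf entries for anything not covered by the properties file.
    for key, value in sorted((extra_conf or {}).items()):
        argv += ["--conf", f"{key}={value}"]
    argv.append(app_jar)
    return argv


if __name__ == "__main__":
    argv = build_submit_argv(
        "mesos://spark-dispatcher.mesos:7077",
        "/etc/spark/job.conf",                   # hypothetical properties file
        "https://example.com/jobs/my-job.jar",   # hypothetical jar URL reachable from the cluster
        {"spark.mesos.executor.home": "/usr/local/spark"},
    )
    print(shlex.join(argv))
```

Note that `--properties-file` replaces the client's default `spark-defaults.conf` for that submission, so anything the job needs should be in that file or in explicit `--conf` flags.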

Lemme know if anything needs more clarification, and thanks for your Mesos
contribution to Spark!

- Alan

On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:

> Hi Alan,
>
> If I understand correctly, you are setting the executor home when you launch
> the dispatcher, not in the configuration when you submit the job, and expect
> the job to inherit that configuration?
>
> When I worked on the dispatcher, I assumed that all configuration is passed
> to the dispatcher, so that it launches the job exactly as you would need to
> launch it in client mode.
>
> But indeed it shouldn't crash the dispatcher; I'll take a closer look when I
> get a chance.
>
> Can you recommend changes on the documentation, either in email or a PR?
>
> Thanks!
>
> Tim
>
> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com>
> wrote:
>
> Hey All,
>
> To bump this thread once again, I'm having some trouble using the
> dispatcher as well.
>
> I'm using the Mesos cluster manager with Docker executors.  I've deployed the
> dispatcher as a Marathon job.  When I submit a job using spark-submit, the
> dispatcher writes back that the submission was successful and then promptly
> dies in Marathon.  Looking at the logs reveals it was hitting the following
> line:
>
> 398:  throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
>
> Which is odd, because it's set in multiple places (SPARK_HOME,
> spark.mesos.executor.home, spark.home, etc.).  Reading the code, it
> appears that the driver description pulls only from the request and
> disregards any other properties that may be configured.  Testing by passing
> --conf spark.mesos.executor.home=/usr/local/spark on the command line to
> spark-submit confirms this.  We're trying to reduce the number of places
> where we have to set properties within Spark and were hoping that it would
> be possible to have Spark pull in spark-defaults.conf from somewhere, or
> at least to let the user tell the dispatcher through spark-submit that
> those properties will be available once the job starts.
>
> Finally, I don't think the dispatcher should crash in this event.  A
> misconfigured job submission hardly seems exceptional.
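The non-crashing behaviour Alan proposes can be illustrated with a small, hypothetical model of what he describes: the dispatcher builds the driver description from the submission request's properties only, so a missing `spark.mesos.executor.home` can be detected per submission. This is plain Python, not Spark's actual Scala code; `validate_submission`, `SubmissionResult` and `REQUIRED_KEYS` are invented names. It rejects the bad submission with an error message instead of raising.

```python
from dataclasses import dataclass, field

# Hypothetical: the keys this model treats as mandatory for a cluster-mode submission.
REQUIRED_KEYS = ("spark.mesos.executor.home",)


@dataclass
class SubmissionResult:
    accepted: bool
    errors: list = field(default_factory=list)


def validate_submission(request_props):
    """Check only the properties carried by the submission request.

    Mirrors the behaviour described in the thread: settings in the dispatcher's
    own environment or spark-defaults.conf are not consulted.
    """
    errors = [
        f"required property `{key}` is not set in the submission"
        for key in REQUIRED_KEYS
        if not request_props.get(key)
    ]
    # Report the problem back to the client instead of crashing the dispatcher.
    return SubmissionResult(accepted=not errors, errors=errors)


if __name__ == "__main__":
    print(validate_submission({}))
    print(validate_submission({"spark.mesos.executor.home": "/usr/local/spark"}))
```

Validating when the request arrives means spark-submit receives the failure directly, rather than being told the submission succeeded while the dispatcher then dies.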
>
> Please direct me on the right path if I'm headed in the wrong direction.
> Also let me know if I should open some tickets for these issues.
>
> Thanks,
> - Alan
>
> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>
>> Yes you can create an issue, or actually contribute a patch to update it
>> :)
>>
>> Sorry the docs are a bit light; I'm going to make them more complete along
>> the way.
>>
>> Tim
>>
>>
>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <
>> tomwa...@cisco.com> wrote:
>>
>>> Tim,
>>>
>>> Thank you for the explanation.  You are correct, my Mesos experience is
>>> very light, and I haven’t deployed anything via Marathon yet.  What you
>>> have stated here makes sense, I will look into doing this.
>>>
>>> Adding this info to the docs would be great.  Is the appropriate action
>>> to create an issue regarding improvement of the docs?  For those of us who
>>> are still gaining experience, having such a pointer is very helpful.
>>>
>>> Tom
>>>
>>> From: Tim Chen <t...@mesosphere.io>
>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>
>>> Hi Tom,
>>>
>>> Sorry the documentation isn't very rich; it probably assumes
>>> users understand how Mesos and its frameworks work.
>>>
>>> First I need to explain the rationale for creating the dispatcher. If
>>> you're not familiar with Mesos yet, each node in your datacenter runs
>>> a Mesos slave, which is responsible for publishing resources and
>>> running/watching tasks, and the Mesos master is responsible for taking the
>>> aggregated resources and scheduling them among frameworks.
>>>
>>> Frameworks are not managed by Mesos: the Mesos master and slaves don't
>>> launch or maintain frameworks, but assume they are launched and kept
>>> running by some other means. Therefore every existing framework in the
>>> ecosystem has its own way to deploy, provide HA and persist state
>>> (e.g. Aurora, Marathon, etc.).
>>>
>>> Therefore, to introduce cluster mode with Mesos, we must create a
>>> long-running framework that runs in your datacenter and can launch
>>> Spark drivers on demand, handle HA, etc. This is what the dispatcher
>>> is all about.
>>>
>>> So the idea is that you should launch the dispatcher not on the client,
>>> but on a machine in your datacenter. In Mesosphere's DCOS we launch all
>>> frameworks and long-running services with Marathon, and you can use
>>> Marathon to launch the Spark dispatcher.
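As a rough illustration of running the dispatcher under Marathon, the sketch below builds a Marathon app definition as JSON. The `id`, `cmd`, `cpus`, `mem` and `instances` fields are standard Marathon app fields; the resource sizes, the install path `/opt/spark`, the Mesos master address and the exact dispatcher flags (`--master`, `--port`) are assumptions to check against your Spark version. The command runs the `MesosClusterDispatcher` class in the foreground via `bin/spark-class` rather than `sbin/start-mesos-dispatcher.sh`, on the assumption that the script daemonizes and exits right away, which Marathon would treat as the task dying.

```python
import json


def dispatcher_app(mesos_master, spark_home="/opt/spark", port=7077):
    """Build a Marathon app definition for the Spark Mesos cluster dispatcher.

    spark_home, the resource sizes and the dispatcher flags are assumptions.
    """
    cmd = " ".join([
        f"{spark_home}/bin/spark-class",
        "org.apache.spark.deploy.mesos.MesosClusterDispatcher",
        "--master", mesos_master,  # the Mesos master the dispatcher registers with
        "--port", str(port),       # where spark-submit clients will connect
    ])
    return {
        "id": "spark-dispatcher",
        "cmd": cmd,        # stays in the foreground so Marathon can supervise and restart it
        "cpus": 1.0,
        "mem": 1024,       # MB; hypothetical sizing
        "instances": 1,
    }


if __name__ == "__main__":
    print(json.dumps(dispatcher_app("mesos://mesos.master:5050"), indent=2))
```

The resulting JSON can be POSTed to Marathon's `/v2/apps` endpoint; clients then submit to the dispatcher's address instead of the Mesos master.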
>>>
>>> Then, instead of specifying the Mesos master URL (e.g:
>>> mesos://mesos.master:2181), all clients talk only to the dispatcher
>>> (mesos://spark-dispatcher.mesos:7077), and the dispatcher will then start
>>> and watch the driver for you.
>>>
>>> Tim
>>>
>>>
>>>
>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <
>>> tomwa...@cisco.com> wrote:
>>>
>>>> After spending most of yesterday scouring the Internet for
>>>> documentation on submitting Spark jobs in cluster mode to a Spark cluster
>>>> managed by Mesos, I was able to do just that, but I am not convinced that
>>>> the way I have things set up is correct.
>>>>
>>>> I used the instructions published by Mesosphere
>>>> <https://open.mesosphere.com/getting-started/datacenter/install/>
>>>> for setting up my Mesos cluster.  I have three ZooKeeper
>>>> instances, three Mesos master instances, and three Mesos slave instances.
>>>> This is all running in OpenStack.
>>>>
>>>> The documentation on the Spark documentation site states that “To use
>>>> cluster mode, you must start the MesosClusterDispatcher in your cluster via
>>>> the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master
>>>> url (e.g: mesos://host:5050).”  That is it, no more information than
>>>> that.  So that is what I did: I have one machine that I use as the Spark
>>>> client for submitting jobs.  I started the Mesos dispatcher with the script
>>>> as described, and submitted the job using the client machine’s IP address
>>>> and port as the target.
>>>>
>>>> The job is currently running in Mesos as expected.  However, this is not
>>>> how I would have expected to configure the system: as it stands, one
>>>> instance of the Spark Mesos dispatcher is running outside of Mesos, and so
>>>> is not part of Mesos resource management.
>>>>
>>>> I used the following Stack Overflow posts as guidelines:
>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>
>>>> There must be better documentation on how to deploy Spark in Mesos with
>>>> jobs able to be deployed in cluster mode.
>>>>
>>>> I can follow up with more specific information regarding my deployment
>>>> if necessary.
>>>>
>>>> Tom
>>>>
>>>
>>>
>>
>
