Re: Spark on Mesos with Jobs in Cluster Mode Documentation

Tim Chen Sat, 19 Sep 2015 01:10:32 -0700

I guess I need a bit more clarification: what kind of assumptions was the
dispatcher making?


Tim


On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite <a...@cloudflare.com>
wrote:

> Hi Tim,
>
> Thanks for the follow-up.  It's not so much that I expect the executor to
> inherit the configuration of the dispatcher as I *don't* expect the
> dispatcher to make assumptions about the system environment of the executor
> (since it lives in a Docker container).  I could potentially see a case
> where you might want to explicitly forbid the defaults, but I can't think
> of any right now.
>
> Otherwise, I'm confused as to why the defaults in the Docker image for the
> executor are just ignored.  I suppose that it's the dispatcher's job to
> ensure the *exact* configuration of the executor, regardless of the
> defaults set on the executor's machine?  Is that the assumption being made?
> I can understand that in contexts which aren't Docker-driven, since jobs
> could be rolling out in the middle of a config update.  I'm trying to think
> of this outside the terms of just Mesos/Docker (since I'm fully aware that
> Docker doesn't rule the world yet).
>
> So I can see this from both perspectives now, and passing in the properties
> file will probably work just fine for me, but for my better understanding:
> when the executor starts, will it read any of the environment that it's
> executing in, or will it take only the properties given to it by the
> dispatcher and nothing more?
>
> Lemme know if anything needs more clarification, and thanks for your Mesos
> contribution to Spark!
>
> - Alan
>
> On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:
>
>> Hi Alan,
>>
>> If I understand correctly, you are setting the executor home when you
>> launch the dispatcher and not in the configuration when you submit the
>> job, and expect it to inherit that configuration?
>>
>> When I worked on the dispatcher, I was assuming all configuration is
>> passed to the dispatcher to launch the job exactly how you would need to
>> launch it in client mode.
>>
>> But indeed it shouldn't crash the dispatcher; I'll take a closer look when
>> I get a chance.
>>
>> Can you recommend changes to the documentation, either in email or a PR?
>>
>> Thanks!
>>
>> Tim
>>
>> Sent from my iPhone
>>
>> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com>
>> wrote:
>>
>> Hey All,
>>
>> To bump this thread once again, I'm having some trouble using the
>> dispatcher as well.
>>
>> I'm using the Mesos cluster manager with Docker executors.  I've deployed
>> the dispatcher as a Marathon job.  When I submit a job using spark-submit,
>> the dispatcher writes back that the submission was successful and then
>> promptly dies in Marathon.  Looking at the logs reveals it was hitting the
>> following line:
>>
>> 398:          throw new SparkException("Executor Spark home
>> `spark.mesos.executor.home` is not set!")
>>
>> Which is odd because it's set in multiple places (SPARK_HOME,
>> spark.mesos.executor.home, spark.home, etc.).  Reading the code, it
>> appears that the driver description pulls only from the request and
>> disregards any other properties that may be configured.  Testing by passing
>> --conf spark.mesos.executor.home=/usr/local/spark on the command line to
>> spark-submit confirms this.  We're trying to minimize the number of places
>> where we have to set properties within Spark and were hoping that it would
>> be possible to have this pull in the spark-defaults.conf from somewhere, or
>> at least allow the user to inform the dispatcher through spark-submit that
>> those properties will be available once the job starts.
>>
>> Finally, I don't think the dispatcher should crash in this event.  It
>> doesn't seem exceptional for a job to be misconfigured when submitted.
>>
>> Please direct me on the right path if I'm headed in the wrong direction.
>> Also let me know if I should open some tickets for these issues.
>>
>> Thanks,
>> - Alan
>>
>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>>
>>> Yes, you can create an issue, or actually contribute a patch to update it
>>> :)
>>>
>>> Sorry the docs are a bit light; I'm going to make them more complete along
>>> the way.
>>>
>>> Tim
>>>
>>>
>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <
>>> tomwa...@cisco.com> wrote:
>>>
>>>> Tim,
>>>>
>>>> Thank you for the explanation.  You are correct: my Mesos experience is
>>>> very light, and I haven’t deployed anything via Marathon yet.  What you
>>>> have stated here makes sense, and I will look into doing this.
>>>>
>>>> Adding this info to the docs would be great.  Is the appropriate action
>>>> to create an issue regarding improvement of the docs?  For those of us who
>>>> are gaining the experience, having such a pointer is very helpful.
>>>>
>>>> Tom
>>>>
>>>> From: Tim Chen <t...@mesosphere.io>
>>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>>
>>>> Hi Tom,
>>>>
>>>> Sorry the documentation isn't really rich; it's probably assuming that
>>>> users understand how Mesos and frameworks work.
>>>>
>>>> First I need to explain the rationale for creating the dispatcher. If
>>>> you're not familiar with Mesos yet: each node in your datacenter has a
>>>> Mesos slave installed, which is responsible for publishing resources and
>>>> running/watching tasks, and the Mesos master is responsible for taking the
>>>> aggregated resources and scheduling them among frameworks.
>>>>
>>>> Frameworks are not managed by Mesos: the Mesos master and slaves don't
>>>> launch and maintain frameworks, but assume they're launched and kept
>>>> running on their own. The existing frameworks in the ecosystem therefore
>>>> all have their own ways to deploy, provide HA, and persist state (e.g.
>>>> Aurora and Marathon).
>>>>
>>>> Therefore, to introduce cluster mode with Mesos, we had to create a
>>>> long-running framework that runs in your datacenter and can handle
>>>> launching Spark drivers on demand, HA, etc. This is what the dispatcher
>>>> is all about.
>>>>
>>>> So the idea is that you should launch the dispatcher not on the client,
>>>> but on a machine in your datacenter. In Mesosphere's DCOS we launch all
>>>> frameworks and long running services with Marathon, and you can use
>>>> Marathon to launch the Spark dispatcher.
>>>>
>>>> Then all clients, instead of specifying the Mesos master URL (e.g.
>>>> mesos://mesos.master:2181), just talk to the dispatcher
>>>> (mesos://spark-dispatcher.mesos:7077), and the dispatcher will then start
>>>> and watch the driver for you.
>>>>
>>>> Tim
>>>>
>>>>
>>>>
>>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <
>>>> tomwa...@cisco.com> wrote:
>>>>
>>>>> After spending most of yesterday scouring the Internet for sources of
>>>>> documentation for submitting Spark jobs in cluster mode to a Spark cluster
>>>>> managed by Mesos, I was able to do just that, but I am not convinced that
>>>>> how I have things set up is correct.
>>>>>
>>>>> I used the Mesos published
>>>>> <https://open.mesosphere.com/getting-started/datacenter/install/>
>>>>> instructions for setting up my Mesos cluster.  I have three ZooKeeper
>>>>> instances, three Mesos master instances, and three Mesos slave instances.
>>>>> This is all running in OpenStack.
>>>>>
>>>>> The documentation on the Spark documentation site states that “To use
>>>>> cluster mode, you must start the MesosClusterDispatcher in your cluster
>>>>> via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos
>>>>> master url (e.g: mesos://host:5050).”  That is it, no more
>>>>> information than that.  So that is what I did: I have one machine that I
>>>>> use as the Spark client for submitting jobs.  I started the Mesos
>>>>> dispatcher with the script as described, and submitted the job using the
>>>>> client machine’s IP address and port as the target.
>>>>>
>>>>> The job is currently running in Mesos as expected.  This is not,
>>>>> however, how I would have expected to configure the system.  As it
>>>>> stands, there is one instance of the Spark Mesos dispatcher running
>>>>> outside of Mesos, so it is not a part of the sphere of Mesos resource
>>>>> management.
>>>>>
>>>>> I used the following Stack Overflow posts as guidelines:
>>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>>
>>>>> There must be better documentation on how to deploy Spark in Mesos
>>>>> with jobs able to be deployed in cluster mode.
>>>>>
>>>>> I can follow up with more specific information regarding my
>>>>> deployment if necessary.
>>>>>
>>>>> Tom
>>>>>
>>>>
>>>>
>>>
>>
>
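
As a rough sketch of the workaround described in the thread (passing
spark.mesos.executor.home with the submission itself, since the dispatcher
appears to read properties only from the request), the Python helper below
assembles a cluster-mode spark-submit command and refuses to build one when
that property is missing. The helper name, the main class, and the jar URL
are hypothetical placeholders; only the dispatcher URL and the property name
come from the messages above.

```python
import shlex

# Properties that, per the thread, must travel with the submission itself:
# values set only in the dispatcher's or executor's environment were ignored.
REQUIRED_KEYS = ("spark.mesos.executor.home",)


def build_submit_command(dispatcher_url, main_class, app_jar, conf):
    """Return the spark-submit argv for a cluster-mode submission.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    missing = [key for key in REQUIRED_KEYS if key not in conf]
    if missing:
        # Fail on the client instead of letting the dispatcher hit its
        # "spark.mesos.executor.home is not set" exception.
        raise ValueError("missing required properties: " + ", ".join(missing))

    argv = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", dispatcher_url,
        "--class", main_class,
    ]
    # Pass every property explicitly as --conf key=value.
    for key in sorted(conf):
        argv += ["--conf", f"{key}={conf[key]}"]
    argv.append(app_jar)
    return argv


if __name__ == "__main__":
    cmd = build_submit_command(
        "mesos://spark-dispatcher.mesos:7077",   # dispatcher URL from the thread
        "com.example.MyJob",                     # hypothetical main class
        "http://repo.example.com/my-job.jar",    # hypothetical jar location
        {"spark.mesos.executor.home": "/usr/local/spark"},
    )
    print(shlex.join(cmd))
```

The client-side check is only a convenience: it mirrors the dispatcher's own
check so that a misconfigured submission fails before it reaches the
dispatcher.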
