Re: Spark on Mesos with Jobs in Cluster Mode Documentation

Timothy Chen Thu, 17 Sep 2015 17:05:09 -0700

Hi Alan,

If I understand correctly, you are setting the executor home when you launch the
dispatcher, not in the configuration you pass when you submit the job, and you
expect the job to inherit that setting?
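This minimal Python sketch shows the two lookup behaviors side by side. It assumes the dispatcher's own configuration can be modeled as a plain dictionary. The function names are hypothetical, and this is not the dispatcher's actual code:

- `resolve_request_only` consults only the submitted properties. This matches the behavior reported in the quoted message.
- `resolve_with_defaults` falls back to dispatcher-side defaults. This is the inheritance being asked about.

```python
# Hypothetical model of how a dispatcher might resolve driver properties.
# These helpers are illustrative only; they are not Spark's internals.


def resolve_request_only(request_props, key):
    """Look the key up in the submitted properties alone."""
    return request_props.get(key)


def resolve_with_defaults(request_props, dispatcher_defaults, key):
    """Prefer the submitted value, else fall back to dispatcher-side defaults."""
    merged = {**dispatcher_defaults, **request_props}  # submitted values win
    return merged.get(key)


dispatcher_defaults = {"spark.mesos.executor.home": "/usr/local/spark"}
request_props = {"spark.app.name": "example"}  # executor home not submitted

print(resolve_request_only(request_props, "spark.mesos.executor.home"))
print(resolve_with_defaults(request_props, dispatcher_defaults,
                            "spark.mesos.executor.home"))
```

In the second function, the submitted value wins on conflicts. That keeps per-job overrides possible while sparing users from repeating settings the dispatcher already has.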


When I worked on the dispatcher, I assumed that all configuration is passed to
the dispatcher at submission time, exactly as you would pass it when launching
the job in client mode.
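Under that assumption, a cluster-mode submission has to carry every property the driver needs, just as a client-mode launch would. The Python sketch below only assembles the `spark-submit` argument list, using the standard `--master`, `--deploy-mode`, `--class` and `--conf` options. The helper name, main class and jar URL are hypothetical placeholders. The dispatcher URL is the example one from the quoted message below.

```python
import shlex


def build_cluster_submit(dispatcher_url, main_class, app_jar, conf):
    """Assemble a spark-submit argument list for cluster mode via the dispatcher.

    Every driver property is passed explicitly as a --conf pair, because the
    dispatcher is assumed not to inherit settings from its own configuration.
    """
    argv = [
        "spark-submit",
        "--master", dispatcher_url,
        "--deploy-mode", "cluster",
        "--class", main_class,
    ]
    for key, value in sorted(conf.items()):
        argv += ["--conf", f"{key}={value}"]
    argv.append(app_jar)  # the application jar goes after all options
    return argv


argv = build_cluster_submit(
    "mesos://spark-dispatcher.mesos:7077",  # example dispatcher URL from the thread
    "com.example.Main",                     # hypothetical main class
    "http://example.com/app.jar",           # hypothetical jar URL reachable by the cluster
    {"spark.mesos.executor.home": "/usr/local/spark"},
)
print(shlex.join(argv))
```

To perform the actual submission, pass the list to `subprocess.run(argv, check=True)` on a machine that has `spark-submit` on its PATH.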

But you're right that it shouldn't crash the dispatcher. I'll take a closer look
when I get a chance.
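One way to avoid the crash is to check for required properties when the request arrives and reject it with a message. The alternative, throwing later while launching the driver, is what brings the dispatcher down. The Python sketch below assumes a hypothetical `REQUIRED_KEYS` list and `validate_submission` helper. It only illustrates the reject-instead-of-raise pattern and is not how the dispatcher is currently written.

```python
# Hypothetical list of properties a submission must carry.
REQUIRED_KEYS = ("spark.mesos.executor.home",)


def validate_submission(request_props):
    """Return (accepted, message) instead of raising on a bad submission.

    Illustrative only: a real dispatcher would send this result back to the
    client in its submission response rather than crash.
    """
    missing = [key for key in REQUIRED_KEYS if key not in request_props]
    if missing:
        return False, "Submission rejected, missing: " + ", ".join(missing)
    return True, "Submission accepted"


print(validate_submission({}))
print(validate_submission({"spark.mesos.executor.home": "/usr/local/spark"}))
```

Here the check runs before the submission is acknowledged. The client therefore sees the error in its response, instead of a success message followed by a dead dispatcher.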

Can you recommend changes to the documentation, either by email or in a PR?

Thanks!

Tim


> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com> wrote:
> 
> Hey All,
> 
> To bump this thread once again, I'm having some trouble using the dispatcher 
> as well.
> 
> I'm using the Mesos cluster manager with Docker executors. I've deployed the
> dispatcher as a Marathon job. When I submit a job using spark-submit, the
> dispatcher reports back that the submission was successful and then promptly
> dies in Marathon. The logs show that it was hitting the following
> line:
> 
> 398: throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
> 
> This is odd, because it's set in multiple places (SPARK_HOME,
> spark.mesos.executor.home, spark.home, etc.). Reading the code, it appears
> that the driver description pulls properties only from the submission
> request and disregards any other properties that may be configured. Passing
> --conf spark.mesos.executor.home=/usr/local/spark to spark-submit on the
> command line confirms this. We're trying to reduce the number of places
> where we have to set Spark properties, and were hoping it would be possible
> to have the dispatcher pull in spark-defaults.conf from somewhere, or at
> least to let the user tell the dispatcher through spark-submit that those
> properties will be available once the job starts.
> 
> Finally, I don't think the dispatcher should crash in this case. A job being
> misconfigured at submission time doesn't seem like an exceptional event.
> 
> Please point me in the right direction if I'm headed the wrong way.
> Also let me know if I should open tickets for these issues.
> 
> Thanks,
> - Alan
> 
>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>> Yes, you can create an issue, or better yet, contribute a patch to update it :)
>> 
>> Sorry the docs are a bit light; I'm going to make them more complete along
>> the way.
>> 
>> Tim
>> 
>> 
>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) 
>>> <tomwa...@cisco.com> wrote:
>>> Tim,
>>> 
>>> Thank you for the explanation.  You are correct, my Mesos experience is 
>>> very light, and I haven’t deployed anything via Marathon yet.  What you 
>>> have stated here makes sense, I will look into doing this.
>>> 
>>> Adding this info to the docs would be great. Is the appropriate action to
>>> create an issue about improving the docs? For those of us who are still
>>> gaining experience, such a pointer is very helpful.
>>> 
>>> Tom
>>> 
>>> From: Tim Chen <t...@mesosphere.io>
>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>> 
>>> Hi Tom,
>>> 
>>> Sorry the documentation isn't very thorough; it probably assumes that
>>> users understand how Mesos and its frameworks work.
>>> 
>>> First I need to explain why we created the dispatcher. In case you're
>>> not familiar with Mesos yet: each node in your datacenter runs a Mesos
>>> slave, which is responsible for publishing resources and running/watching
>>> tasks, and the Mesos master is responsible for taking the aggregated
>>> resources and scheduling them among frameworks.
>>> 
>>> Frameworks are not managed by Mesos: the Mesos master and slaves don't
>>> launch or maintain frameworks, but assume they are launched and kept
>>> running on their own. The existing frameworks in the ecosystem (e.g.
>>> Aurora, Marathon) therefore each have their own ways to deploy, provide
>>> HA, and persist state.
>>> 
>>> Therefore, to introduce cluster mode on Mesos, we had to create a
>>> long-running framework that runs in your datacenter, launches Spark
>>> drivers on demand, and handles HA, etc. This is what the dispatcher is
>>> all about.
>>> 
>>> So the idea is that you launch the dispatcher not on the client, but on
>>> a machine in your datacenter. In Mesosphere's DCOS we launch all
>>> frameworks and long-running services with Marathon, and you can use
>>> Marathon to launch the Spark dispatcher.
>>> 
>>> Then, instead of specifying the Mesos master URL (e.g.
>>> mesos://mesos.master:2181), all clients talk only to the dispatcher
>>> (mesos://spark-dispatcher.mesos:7077), and the dispatcher starts and
>>> watches the driver for you.
>>> 
>>> Tim
>>> 
>>> 
>>> 
>>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) 
>>>> <tomwa...@cisco.com> wrote:
>>>> After spending most of yesterday scouring the Internet for documentation
>>>> on submitting Spark jobs in cluster mode to a Spark cluster managed by
>>>> Mesos, I was able to do just that, but I am not convinced that the way I
>>>> have things set up is correct.
>>>> 
>>>> I used the published Mesos instructions for setting up my Mesos cluster.
>>>> I have three ZooKeeper instances, three Mesos master instances, and three
>>>> Mesos slave instances. This is all running in OpenStack.
>>>> 
>>>> The documentation on the Spark documentation site states that “To use 
>>>> cluster mode, you must start the MesosClusterDispatcher in your cluster 
>>>> via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master 
>>>> url (e.g: mesos://host:5050).”  That is it, no more information than that. 
>>>> So that is what I did: I have one machine that I use as the Spark client
>>>> for submitting jobs. I started the Mesos dispatcher with the script as
>>>> described, and submitted the job using the client machine’s IP address
>>>> and port as the target.
>>>> 
>>>> The job is currently running in Mesos as expected. This is not, however,
>>>> how I would have expected to configure the system. As it stands, there is
>>>> one instance of the Spark Mesos dispatcher running outside of Mesos, and
>>>> therefore outside the scope of Mesos resource management.
>>>> 
>>>> I used the following Stack Overflow posts as guidelines:
>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>> 
>>>> There must be better documentation on how to deploy Spark on Mesos so
>>>> that jobs can be submitted in cluster mode.
>>>> 
>>>> I can follow up with more specific information regarding my deployment if 
>>>> necessary.
>>>> 
>>>> Tom
>
