After spending most of yesterday scouring the Internet for documentation on submitting Spark jobs in cluster mode to a Spark cluster managed by Mesos, I was able to do just that, but I am not convinced that the way I have things set up is correct.
I used the Mesosphere-published instructions <https://open.mesosphere.com/getting-started/datacenter/install/> for setting up my Mesos cluster. I have three ZooKeeper instances, three Mesos master instances, and three Mesos slave instances, all running in OpenStack.

The Spark documentation states: “To use cluster mode, you must start the MesosClusterDispatcher in your cluster via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master url (e.g: mesos://host:5050).” That is it, no more information than that.

So that is what I did. I have one machine that I use as the Spark client for submitting jobs. I started the Mesos dispatcher with the script as described, then submitted the job using the client machine’s IP address and port as the target. The job is currently running in Mesos as expected.

This is not, however, how I would have expected to configure the system. As it stands, there is one instance of the Spark Mesos dispatcher running outside of Mesos, and therefore outside the sphere of Mesos resource management.

I used the following Stack Overflow posts as guidelines:
http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
http://stackoverflow.com/questions/31294515/start-spark-via-mesos

There must be better documentation on how to deploy Spark on Mesos so that jobs can be submitted in cluster mode. I can follow up with more specific information regarding my deployment if necessary.

Tom
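
P.S. In case it helps anyone searching for the same thing, this is roughly what I ran. I am reconstructing it from memory as a sketch, so the host names, the ZooKeeper URL, and the example jar location are placeholders from my own setup rather than anything authoritative; the dispatcher listens for submissions on port 7077 by default.

    # On the client machine: start the dispatcher, pointing it at the Mesos masters
    # (I used the zk:// form since I have three masters behind ZooKeeper)
    ./sbin/start-mesos-dispatcher.sh \
      --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos

    # Submit a job in cluster mode, targeting the dispatcher rather than a Mesos master.
    # The application jar needs to be at a URI the Mesos slaves can reach (http://, hdfs://, etc.),
    # since the driver is launched inside the cluster and local jars are not uploaded automatically.
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master mesos://<client-machine-ip>:7077 \
      --deploy-mode cluster \
      http://<reachable-host>/spark-examples.jar 100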