You can have a look at the different Mesos run modes here: http://docs.sigmoidanalytics.com/index.php/Spark_On_Mesos#Mesos_Run_Modes
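A minimal sketch of how an application can choose between those two run modes, assuming a Spark 1.1-era deployment; the Mesos master URL, application name, and executor URI below are hypothetical placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object MesosRunModeSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("mesos://zk://zk-host:2181/mesos")   // hypothetical Mesos master URL
          .setAppName("mesos-run-mode-sketch")            // hypothetical application name
          // false = fine-grained mode (the default): cores are claimed per task and
          // released between tasks, so concurrent applications share them dynamically.
          // true = coarse-grained mode: cores are held for the application's lifetime.
          .set("spark.mesos.coarse", "false")
          // Location from which Mesos slaves fetch the Spark distribution (hypothetical path).
          .set("spark.executor.uri", "hdfs://namenode/frameworks/spark-1.1.0.tgz")

        val sc = new SparkContext(conf)
        // ... run the job ...
        sc.stop()
      }
    }

Fine-grained mode is what provides the dynamic per-task sharing of CPU cores discussed further down the thread; coarse-grained mode behaves like the static partitioning of standalone and YARN deployments.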
These people have a very good tutorial to get you started: http://mesosphere.com/docs/tutorials/run-spark-on-mesos/#overview

Thanks
Best Regards

On Tue, Nov 4, 2014 at 1:44 PM, Romi Kuntsman <r...@totango.com> wrote:

> How can I configure the Mesos allocation policy to share resources between
> all current Spark applications? I can't seem to find it in the architecture
> docs.
>
> *Romi Kuntsman*, *Big Data Engineer*
> http://www.totango.com
>
> On Tue, Nov 4, 2014 at 9:11 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> Yes, I believe Mesos is the right choice for you.
>> http://mesos.apache.org/documentation/latest/mesos-architecture/
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Nov 3, 2014 at 9:33 PM, Romi Kuntsman <r...@totango.com> wrote:
>>
>>> So, as said there, static partitioning is used in "Spark’s standalone
>>> and YARN modes, as well as the coarse-grained Mesos mode".
>>> That leaves us only with Mesos, where there is *dynamic sharing* of CPU
>>> cores.
>>>
>>> It says "when the application is not running tasks on a machine, other
>>> applications may run tasks on those cores".
>>> But my applications are short-lived (seconds to minutes), and they read
>>> a large dataset, process it, and write the results. They are also IO-bound,
>>> meaning most of the time is spent reading input data (from S3) and writing
>>> the results back.
>>>
>>> Is it possible to divide the resources between them according to how
>>> many are trying to run at the same time?
>>> For example, if I have 12 cores: if one job is scheduled, it will get
>>> 12 cores, but if 3 are scheduled, each one will get 4 cores and they
>>> will all start.
>>>
>>> Thanks!
>>>
>>> *Romi Kuntsman*, *Big Data Engineer*
>>> http://www.totango.com
>>>
>>> On Mon, Nov 3, 2014 at 5:46 PM, Akhil Das <ak...@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> Have a look at scheduling pools
>>>> <https://spark.apache.org/docs/latest/job-scheduling.html>. If you
>>>> want more sophisticated resource allocation, you are better off using a
>>>> cluster manager like Mesos or YARN.
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Mon, Nov 3, 2014 at 9:10 PM, Romi Kuntsman <r...@totango.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have a Spark 1.1.0 standalone cluster with several nodes, and
>>>>> several jobs (applications) being scheduled at the same time.
>>>>> By default, each Spark job takes up all available CPUs.
>>>>> This way, when more than one job is scheduled, all but the first are
>>>>> stuck in "WAITING".
>>>>> On the other hand, if I tell each job to initially limit itself to a
>>>>> fixed number of CPUs, and that job runs by itself, the cluster is
>>>>> under-utilized and the job runs longer than it could have if it took all
>>>>> the available resources.
>>>>>
>>>>> - How can I give the jobs a fairer resource division, one that lets many
>>>>> jobs run together and together lets them use all the available resources?
>>>>> - How do you divide resources between applications in your use case?
>>>>>
>>>>> P.S. I started reading about Mesos but couldn't figure out if/how it
>>>>> could solve the described issue.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>>> http://www.totango.com
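For the original standalone-cluster question at the bottom of the thread, the "fixed number of CPUs" option mentioned there corresponds to the spark.cores.max setting. A minimal sketch, assuming a Spark 1.1 standalone cluster; the master URL and application name are hypothetical placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object CappedStandaloneJobSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("spark://master-host:7077")  // hypothetical standalone master URL
          .setAppName("short-lived-etl-job")      // hypothetical application name
          // Cap this application at 4 cores so that, on a 12-core cluster, two more
          // applications can run concurrently instead of sitting in "WAITING".
          .set("spark.cores.max", "4")

        val sc = new SparkContext(conf)
        // ... read input (e.g. from S3), process it, write the results ...
        sc.stop()
      }
    }

The trade-off is exactly the one raised in the thread: with a static cap, an application running alone cannot expand into the idle cores, which is why the discussion moves to Mesos fine-grained mode for dynamic sharing.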