How can I configure the Mesos allocation policy to share resources among all currently running Spark applications? I can't seem to find it in the architecture docs.
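For concreteness, here is roughly what I have in mind, assuming the fine-grained Mesos mode is the relevant setting (a sketch only; the master URL, app name and values are placeholders, not from our setup):

    import org.apache.spark.{SparkConf, SparkContext}

    // Fine-grained Mesos mode: each Spark task runs as a separate Mesos task,
    // so cores are handed back to Mesos between tasks and other applications
    // can pick them up while this one is idle.
    val conf = new SparkConf()
      .setAppName("shared-cores-sketch")          // hypothetical app name
      .setMaster("mesos://mesos-master:5050")     // placeholder Mesos master URL
      .set("spark.mesos.coarse", "false")         // fine-grained = dynamic core sharing
      .set("spark.executor.memory", "2g")         // note: memory is not released between tasks

    val sc = new SparkContext(conf)

Is that the intended way, or is there a separate allocation policy to configure on the Mesos master itself?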
*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Tue, Nov 4, 2014 at 9:11 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Yes, I believe Mesos is the right choice for you.
> http://mesos.apache.org/documentation/latest/mesos-architecture/
>
> Thanks
> Best Regards
>
> On Mon, Nov 3, 2014 at 9:33 PM, Romi Kuntsman <r...@totango.com> wrote:
>
>> So, as said there, static partitioning is used in "Spark’s standalone and
>> YARN modes, as well as the coarse-grained Mesos mode".
>> That leaves us only with Mesos, where there is *dynamic sharing* of CPU
>> cores.
>>
>> It says "when the application is not running tasks on a machine, other
>> applications may run tasks on those cores".
>> But my applications are short-lived (seconds to minutes): they read a
>> large dataset, process it, and write the results. They are also IO-bound,
>> meaning most of the time is spent reading input data (from S3) and
>> writing the results back.
>>
>> Is it possible to divide the resources between them according to how many
>> are trying to run at the same time?
>> For example, with 12 cores: if one job is scheduled, it gets all 12, but
>> if 3 are scheduled, each gets 4 cores and they all start.
>>
>> Thanks!
>>
>> *Romi Kuntsman*, *Big Data Engineer*
>> http://www.totango.com
>>
>> On Mon, Nov 3, 2014 at 5:46 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> Have a look at scheduling pools
>>> <https://spark.apache.org/docs/latest/job-scheduling.html>. If you want
>>> more sophisticated resource allocation, you are better off using a
>>> cluster manager like Mesos or YARN.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Nov 3, 2014 at 9:10 PM, Romi Kuntsman <r...@totango.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a Spark 1.1.0 standalone cluster with several nodes, and several
>>>> jobs (applications) being scheduled at the same time.
>>>> By default, each Spark job takes up all available CPUs.
>>>> As a result, when more than one job is scheduled, all but the first are
>>>> stuck in "WAITING".
>>>> On the other hand, if I tell each job to initially limit itself to a
>>>> fixed number of CPUs, and that job runs by itself, the cluster is
>>>> under-utilized and the job runs longer than it would have if it had
>>>> taken all the available resources.
>>>>
>>>> - How can I give the jobs a fairer resource division, one that lets
>>>> many jobs run together and together use all the available resources?
>>>> - How do you divide resources between applications in your use case?
>>>>
>>>> P.S. I started reading about Mesos but couldn't figure out if/how it
>>>> could solve the described issue.
>>>>
>>>> Thanks!
>>>>
>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>> http://www.totango.com
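P.S. For reference, the fixed-CPU limit I mentioned above (the static partitioning route) looks roughly like this on the standalone cluster; the values, app name and master URL are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // Static partitioning on a standalone cluster: each application caps itself
    // with spark.cores.max. On a 12-core cluster, three jobs capped at 4 cores
    // each can all start at once, but a single job is then also stuck at 4 cores,
    // which is the under-utilization trade-off described above.
    val conf = new SparkConf()
      .setAppName("capped-standalone-job")            // hypothetical app name
      .setMaster("spark://standalone-master:7077")    // placeholder standalone master URL
      .set("spark.cores.max", "4")                    // fixed per-application core cap

    val sc = new SparkContext(conf)

If I understand the docs correctly, spark.deploy.defaultCores can also be set on the standalone master to apply such a cap by default to applications that don't set spark.cores.max themselves; either way the cap stays fixed rather than adjusting to how many jobs are running.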