Yes, I believe Mesos is the right choice for you: http://mesos.apache.org/documentation/latest/mesos-architecture/
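A minimal sketch of what that could look like on Spark 1.1, with fine-grained mode (the mode that dynamically shares CPU cores between frameworks) spelled out explicitly; the master URL and app name below are placeholders, not real values:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  // Placeholder Mesos master; 5050 is the default Mesos master port.
  .setMaster("mesos://mesos-master.example.com:5050")
  .setAppName("short-lived-batch-job") // placeholder name
  // Fine-grained mode (the Spark 1.1 default on Mesos): each Spark task
  // runs as its own Mesos task, so cores are handed back to the cluster
  // whenever this application is not actively running tasks.
  .set("spark.mesos.coarse", "false")

val sc = new SparkContext(conf)
```

Since your jobs are short-lived and IO-bound, fine-grained mode trades a little per-task launch overhead for exactly the kind of dynamic sharing you describe. (A standalone-mode fallback is sketched at the bottom of this mail, below the quoted thread.)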
Thanks
Best Regards

On Mon, Nov 3, 2014 at 9:33 PM, Romi Kuntsman <r...@totango.com> wrote:

> So, as said there, static partitioning is used in "Spark's standalone and
> YARN modes, as well as the coarse-grained Mesos mode". That leaves us only
> with Mesos, where there is *dynamic sharing* of CPU cores.
>
> It says "when the application is not running tasks on a machine, other
> applications may run tasks on those cores". But my applications are
> short-lived (seconds to minutes): they read a large dataset, process it,
> and write the results. They are also IO-bound, meaning most of the time
> is spent reading input data (from S3) and writing the results back.
>
> Is it possible to divide the resources between them according to how many
> are trying to run at the same time? For example, with 12 cores: if one
> job is scheduled, it gets all 12 cores, but if 3 are scheduled, each one
> gets 4 cores and they all start.
>
> Thanks!
>
> *Romi Kuntsman*, *Big Data Engineer*
> http://www.totango.com
>
> On Mon, Nov 3, 2014 at 5:46 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Have a look at scheduling pools
>> <https://spark.apache.org/docs/latest/job-scheduling.html>. If you want
>> more sophisticated resource allocation, you are better off using a
>> cluster manager like Mesos or YARN.
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Nov 3, 2014 at 9:10 PM, Romi Kuntsman <r...@totango.com> wrote:
>>
>>> Hello,
>>>
>>> I have a Spark 1.1.0 standalone cluster with several nodes, and several
>>> jobs (applications) being scheduled at the same time.
>>> By default, each Spark job takes up all available CPUs.
>>> This way, when more than one job is scheduled, all but the first are
>>> stuck in "WAITING".
>>> On the other hand, if I tell each job to initially limit itself to a
>>> fixed number of CPUs, and that job runs by itself, the cluster is
>>> under-utilized and the job runs longer than it would have if it took
>>> all the available resources.
>>>
>>> - How can I divide resources more fairly, so that many jobs can run
>>>   together and together use all the available resources?
>>> - How do you divide resources between applications in your use case?
>>>
>>> P.S. I started reading about Mesos but couldn't figure out if/how it
>>> could solve the described issue.
>>>
>>> Thanks!
>>>
>>> *Romi Kuntsman*, *Big Data Engineer*
>>> http://www.totango.com
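P.S. If you end up staying on the standalone master, the only per-application knob in Spark 1.1 is a static core cap, which gives exactly the trade-off you already described. A rough sketch for your 12-core example, capping each job at 4 cores so three can be scheduled side by side; the master URL and app name are again placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  // Placeholder standalone master; 7077 is the default master port.
  .setMaster("spark://standalone-master.example.com:7077")
  .setAppName("capped-batch-job") // placeholder name
  // Static cap: with 12 cores in the cluster, three applications capped
  // at 4 cores each can all run at once instead of queueing behind the
  // first one in WAITING state.
  .set("spark.cores.max", "4")

val sc = new SparkContext(conf)
```

Note the cap is static: a job that runs alone still gets only 4 cores, which is the under-utilization you want to avoid. That is why Mesos fine-grained mode is the better fit here.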