I have a single Spark cluster, not multiple frameworks and not multiple versions. Is Mesos still relevant for my use case? Where can I find information on exactly how to make Mesos tell Spark how many of the cluster's resources to use, instead of the default behavior of taking everything?

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com
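For reference, a minimal sketch of capping a single application's share when running on Mesos, assuming Spark 1.1 and placeholder values for the app name, Mesos master URL, core count, and memory size. With spark.mesos.coarse set to true (coarse-grained mode) the application reserves a fixed set of cores for its whole lifetime, and spark.cores.max limits how many cores it reserves; without that cap it takes everything Mesos offers. In the default fine-grained mode, cores are instead acquired and released per task, so no cap is needed for sharing.

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder app name and Mesos master URL -- adjust to your own cluster.
    val conf = new SparkConf()
      .setAppName("my-short-lived-job")
      .setMaster("mesos://mesos-master.example.com:5050")
      // Coarse-grained mode: the app holds a fixed set of cores for its lifetime...
      .set("spark.mesos.coarse", "true")
      // ...so cap it here, otherwise it grabs every core Mesos offers.
      .set("spark.cores.max", "4")
      // Memory reserved per executor (a static reservation in both Mesos modes).
      .set("spark.executor.memory", "2g")

    val sc = new SparkContext(conf)

The same spark.cores.max setting is also what caps an application's cores on a standalone master.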
On Tue, Nov 4, 2014 at 11:00 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> You can look at the different modes over here:
> http://docs.sigmoidanalytics.com/index.php/Spark_On_Mesos#Mesos_Run_Modes
>
> These people have a very good tutorial to get you started:
> http://mesosphere.com/docs/tutorials/run-spark-on-mesos/#overview
>
> Thanks
> Best Regards
>
> On Tue, Nov 4, 2014 at 1:44 PM, Romi Kuntsman <r...@totango.com> wrote:
>
>> How can I configure the Mesos allocation policy to share resources between
>> all current Spark applications? I can't seem to find it in the architecture
>> docs.
>>
>> *Romi Kuntsman*, *Big Data Engineer*
>> http://www.totango.com
>>
>> On Tue, Nov 4, 2014 at 9:11 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>
>>> Yes, I believe Mesos is the right choice for you.
>>> http://mesos.apache.org/documentation/latest/mesos-architecture/
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Nov 3, 2014 at 9:33 PM, Romi Kuntsman <r...@totango.com> wrote:
>>>
>>>> So, as said there, static partitioning is used in "Spark's standalone
>>>> and YARN modes, as well as the coarse-grained Mesos mode".
>>>> That leaves only the fine-grained Mesos mode, where there is *dynamic
>>>> sharing* of CPU cores.
>>>>
>>>> It says "when the application is not running tasks on a machine, other
>>>> applications may run tasks on those cores".
>>>> But my applications are short-lived (seconds to minutes): they read a
>>>> large dataset, process it, and write the results. They are also IO-bound,
>>>> meaning most of the time is spent reading input data (from S3) and
>>>> writing the results back.
>>>>
>>>> Is it possible to divide the resources between them according to how
>>>> many are trying to run at the same time?
>>>> For example, with 12 cores: if one job is scheduled it gets all 12
>>>> cores, but if 3 are scheduled, each one gets 4 cores and they all start.
>>>>
>>>> Thanks!
>>>>
>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>> http://www.totango.com
>>>>
>>>> On Mon, Nov 3, 2014 at 5:46 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>>>
>>>>> Have a look at scheduling pools
>>>>> <https://spark.apache.org/docs/latest/job-scheduling.html>. If you
>>>>> want more sophisticated resource allocation, you are better off using
>>>>> a cluster manager like Mesos or YARN.
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>>
>>>>> On Mon, Nov 3, 2014 at 9:10 PM, Romi Kuntsman <r...@totango.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a Spark 1.1.0 standalone cluster with several nodes, and
>>>>>> several jobs (applications) being scheduled at the same time.
>>>>>> By default, each Spark job takes up all available CPUs.
>>>>>> This way, when more than one job is scheduled, all but the first are
>>>>>> stuck in "WAITING".
>>>>>> On the other hand, if I tell each job to initially limit itself to a
>>>>>> fixed number of CPUs, and that job runs by itself, the cluster is
>>>>>> under-utilized and the job runs longer than it would have if it took
>>>>>> all the available resources.
>>>>>>
>>>>>> - How can I divide resources between jobs more fairly, so that many
>>>>>> jobs can run together and, together, use all the available resources?
>>>>>> - How do you divide resources between applications in your use case?
>>>>>>
>>>>>> P.S. I started reading about Mesos but couldn't figure out if/how it
>>>>>> could solve the described issue.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>>>> http://www.totango.com
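To tie this back to the original standalone question, here is a minimal sketch of what the fine-grained Mesos mode (the default on Mesos in Spark 1.1) looks like from the application side, with placeholder values for the master URL, app name, executor memory, and S3 paths. In this mode each Spark task runs as its own Mesos task, so cores are handed back between tasks and offered to the other running applications, which is roughly the "12 cores shared between 3 concurrent jobs" behavior asked about above.

    import org.apache.spark.{SparkConf, SparkContext}

    // Fine-grained is the default on Mesos in Spark 1.1, so no core cap is set:
    // cores are acquired per task and released when the task finishes.
    val conf = new SparkConf()
      .setAppName("io-bound-batch-job")                    // placeholder name
      .setMaster("mesos://mesos-master.example.com:5050")  // placeholder master
      // Memory is still a static per-executor reservation, even in fine-grained mode.
      .set("spark.executor.memory", "2g")

    val sc = new SparkContext(conf)

    // Shape of the workload described in the thread: read from S3, process, write back.
    val input  = sc.textFile("s3n://some-bucket/input/")   // placeholder path
    val result = input.filter(_.nonEmpty)                  // stand-in processing step
    result.saveAsTextFile("s3n://some-bucket/output/")     // placeholder path

If the cluster stays on the standalone master, a cruder approximation is to cap each application with spark.cores.max (or set spark.deploy.defaultCores on the master so applications without an explicit cap get a default limit), but that is a static cap rather than dynamic sharing, which is exactly the under-utilization trade-off described in the first message.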