How can I configure the Mesos allocation policy to share resources among all currently running Spark applications? I can't seem to find it in the architecture docs.
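For concreteness, here is roughly what I have in mind, assuming the fine-grained Mesos mode is the relevant setting (a sketch only; the master URL, app name and values are placeholders, not from our setup):

    import org.apache.spark.{SparkConf, SparkContext}

    // Fine-grained Mesos mode: each Spark task runs as a separate Mesos task,
    // so cores are handed back to Mesos between tasks and other applications
    // can pick them up while this one is idle.
    val conf = new SparkConf()
      .setAppName("shared-cores-sketch")          // hypothetical app name
      .setMaster("mesos://mesos-master:5050")     // placeholder Mesos master URL
      .set("spark.mesos.coarse", "false")         // fine-grained = dynamic core sharing
      .set("spark.executor.memory", "2g")         // note: memory is not released between tasks

    val sc = new SparkContext(conf)

Is that the intended way, or is there a separate allocation policy to configure on the Mesos master itself?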
*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Tue, Nov 4, 2014 at 9:11 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Yes, I believe Mesos is the right choice for you.
> http://mesos.apache.org/documentation/latest/mesos-architecture/
>
> Thanks
> Best Regards
>
> On Mon, Nov 3, 2014 at 9:33 PM, Romi Kuntsman <r...@totango.com> wrote:
>
>> So, as said there, static partitioning is used in "Spark’s standalone and
>> YARN modes, as well as the coarse-grained Mesos mode".
>> That leaves us only with Mesos, where there is *dynamic sharing* of CPU
>> cores.
>>
>> It says "when the application is not running tasks on a machine, other
>> applications may run tasks on those cores".
>> But my applications are short-lived (seconds to minutes): they read a
>> large dataset, process it, and write the results. They are also IO-bound,
>> meaning most of the time is spent reading input data (from S3) and
>> writing the results back.
>>
>> Is it possible to divide the resources between them according to how many
>> are trying to run at the same time?
>> For example, with 12 cores: if one job is scheduled, it gets all 12, but
>> if 3 are scheduled, each gets 4 cores and they all start.
>>
>> Thanks!
>>
>> *Romi Kuntsman*, *Big Data Engineer*
>> http://www.totango.com
>>
>> On Mon, Nov 3, 2014 at 5:46 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> Have a look at scheduling pools
>>> <https://spark.apache.org/docs/latest/job-scheduling.html>. If you want
>>> more sophisticated resource allocation, you are better off using a
>>> cluster manager like Mesos or YARN.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Nov 3, 2014 at 9:10 PM, Romi Kuntsman <r...@totango.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a Spark 1.1.0 standalone cluster with several nodes, and several
>>>> jobs (applications) being scheduled at the same time.
>>>> By default, each Spark job takes up all available CPUs.
>>>> As a result, when more than one job is scheduled, all but the first are
>>>> stuck in "WAITING".
>>>> On the other hand, if I tell each job to initially limit itself to a
>>>> fixed number of CPUs, and that job runs by itself, the cluster is
>>>> under-utilized and the job runs longer than it would have if it had
>>>> taken all the available resources.
>>>>
>>>> - How can I give the jobs a fairer resource division, one that lets
>>>> many jobs run together and together use all the available resources?
>>>> - How do you divide resources between applications in your use case?
>>>>
>>>> P.S. I started reading about Mesos but couldn't figure out if/how it
>>>> could solve the described issue.
>>>>
>>>> Thanks!
>>>>
>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>> http://www.totango.com
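P.S. For reference, the fixed-CPU limit I mentioned above (the static partitioning route) looks roughly like this on the standalone cluster; the values, app name and master URL are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // Static partitioning on a standalone cluster: each application caps itself
    // with spark.cores.max. On a 12-core cluster, three jobs capped at 4 cores
    // each can all start at once, but a single job is then also stuck at 4 cores,
    // which is the under-utilization trade-off described above.
    val conf = new SparkConf()
      .setAppName("capped-standalone-job")            // hypothetical app name
      .setMaster("spark://standalone-master:7077")    // placeholder standalone master URL
      .set("spark.cores.max", "4")                    // fixed per-application core cap

    val sc = new SparkContext(conf)

If I understand the docs correctly, spark.deploy.defaultCores can also be set on the standalone master to apply such a cap by default to applications that don't set spark.cores.max themselves; either way the cap stays fixed rather than adjusting to how many jobs are running.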