I have a single Spark cluster, not multiple frameworks and not multiple versions. Is Mesos still relevant for my use case? Where can I find information on exactly how to make Mesos tell Spark how many of the cluster's resources to use, instead of the default behavior of taking everything?

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com
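For reference, a minimal sketch of capping a single application's share when running on Mesos, assuming Spark 1.1 and placeholder values for the app name, Mesos master URL, core count, and memory size. With spark.mesos.coarse set to true (coarse-grained mode) the application reserves a fixed set of cores for its whole lifetime, and spark.cores.max limits how many cores it reserves; without that cap it takes everything Mesos offers. In the default fine-grained mode, cores are instead acquired and released per task, so no cap is needed for sharing.

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder app name and Mesos master URL -- adjust to your own cluster.
    val conf = new SparkConf()
      .setAppName("my-short-lived-job")
      .setMaster("mesos://mesos-master.example.com:5050")
      // Coarse-grained mode: the app holds a fixed set of cores for its lifetime...
      .set("spark.mesos.coarse", "true")
      // ...so cap it here, otherwise it grabs every core Mesos offers.
      .set("spark.cores.max", "4")
      // Memory reserved per executor (a static reservation in both Mesos modes).
      .set("spark.executor.memory", "2g")

    val sc = new SparkContext(conf)

The same spark.cores.max setting is also what caps an application's cores on a standalone master.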
On Tue, Nov 4, 2014 at 11:00 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> You can look at the different modes over here:
> http://docs.sigmoidanalytics.com/index.php/Spark_On_Mesos#Mesos_Run_Modes
>
> These people have a very good tutorial to get you started:
> http://mesosphere.com/docs/tutorials/run-spark-on-mesos/#overview
>
> Thanks
> Best Regards
>
> On Tue, Nov 4, 2014 at 1:44 PM, Romi Kuntsman <r...@totango.com> wrote:
>
>> How can I configure the Mesos allocation policy to share resources between
>> all current Spark applications? I can't seem to find it in the architecture
>> docs.
>>
>> *Romi Kuntsman*, *Big Data Engineer*
>> http://www.totango.com
>>
>> On Tue, Nov 4, 2014 at 9:11 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>
>>> Yes, I believe Mesos is the right choice for you.
>>> http://mesos.apache.org/documentation/latest/mesos-architecture/
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Nov 3, 2014 at 9:33 PM, Romi Kuntsman <r...@totango.com> wrote:
>>>
>>>> So, as said there, static partitioning is used in "Spark's standalone
>>>> and YARN modes, as well as the coarse-grained Mesos mode".
>>>> That leaves only the fine-grained Mesos mode, where there is *dynamic
>>>> sharing* of CPU cores.
>>>>
>>>> It says "when the application is not running tasks on a machine, other
>>>> applications may run tasks on those cores".
>>>> But my applications are short-lived (seconds to minutes): they read a
>>>> large dataset, process it, and write the results. They are also IO-bound,
>>>> meaning most of the time is spent reading input data (from S3) and
>>>> writing the results back.
>>>>
>>>> Is it possible to divide the resources between them according to how
>>>> many are trying to run at the same time?
>>>> For example, with 12 cores: if one job is scheduled it gets all 12
>>>> cores, but if 3 are scheduled, each one gets 4 cores and they all start.
>>>>
>>>> Thanks!
>>>>
>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>> http://www.totango.com
>>>>
>>>> On Mon, Nov 3, 2014 at 5:46 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>>>
>>>>> Have a look at scheduling pools
>>>>> <https://spark.apache.org/docs/latest/job-scheduling.html>. If you
>>>>> want more sophisticated resource allocation, you are better off using
>>>>> a cluster manager like Mesos or YARN.
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>>
>>>>> On Mon, Nov 3, 2014 at 9:10 PM, Romi Kuntsman <r...@totango.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a Spark 1.1.0 standalone cluster with several nodes, and
>>>>>> several jobs (applications) being scheduled at the same time.
>>>>>> By default, each Spark job takes up all available CPUs.
>>>>>> This way, when more than one job is scheduled, all but the first are
>>>>>> stuck in "WAITING".
>>>>>> On the other hand, if I tell each job to initially limit itself to a
>>>>>> fixed number of CPUs, and that job runs by itself, the cluster is
>>>>>> under-utilized and the job runs longer than it would have if it took
>>>>>> all the available resources.
>>>>>>
>>>>>> - How can I divide resources between jobs more fairly, so that many
>>>>>> jobs can run together and, together, use all the available resources?
>>>>>> - How do you divide resources between applications in your use case?
>>>>>>
>>>>>> P.S. I started reading about Mesos but couldn't figure out if/how it
>>>>>> could solve the described issue.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>>>> http://www.totango.com
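To tie this back to the original standalone question, here is a minimal sketch of what the fine-grained Mesos mode (the default on Mesos in Spark 1.1) looks like from the application side, with placeholder values for the master URL, app name, executor memory, and S3 paths. In this mode each Spark task runs as its own Mesos task, so cores are handed back between tasks and offered to the other running applications, which is roughly the "12 cores shared between 3 concurrent jobs" behavior asked about above.

    import org.apache.spark.{SparkConf, SparkContext}

    // Fine-grained is the default on Mesos in Spark 1.1, so no core cap is set:
    // cores are acquired per task and released when the task finishes.
    val conf = new SparkConf()
      .setAppName("io-bound-batch-job")                    // placeholder name
      .setMaster("mesos://mesos-master.example.com:5050")  // placeholder master
      // Memory is still a static per-executor reservation, even in fine-grained mode.
      .set("spark.executor.memory", "2g")

    val sc = new SparkContext(conf)

    // Shape of the workload described in the thread: read from S3, process, write back.
    val input  = sc.textFile("s3n://some-bucket/input/")   // placeholder path
    val result = input.filter(_.nonEmpty)                  // stand-in processing step
    result.saveAsTextFile("s3n://some-bucket/output/")     // placeholder path

If the cluster stays on the standalone master, a cruder approximation is to cap each application with spark.cores.max (or set spark.deploy.defaultCores on the master so applications without an explicit cap get a default limit), but that is a static cap rather than dynamic sharing, which is exactly the under-utilization trade-off described in the first message.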