You can look at the different Mesos run modes over here:
http://docs.sigmoidanalytics.com/index.php/Spark_On_Mesos#Mesos_Run_Modes
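
For a quick illustration, here is a minimal sketch of choosing the run mode
from the application side (Spark 1.1-era configuration; the master address
mesos://host:5050 and the app name are just placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://host:5050")   // placeholder Mesos master
      .setAppName("my-job")
      // Fine-grained mode (the Mesos default in Spark 1.1): cores are
      // claimed per task, so an idle application gives its cores back.
      .set("spark.mesos.coarse", "false")
    val sc = new SparkContext(conf)

For coarse-grained mode you would instead set spark.mesos.coarse to true and
cap each application with spark.cores.max.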

These people have a very good tutorial to get you started:
http://mesosphere.com/docs/tutorials/run-spark-on-mesos/#overview

Thanks
Best Regards

On Tue, Nov 4, 2014 at 1:44 PM, Romi Kuntsman <r...@totango.com> wrote:

> How can I configure the Mesos allocation policy to share resources between
> all currently running Spark applications? I can't seem to find it in the
> architecture docs.
>
> *Romi Kuntsman*, *Big Data Engineer*
>  http://www.totango.com
>
> On Tue, Nov 4, 2014 at 9:11 AM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Yes, I believe Mesos is the right choice for you.
>> http://mesos.apache.org/documentation/latest/mesos-architecture/
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Nov 3, 2014 at 9:33 PM, Romi Kuntsman <r...@totango.com> wrote:
>>
>>> So, as stated there, static partitioning is used in "Spark’s standalone
>>> and YARN modes, as well as the coarse-grained Mesos mode".
>>> That leaves us only with Mesos, where there is *dynamic sharing* of CPU
>>> cores.
>>>
>>> It says "when the application is not running tasks on a machine, other
>>> applications may run tasks on those cores".
>>> But my applications are short-lived (seconds to minutes), and they read
>>> a large dataset, process it, and write the results. They are also IO-bound,
>>> meaning most of the time is spent reading input data (from S3) and writing
>>> the results back.
>>>
>>> Is it possible to divide the resources between them according to how
>>> many are trying to run at the same time?
>>> For example, if I have 12 cores: if one job is scheduled, it gets all 12
>>> cores, but if 3 are scheduled, each one gets 4 cores and they all start
>>> right away.
>>>
>>> Thanks!
>>>
>>> *Romi Kuntsman*, *Big Data Engineer*
>>>  http://www.totango.com
>>>
>>> On Mon, Nov 3, 2014 at 5:46 PM, Akhil Das <ak...@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> Have a look at scheduling pools
>>>> <https://spark.apache.org/docs/latest/job-scheduling.html>. If you
>>>> want more sophisticated resource allocation, you are better off using a
>>>> cluster manager like Mesos or YARN.
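>>>>
>>>> To make that concrete, here is a minimal sketch of turning on the FAIR
>>>> scheduler inside a single application (the master address and the pool
>>>> name "batch" are just placeholders):
>>>>
>>>>     import org.apache.spark.{SparkConf, SparkContext}
>>>>
>>>>     val conf = new SparkConf()
>>>>       .setMaster("mesos://host:5050")  // placeholder master
>>>>       .setAppName("pooled-app")
>>>>       // Share cores fairly between this application's concurrent
>>>>       // jobs instead of the default FIFO order.
>>>>       .set("spark.scheduler.mode", "FAIR")
>>>>     val sc = new SparkContext(conf)
>>>>
>>>>     // Jobs submitted from this thread go into the "batch" pool.
>>>>     sc.setLocalProperty("spark.scheduler.pool", "batch")
>>>>
>>>> Keep in mind that pools only divide resources between jobs inside one
>>>> application; dividing them between separate applications is the cluster
>>>> manager's job.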
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Mon, Nov 3, 2014 at 9:10 PM, Romi Kuntsman <r...@totango.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have a Spark 1.1.0 standalone cluster, with several nodes, and
>>>>> several jobs (applications) being scheduled at the same time.
>>>>> By default, each Spark job takes up all available CPUs.
>>>>> As a result, when more than one job is scheduled, all but the first are
>>>>> stuck in "WAITING".
>>>>> On the other hand, if I tell each job to limit itself to a fixed number
>>>>> of CPUs up front, and that job ends up running by itself, the cluster is
>>>>> under-utilized and the job runs longer than it would have if it had taken
>>>>> all the available resources.
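>>>>>
>>>>> (For reference, the fixed limit I mean is a cap like the following
>>>>> sketch, where 4 cores and the master address are just placeholders:)
>>>>>
>>>>>     import org.apache.spark.SparkConf
>>>>>
>>>>>     val conf = new SparkConf()
>>>>>       .setMaster("spark://master:7077")  // placeholder standalone master
>>>>>       .setAppName("my-job")
>>>>>       .set("spark.cores.max", "4")       // per-application core cap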
>>>>>
>>>>> - How can I divide resources between jobs more fairly, so that many
>>>>> jobs can run together and collectively use all the available resources?
>>>>> - How do you divide resources between applications in your use case?
>>>>>
>>>>> P.S. I started reading about Mesos but couldn't figure out if/how it
>>>>> could solve the described issue.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>>>  http://www.totango.com
>>>>>
>>>>
>>>>
>>>
>>
>
