[ 
https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551042#comment-14551042
 ] 

Vinod Kone commented on MESOS-354:
----------------------------------

This is the high-level idea of how the different components (described in the 
design doc) interact for oversubscription in the MVP.

--> Resource estimator sends an estimate of 'oversubscribable' resources to the 
slave.

--> Slave periodically checks if its cached value of 'revocable resources' 
(i.e., allocations of revocable containers + oversubscribable resources) has 
changed. If changed, slave forwards 'revocable resources' to the master.

--> Master rescinds outstanding revocable offers when it gets new 'revocable 
resources' estimate and updates the allocator.

--> On receiving 'revocable resources' update, allocator updates 
'revocable_available' (revocable resources - revocable allocation) resources.

--> 'revocable_available' gets allocated to (and recovered from) frameworks in 
the same way as 'available' (regular resources).

--> When sending offers, master sends separate offers for revocable and regular 
resources.
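
As a rough illustration of the flow above, here is a minimal Python sketch. It
is not Mesos code: the class and method names are hypothetical, and resources
are reduced to a single scalar (say, CPUs) to keep the arithmetic visible.

```python
class Slave:
    """Hypothetical slave that forwards 'revocable resources' only on change."""

    def __init__(self):
        self.oversubscribable = 0.0     # latest estimate from the resource estimator
        self.revocable_allocated = 0.0  # allocations of revocable containers
        self.last_forwarded = None      # cached value last sent to the master

    def revocable_resources(self):
        # revocable resources = revocable allocations + oversubscribable
        return self.revocable_allocated + self.oversubscribable

    def check(self):
        """Periodic check: return the new total if it changed, else None."""
        total = self.revocable_resources()
        if total == self.last_forwarded:
            return None  # unchanged, nothing is sent to the master
        self.last_forwarded = total
        return total


class Allocator:
    """Hypothetical allocator tracking revocable resources for one slave."""

    def __init__(self):
        self.revocable_total = 0.0       # last 'revocable resources' update
        self.revocable_allocation = 0.0  # revocable resources already allocated

    def update_revocable(self, total):
        self.revocable_total = total

    @property
    def revocable_available(self):
        # revocable_available = revocable resources - revocable allocation
        return max(0.0, self.revocable_total - self.revocable_allocation)
```

In this sketch, a second call to check() with no new estimate returns None,
which is the case where the master sees no traffic.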

Some salient features of this proposal:
--> Allocator changes are minimal.
--> Slave forwards estimates only when there is a change => low load on master.
--> Split offers allow master to rescind only revocable resources when 
necessary.
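
To make the split-offer point concrete, here is a small Python sketch under
stated assumptions: the Offer type, make_offers() and rescind_revocable() are
hypothetical names for illustration, not Mesos APIs, and resources are a single
CPU scalar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    offer_id: int
    slave_id: str
    cpus: float
    revocable: bool


def make_offers(slave_id, available, revocable_available, next_id=0):
    """Build separate offers for regular and revocable resources."""
    offers = []
    if available > 0:
        offers.append(Offer(next_id, slave_id, available, False))
        next_id += 1
    if revocable_available > 0:
        offers.append(Offer(next_id, slave_id, revocable_available, True))
    return offers


def rescind_revocable(outstanding, slave_id):
    """Split outstanding offers into (kept, rescinded) for one slave.

    Only revocable offers from the given slave are rescinded; regular
    offers stay outstanding.
    """
    kept, rescinded = [], []
    for offer in outstanding:
        if offer.revocable and offer.slave_id == slave_id:
            rescinded.append(offer)
        else:
            kept.append(offer)
    return kept, rescinded
```

Because the two kinds of resources never share an offer here, a new estimate
from a slave leaves that slave's regular offers untouched.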

Thoughts?

> Oversubscribe resources
> -----------------------
>
>                 Key: MESOS-354
>                 URL: https://issues.apache.org/jira/browse/MESOS-354
>             Project: Mesos
>          Issue Type: Epic
>          Components: isolation, master, slave
>            Reporter: brian wickman
>            Priority: Minor
>              Labels: mesosphere, twitter
>         Attachments: mesos_virtual_offers.pdf
>
>
> This proposal is predicated upon offer revocation.
> The idea would be to add a new "revoked" status either by (1) piggybacking 
> off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a 
> new status update TASK_REVOKED.
> In order to augment an offer with metadata about revocability, there are 
> options:
>   1) Add a revocable boolean to the Offer and
>     a) offer only one type of Offer per slave at a particular time
>     b) offer both revocable and non-revocable resources at the same time but 
> require frameworks to understand that Offers can contain overlapping resources
>   2) Add a revocable_resources field on the Offer which is a superset of the 
> regular resources field.  By consuming more than resources but at most 
> revocable_resources in a launchTask, the Task becomes a revocable task.  If 
> launching a task with < resources, the Task is non-revocable.
> The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce) 
> and non-revocable tasks are online higher-SLA tasks (e.g. services.)
> Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of disk.  
> One of these resources is a rate (4 cpu seconds per second) and two of them 
> are fixed values (8GB and 20GB respectively, though disk resources can be 
> further broken down into spindles - fixed - and iops - a rate.)  In practice, 
> these are the maximum resources in the respective dimensions that this task 
> will use.  In reality, we provision tasks at some factor below peak, and only 
> hit peak resource consumption in rare circumstances or perhaps at a diurnal 
> peak.  
> In the meantime, we stand to gain from offering some constant factor of 
> the difference between (reserved - actual) of non-revocable tasks as 
> revocable resources, depending upon our tolerance for revocable task churn.  
> The main challenge is coming up with an accurate short / medium / long-term 
> prediction of resource consumption based upon current behavior.
> In many cases it would be OK to be sloppy:
>   * CPU / iops / network IO are rates (compressible) and can often be OK 
> below guarantees for brief periods of time while task revocation takes place
>   * Memory slack can be provided by enabling swap and dynamically setting 
> swap paging boundaries.  Should swap ever be activated, that would be a 
> signal to revoke.
> The master / allocator would piggyback on the slave heartbeat mechanism to 
> learn of the amount of revocable resources available at any point in time.
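
The quoted description's idea of offering "some constant factor of the
difference between (reserved - actual)" can be written down as a one-line
calculation. The sketch below is only an illustration: the function name and
the default factor of 0.5 are assumptions, not values from the proposal.

```python
def revocable_slack(reserved, actual, factor=0.5):
    """Return factor * (reserved - actual), floored at zero.

    reserved: resources reserved by non-revocable tasks
    actual:   resources those tasks are currently using
    factor:   fraction of the slack to offer as revocable (assumed 0..1);
              a lower factor means less revocable task churn
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    return factor * max(0.0, reserved - actual)
```

For example, a task that reserved 4 cores but is using 1 would yield 1.5
revocable cores at a factor of 0.5.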



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
