[ https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551042#comment-14551042 ]
Vinod Kone commented on MESOS-354:
----------------------------------

This is the high-level idea of how the different components (described in the design doc) interact for oversubscription for the MVP.

--> Resource estimator sends an estimate of 'oversubscribable' resources to the slave.
--> Slave periodically checks if its cached value of 'revocable resources' (i.e., allocations of revocable containers + oversubscribable resources) has changed. If changed, slave forwards 'revocable resources' to the master.
--> Master rescinds outstanding revocable offers when it gets a new 'revocable resources' estimate, and updates the allocator.
--> On receiving a 'revocable resources' update, allocator updates 'revocable_available' (revocable resources - revocable allocation) resources.
--> 'revocable_available' gets allocated to (and recovered from) frameworks in the same way as 'available' (regular resources).
--> When sending offers, the master sends separate offers for revocable and regular resources.

Some salient features of this proposal:

--> Allocator changes are minimal.
--> Slave forwards estimates only when there is a change => low load on master.
--> Split offers allow the master to rescind only revocable resources when necessary.

Thoughts?

> Oversubscribe resources
> -----------------------
>
>                 Key: MESOS-354
>                 URL: https://issues.apache.org/jira/browse/MESOS-354
>             Project: Mesos
>          Issue Type: Epic
>          Components: isolation, master, slave
>            Reporter: brian wickman
>            Priority: Minor
>              Labels: mesosphere, twitter
>         Attachments: mesos_virtual_offers.pdf
>
>
> This proposal is predicated upon offer revocation.
> The idea would be to add a new "revoked" status either by (1) piggybacking
> off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a
> new status update TASK_REVOKED.
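As a rough illustration of the MVP flow in the comment above, here is a minimal Python sketch. It is not Mesos code: the classes `Slave`, `Master` and `Allocator`, their methods, and the reduction of resources to a single scalar (think CPUs) are hypothetical simplifications. It models only the bookkeeping the comment describes: the slave forwards its 'revocable resources' (revocable container allocations plus the oversubscribable estimate) only when the cached value changes, the master rescinds outstanding revocable offers on each update, and the allocator derives 'revocable_available' as revocable resources minus revocable allocation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model of the MVP flow. Resources are reduced to a single scalar
# (e.g. CPUs); none of these classes or methods are actual Mesos APIs.


@dataclass
class Allocator:
    revocable: float = 0.0             # latest 'revocable resources' from the slave
    revocable_allocation: float = 0.0  # revocable resources currently offered/used

    def update_revocable(self, revocable: float) -> None:
        self.revocable = revocable

    @property
    def revocable_available(self) -> float:
        # 'revocable_available' = revocable resources - revocable allocation
        return max(0.0, self.revocable - self.revocable_allocation)

    def allocate(self, amount: float) -> float:
        # Handed out the same way as regular 'available' resources.
        granted = min(amount, self.revocable_available)
        self.revocable_allocation += granted
        return granted

    def recover(self, amount: float) -> None:
        self.revocable_allocation = max(0.0, self.revocable_allocation - amount)


@dataclass
class Master:
    allocator: Allocator
    outstanding_revocable_offers: List[float] = field(default_factory=list)
    updates_received: int = 0

    def update_revocable(self, revocable: float) -> None:
        self.updates_received += 1
        # Rescind only the revocable offers; regular offers are sent separately.
        for offer in self.outstanding_revocable_offers:
            self.allocator.recover(offer)
        self.outstanding_revocable_offers.clear()
        self.allocator.update_revocable(revocable)


@dataclass
class Slave:
    master: Master
    revocable_container_allocation: float = 0.0  # held by running revocable containers
    oversubscribable_estimate: float = 0.0       # last value from the resource estimator
    cached_revocable: Optional[float] = None

    def on_estimate(self, oversubscribable: float) -> None:
        # The resource estimator pushes a new 'oversubscribable' estimate.
        self.oversubscribable_estimate = oversubscribable

    def periodic_check(self) -> None:
        # Forward 'revocable resources' to the master only if the value changed.
        revocable = self.revocable_container_allocation + self.oversubscribable_estimate
        if revocable != self.cached_revocable:
            self.cached_revocable = revocable
            self.master.update_revocable(revocable)


allocator = Allocator()
master = Master(allocator)
slave = Slave(master)

slave.on_estimate(4.0)
slave.periodic_check()                  # value changed: forwards 4.0
slave.periodic_check()                  # unchanged: nothing is forwarded
offer = allocator.allocate(3.0)         # a revocable offer of 3.0 CPUs
master.outstanding_revocable_offers.append(offer)
print(master.updates_received, allocator.revocable_available)  # 1 1.0

slave.on_estimate(2.0)                  # the estimator shrinks its estimate
slave.periodic_check()                  # forwards 2.0; the 3.0 offer is rescinded
print(master.updates_received, allocator.revocable_available)  # 2 2.0
```

Because revocable and regular resources travel in separate offers, a rescind in this sketch only touches `outstanding_revocable_offers`; regular offers stay valid, which is the benefit the comment attributes to split offers.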
>
> In order to augment an offer with metadata about revocability, there are
> options:
> 1) Add a revocable boolean to the Offer and
>    a) offer only one type of Offer per slave at a particular time, or
>    b) offer both revocable and non-revocable resources at the same time, but
>       require frameworks to understand that Offers can contain overlapping
>       resources.
> 2) Add a revocable_resources field on the Offer which is a superset of the
>    regular resources field. By consuming more than resources but at most
>    revocable_resources in a launchTask, the Task becomes a revocable task. If
>    launching a task within resources, the Task is non-revocable.
>
> The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce);
> non-revocable tasks are online, higher-SLA tasks (e.g. services).
>
> Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of
> disk. One of these resources is a rate (4 cpu seconds per second) and two of
> them are fixed values (8 GB and 20 GB respectively, though disk resources can
> be further broken down into spindles, which are fixed, and iops, which are a
> rate). In practice, these are the maximum resources in the respective
> dimensions that this task will use. In reality, we provision tasks at some
> factor below peak, and only hit peak resource consumption in rare
> circumstances or perhaps at a diurnal peak.
>
> In the meantime, we stand to gain from offering some constant factor of the
> difference (reserved - actual) of non-revocable tasks as revocable resources,
> depending upon our tolerance for revocable task churn.
>
> The main challenge is coming up with an accurate short / medium / long-term
> prediction of resource consumption based upon current behavior.
>
> In many cases it would be OK to be sloppy:
> * CPU / iops / network IO are rates (compressible) and can often be OK
>   below guarantees for brief periods of time while task revocation takes place.
> * Memory slack can be provided by enabling swap and dynamically setting
>   swap paging boundaries.
>   Should swap ever be activated, that would be a signal to revoke.
>
> The master / allocator would piggyback on the slave heartbeat mechanism to
> learn of the amount of revocable resources available at any point in time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
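The slack estimate and offer option 2 from the quoted proposal can also be sketched in Python. This is a hypothetical sketch, not an existing API: the `Resources` dict type, `NonRevocableTask`, `revocable_slack` and `classify_launch` are illustrative names, and `actual` is a single usage sample rather than the short / medium / long-term prediction the proposal calls for. `revocable_slack` offers `factor * (reserved - actual)` summed over non-revocable tasks; `classify_launch` reads "more than resources" as exceeding the regular `resources` in at least one dimension, and rejecting anything beyond `revocable_resources` is an assumption the proposal does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical resource vectors keyed by name, e.g. {"cpus": 4.0, "mem": 8192.0}.
Resources = Dict[str, float]


@dataclass
class NonRevocableTask:
    reserved: Resources  # what the task asked for (its peak)
    actual: Resources    # one current usage sample (assumption)


def revocable_slack(tasks: List[NonRevocableTask], factor: float) -> Resources:
    """Return factor * sum(reserved - actual) over non-revocable tasks.

    `factor` (0 < factor <= 1) encodes the tolerance for revocable-task churn.
    Usage above the reservation contributes no slack (clamped at zero).
    """
    slack: Resources = {}
    for task in tasks:
        for name, reserved in task.reserved.items():
            unused = max(0.0, reserved - task.actual.get(name, 0.0))
            slack[name] = slack.get(name, 0.0) + factor * unused
    return slack


def fits(request: Resources, limit: Resources) -> bool:
    # True if every requested dimension is within the limit.
    return all(amount <= limit.get(name, 0.0) for name, amount in request.items())


def classify_launch(request: Resources, resources: Resources,
                    revocable_resources: Resources) -> str:
    """Option 2: revocable_resources is a superset of resources.

    Within resources -> non-revocable task; beyond resources but within
    revocable_resources -> revocable task; anything larger -> invalid.
    """
    if fits(request, resources):
        return "non-revocable"
    if fits(request, revocable_resources):
        return "revocable"
    return "invalid"


task = NonRevocableTask(reserved={"cpus": 4.0, "mem": 8192.0},
                        actual={"cpus": 1.0, "mem": 6144.0})
slack = revocable_slack([task], factor=0.5)
print(slack)  # {'cpus': 1.5, 'mem': 1024.0}

# An offer whose revocable_resources = regular resources + estimated slack.
offer_resources = {"cpus": 2.0, "mem": 4096.0}
offer_revocable = {name: offer_resources.get(name, 0.0) + slack.get(name, 0.0)
                   for name in offer_resources.keys() | slack.keys()}

print(classify_launch({"cpus": 1.0, "mem": 1024.0}, offer_resources, offer_revocable))  # non-revocable
print(classify_launch({"cpus": 3.0, "mem": 1024.0}, offer_resources, offer_revocable))  # revocable
print(classify_launch({"cpus": 4.0}, offer_resources, offer_revocable))                 # invalid
```

Keeping `revocable_resources` as a superset means a framework that ignores revocability simply sizes tasks against `resources` and never launches revocable work by accident.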