[jira] [Commented] (MESOS-2652) Update Mesos containerizer to understand revocable cpu resources
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640207#comment-14640207 ]

Adam B commented on MESOS-2652:
---
[~jieyu] Is this still an open issue? If so, please remove the "Fix Version" field, since it was not actually fixed yet. If not, please resolve this ticket.

> Update Mesos containerizer to understand revocable cpu resources
>
> Key: MESOS-2652
> URL: https://issues.apache.org/jira/browse/MESOS-2652
> Project: Mesos
> Issue Type: Task
> Reporter: Vinod Kone
> Assignee: Ian Downes
> Labels: twitter
> Fix For: 0.23.0
>
> Attachments:
>   Abnormal performance with 3 additional revocable tasks (1).png through (7).png,
>   Performance improvement after reducing cpu.share to 2 for revocable tasks (1).png through (10).png,
>   cpu.share from 1024 to 10 for revocable tasks (1).png, (2).png,
>   flattened vs non-flattened cgroups layout (1).png, (2).png
>
> The CPU isolator needs to properly set limits for revocable and non-revocable containers.
>
> The proposed strategy is to use a two-way split of the cpu cgroup hierarchy -- normal (non-revocable) and low priority (revocable) subtrees -- and to use a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split (TBD). Containers would be present in only one of the subtrees. CFS quotas will *not* be set on subtree roots, only cpu.shares. Each container would set CFS quota and shares as done currently.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630357#comment-14630357 ]

Jie Yu commented on MESOS-2652:
---
I did another experiment in which I used a non-flattened cgroups layout. To be more specific, the layout is the following:
{noformat}
/mesos
|-- regular_container1/    # cpu.shares = 1024 * nr_cpus
|-- regular_container2/    # cpu.shares = 1024 * nr_cpus
|-- revocable/             # cpu.shares = 2 (minimal share)
    |-- revocable_container1/
    |-- revocable_container2/
{noformat}
Surprisingly, the performance using the non-flattened cgroups layout is similar to that of the flattened layout (using 10 cpu.shares per cpu for revocable containers). See the attached graphs (I made the cgroups layout change at around 11:45am on July 16).

My interpretation of these results is that workload characteristics outweigh the impact of the cgroups layout (as long as cpu.shares is set low enough). To be more specific: if some of a benchmark's threads regularly go into a wait state (waiting for events, locks, etc.), as with facesim, it is more vulnerable to interference from revocable tasks. On the other hand, if all of a benchmark's threads are always runnable, as with ferret, it is less vulnerable to interference because the revocable tasks don't get a chance to run.
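The layout above can be sketched as a small setup script. This is a minimal sketch, not Mesos code: the `write_cpu_shares` helper and the layout mapping are hypothetical, and the root directory is a parameter (on a real host it would be something like /sys/fs/cgroup/cpu/mesos, where writes require root).

```python
import os

def write_cpu_shares(root, layout):
    """Create each cgroup directory under `root` and write its cpu.shares.
    `layout` maps a relative cgroup path to its shares value."""
    for rel, shares in layout.items():
        path = os.path.join(root, rel)
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "cpu.shares"), "w") as f:
            f.write(str(shares))

nr_cpus = 16
layout = {
    "regular_container1": 1024 * nr_cpus,
    "regular_container2": 1024 * nr_cpus,
    # One minimal share for the whole revocable subtree; its children
    # then split that tiny weight among themselves.
    "revocable": 2,
    "revocable/revocable_container1": 1024,
    "revocable/revocable_container2": 1024,
}

# On a real host (as root): write_cpu_shares("/sys/fs/cgroup/cpu/mesos", layout)
```

The key property of the non-flattened layout is visible here: no matter how many revocable containers exist, the entire subtree competes against the regular containers with a single weight of 2.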
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630351#comment-14630351 ]

Jie Yu commented on MESOS-2652:
---
I did a few more experiments using the Parsec-based CPU benchmark to further quantify the interference from revocable tasks. I launched 16 instances of the benchmark (using Aurora) on 16 slaves, with each instance taking 16 cpus (all available cpus on the slave). I configured the fixed resource estimator such that instance N has N revocable tasks running alongside it (each revocable task runs a while(1) loop burning cpu). Initially, all revocable containers had cpu.shares=1024 and used SCHED_IDLE as the scheduling policy. As you can see in the graph, the interference is proportional to the number of revocable tasks for almost all benchmarks. Later, I changed their cpu.shares to 10. As you can see, setting cpu.shares to 10 reduces the interference a lot. Also, interestingly, after I changed cpu.shares from 1024 to 10, the interference is no longer always proportional to the number of revocable tasks on the slave. For some benchmarks, there is no interference (or very little) no matter how many revocable tasks are running on the same slave.
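A back-of-the-envelope model suggests why lowering cpu.shares helps so much (a hedged sketch mirroring the experiment's numbers, not Mesos code): under CFS, siblings at one cgroup level split CPU time in proportion to their shares, so the fraction N always-busy revocable tasks can take from a busy regular container is:

```python
def revocable_fraction(n_revocable, revocable_share, regular_share):
    """Long-term CPU fraction that N always-busy revocable siblings can
    take from one busy regular container at the same cgroup level."""
    total = regular_share + n_revocable * revocable_share
    return n_revocable * revocable_share / total

regular = 1024 * 16  # a 16-cpu regular container: cpu.shares = 16384

# With cpu.shares = 1024 per revocable task, interference grows quickly:
#   n=16 -> 16*1024 / (16384 + 16*1024) = 0.5 (half the machine)
# With cpu.shares = 10 per task it stays small even at n=16:
#   n=16 -> 160 / 16544, under 1%
```

This matches the observed behavior: at shares=1024 the interference scales with N, while at shares=10 the total weight of all revocable tasks stays negligible next to the regular container's weight.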
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627174#comment-14627174 ]

Jie Yu commented on MESOS-2652:
---
Absolutely! I think a smart QoS controller will help as well. For instance, a QoS controller can monitor application-specific SLAs or general indicators like CPI to predict potential interference, and kill revocable tasks if needed.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627096#comment-14627096 ]

Christos Kozyrakis commented on MESOS-2652:
---
You are correct, Jie: the lower you set the shares for the BE tasks, the better it will be. If you give the BE group a 1:1024 share ratio, the BE tasks will get 0.1% of the CPU time over the long term (assuming the other tasks can consume the remaining 99.9%). This is a perfectly good solution to begin with; the high-priority throughput tasks will never even notice. Some low-latency tasks will, but it's not a bad starting point at all.

However, every now and then you will see glitches on low-latency tasks. Even if the HP tasks are 100% busy, the 1:1024 setting will still allow the BE task to run eventually, introducing a glitch of a few msec. Keep this in mind. If at some point this becomes an issue, there are several ways to deal with it:
- disallow oversubscription for the (hopefully small percentage of) low-latency apps that care about it
- fix SCHED_IDLE somehow
- use cpusets
- ...
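The 0.1% figure checks out arithmetically (a quick sanity calculation, not code from the ticket):

```python
# BE group at cpu.shares = 1 vs HP group at cpu.shares = 1024, both busy:
# CFS gives each group CPU time proportional to its share of total weight.
be_fraction = 1 / (1 + 1024)
print(f"{be_fraction:.4%}")  # prints "0.0976%", i.e. roughly 0.1%
```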
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627077#comment-14627077 ]

Jie Yu commented on MESOS-2652:
---
[~kozyraki] Thank you for pointing me to the paper. I looked at figure 5. It looks like you assigned equal shares to both the BE task and the HP task ("Both memcached and the antagonist are assigned 50% share of the CPU"). I am wondering if you have tested the scenario where the BE task has a very low share compared to the HP task (e.g., 1:100)? The paper mentions:
{quote}Coming back to Fig. 5, this fully explains why memcached achieves good quality of service when its load is lower than 12%; it is accumulating virtual runtime more slowly than the square-wave workload and always staying behind, so it never gets preempted when the square-wave workload wakes{quote}
I am wondering: if you assign a low share to the BE task, will its vruntime advance much faster and stay ahead of the HP task's?
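The vruntime question can be illustrated with a toy model of CFS accounting (a sketch of the kernel's calc_delta_fair idea, not kernel code; the weights below stand in for cpu.shares). vruntime advances inversely to an entity's weight, so a low-share BE task accrues vruntime much faster for the same wall-clock runtime and stays "ahead", meaning a waking HP task with smaller vruntime gets picked first.

```python
NICE_0_WEIGHT = 1024  # the weight corresponding to cpu.shares = 1024

def vruntime_advance(exec_time_ns, weight):
    # Toy version of CFS calc_delta_fair: the lower the weight, the
    # faster vruntime advances for the same amount of real execution.
    return exec_time_ns * NICE_0_WEIGHT / weight

slice_ns = 1_000_000
be = vruntime_advance(slice_ns, 10)    # BE task, cpu.shares = 10
hp = vruntime_advance(slice_ns, 1024)  # HP task, cpu.shares = 1024
# be / hp = 102.4: after equal runtime the BE task is far ahead in
# vruntime, so CFS keeps preferring the HP task whenever it is runnable.
```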
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627007#comment-14627007 ]

Jie Yu commented on MESOS-2652:
---
Christos, I think the current situation is: SCHED_IDLE does not work across cgroups, so we have to find an alternative solution. I would say lowering cpu.shares is a *best-effort* way to mitigate cpu interference without a QoS controller. And it looks like this is the only plausible solution without one.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626981#comment-14626981 ]

Christos Kozyrakis commented on MESOS-2652:
---
Ian is correct. If we just rely on shares, we will run into CFS pitfalls no matter what. See http://csl.stanford.edu/~christos/publications/2014.mutilate.eurosys.pdf, figure 5, for a memcached example. For non-latency-critical, non-revocable tasks this will not be an issue. But for latency-critical tasks, you will see a slowdown.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623136#comment-14623136 ]

Jie Yu commented on MESOS-2652:
---
{quote}E.g., high share ratio, revokable is idle, non-revokable consumes a ton of cpu time (more than, say, the 1000:1 ratio), then goes idle, revokable then has something to do and starts running ==> now what happens if the non-revokable wants to run? Won't the revokable task continue to run until the share ratio is equalized?{quote}
As far as I know, no such preemption mechanism exists in the kernel that we can use. Real-time priority allows preemption, but real-time priority is not compatible with cgroups (http://www.novell.com/support/kb/doc.php?id=7012851).
{quote}I don't know the answer without reading the scheduler source code but given that my assumption about SCHED_IDLE turned out to be incomplete/incorrect then let's understand the preemption behavior before committing another incorrect mechanism{quote}
Yeah, I am using the benchmark I mentioned above to check whether the new hierarchy works as expected. I'll probably add a latency benchmark (e.g., http://parsa.epfl.ch/cloudsuite/memcached.html) to see whether latency is affected. But given that we don't have a way to let the kernel preempt revocable tasks, setting shares seems to be the only solution.
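For reference, SCHED_IDLE itself is set per-process (or per-thread) via sched_setscheduler(2). The snippet below is a hedged, Linux-only sketch using Python's os wrappers, showing how a revocable-style child process could be dropped to SCHED_IDLE; the helper name is hypothetical.

```python
import os

def run_at_sched_idle(fn):
    """Fork a child, switch it to SCHED_IDLE, run fn, return its exit code.
    Linux-only; lowering to SCHED_IDLE requires no special privileges."""
    pid = os.fork()
    if pid == 0:
        try:
            # Drop the child to the idle scheduling class.
            os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
        except PermissionError:
            pass  # some sandboxed environments forbid policy changes
        fn()
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

As the discussion above notes, this policy applies per task and does not carry its intended meaning across cgroup boundaries, which is why the shares-based approach is being pursued instead.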
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623126#comment-14623126 ]

Ian Downes commented on MESOS-2652:
---
I'm not sure I agree with using only shares, because we haven't determined the preemption behavior. If we only use shares, then I don't know when a non-revokable task will preempt a running revokable task. E.g.: high share ratio; revokable is idle; non-revokable consumes a ton of cpu time (more than, say, the 1000:1 ratio), then goes idle; revokable then has something to do and starts running ==> now what happens if the non-revokable wants to run? Won't the revokable task continue to run until the share ratio is equalized? Furthermore, with a flattened hierarchy, *all* revokable tasks with unequalized shares will run...

I don't know the answer without reading the scheduler source code, but given that my assumption about SCHED_IDLE turned out to be incomplete/incorrect, let's understand the preemption behavior before committing to another incorrect mechanism :-)
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623105#comment-14623105 ] Jie Yu commented on MESOS-2652: --- https://reviews.apache.org/r/36410/ https://reviews.apache.org/r/36411/ https://reviews.apache.org/r/36412/ https://reviews.apache.org/r/36413/
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622947#comment-14622947 ] Jie Yu commented on MESOS-2652: --- As originally suggested in this ticket, a preferred way is to use a flattened cgroups layout (easy to roll forward and roll back) and set the shares of a revocable cgroup to be very small.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622804#comment-14622804 ] Jie Yu commented on MESOS-2652: --- Re-opening this ticket because we observed that setting a process's scheduler policy to SCHED_IDLE does not work as expected when cgroups are used. Here is my testing environment: (1) I used a widely used open source CPU benchmark for multi-processors, called Parsec (http://parsec.cs.princeton.edu/), to test CPU performance. The idea is to launch a job (using Aurora) with each instance continuously running the Parsec benchmark and reporting statistics. (2) Each instance of the job uses 16 threads (by configuring Parsec). Each instance of the job is scheduled on a box with 16 cores, which means no other regular job can land on those boxes. (3) I used a fixed resource estimator on each slave and launched revocable tasks using no_executor_framework. Each revocable task simply does a 'while(true)' burning CPU. There is one interesting observation: one instance of the benchmark job landed on a slave that happened to have 11 revocable tasks running (each using 1 revocable cpu), while all other slaves had 8 revocable tasks running. That instance of the benchmark job performed consistently worse than the other instances. However, after I killed the 3 extra revocable tasks, the performance improved immediately and matched that of the other instances. See the attached results. To be continued...
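The revocable 'while(true)' burner from this test setup can be approximated with a minimal, time-bounded Python stand-in (the real tasks were launched via no_executor_framework; this sketch is only illustrative and bounds the burn so it terminates):

```python
import time

def burn_cpu(seconds):
    """Spin in a tight loop for roughly `seconds`, mimicking a
    'while(true)' revocable task; returns the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations

# One such loop per revocable cpu saturates a core and competes with the
# benchmark job for scheduler time whenever isolation is imperfect.
```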
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555265#comment-14555265 ] Joris Van Remoortere commented on MESOS-2652: - [~vinodkone] Sort of: An executor using some revocable resources is definitely BE. An executor using no revocable resources might still be intended to be BE. Therefore: an executor using no revocable resources is not always PR.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554840#comment-14554840 ] Vinod Kone commented on MESOS-2652: --- [~jvanremoortere] I agree. Are you saying that the fact that the executor is using some revocable resources is not a good enough signal that an executor is intended for BE?
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552772#comment-14552772 ] Joris Van Remoortere commented on MESOS-2652: - [~idownes][~vinodkone] I chatted with [~nnielsen] about the weird behavior when an executor that is *intended to be BE* is first started with purely non-revocable resources (because the revocable resources it is interested in are not currently available). My thought was to be more explicit when an executor starts up that it is intended to be BE. This way we are not **guessing** about how we should be isolating this container. I agree with [~nnielsen] that we should still fail hard if we try to add revocable resources to a PR executor. I think being explicit about the executor (as well as the resources) reduces the surface area for bugs and hard-to-diagnose isolation issues. Thoughts?
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551462#comment-14551462 ] Joris Van Remoortere commented on MESOS-2652: - Review for setting core affinity: https://reviews.apache.org/r/34442 Will base the SCHED_OTHER over SCHED_IDLE pre-emption test on this.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551381#comment-14551381 ] Ian Downes commented on MESOS-2652: --- Borg does prod and non-prod as coarse prioritization bands but supports different priorities within each.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548342#comment-14548342 ] Timothy Chen commented on MESOS-2652: - I see, and you also set SCHED_IDLE on the revocable tasks, right? I was just wondering whether SCHED_IDLE becomes a limiting factor: any SCHED_OTHER task, even one that is not actually more important, can easily overwhelm the tasks running on oversubscribed resources, since there isn't a way to express task priorities when we launch anything.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548394#comment-14548394 ] Timothy Chen commented on MESOS-2652: - Just chatted with Ian offline. In the future we should consider letting frameworks express some priority, so that even tasks using non-revocable resources can be put at low priority. That would be a nice balance, since I think splitting only on revocable vs. non-revocable might be too limiting.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548302#comment-14548302 ] Ian Downes commented on MESOS-2652: --- CFS bandwidth quota provides an upper bound on CPU time for a task. If the non-revocable workload is variable then we can increase utilization by removing that bound for revocable CPU, given that we immediately preempt for non-revocable. Then we just use cpu shares to balance between the revocable tasks.
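The effect of dropping the bandwidth quota follows directly from the CFS arithmetic (a sketch with illustrative values; the 100ms default period matches the kernel's cpu.cfs_period_us default):

```python
def cfs_cpu_limit(quota_us, period_us=100_000):
    """Upper bound on CPU consumption, in 'cpus', implied by CFS
    bandwidth control: quota microseconds of runtime per period.
    quota_us = -1 means no quota, i.e. no upper bound."""
    if quota_us < 0:
        return float("inf")
    return quota_us / period_us

# A container with 50ms of quota per 100ms period is hard-capped at 0.5
# cpus even when the machine is otherwise idle:
half_cpu = cfs_cpu_limit(50_000)

# Removing the quota (as proposed for revocable containers) leaves only
# the shares-based relative weighting, so idle cycles can be consumed:
unbounded = cfs_cpu_limit(-1)
```

This is the trade being described: the quota wastes idle cycles when the non-revocable workload dips, while shares alone only constrain the relative split when everyone is busy.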
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546563#comment-14546563 ] Timothy Chen commented on MESOS-2652: - Can you clarify what you mean in your last sentence? What do batch-style jobs mean here? And are you suggesting that we use cpu shares instead when there are batch-style jobs running?
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546470#comment-14546470 ] Ian Downes commented on MESOS-2652: --- Reviews for using SCHED_IDLE: https://reviews.apache.org/r/34309/ https://reviews.apache.org/r/34310/
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544685#comment-14544685 ] Ian Downes commented on MESOS-2652: --- Actually, a much better way to do this would be to use separate scheduling policies with CFS. Specifically, run normal tasks with SCHED_OTHER (the default) and run tasks with revocable CPU under SCHED_IDLE. This creates separate run queues, and tasks (in the Linux sense) on SCHED_IDLE will only run if there's nothing on the SCHED_OTHER queue, i.e., at the resolution of the scheduler we will always run tasks from non-revocable containers over tasks in revocable containers. Further, if non-revocable containers are running batch-style jobs we could *not* use CFS bandwidth quotas for revocable containers and use only cpu shares to set relative weights. These containers would then balance idle cycles appropriately, consuming whatever is left after the needs of the non-revocable containers.
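The policy switch being proposed can be exercised from Python via os.sched_setscheduler (a sketch only; Mesos itself would do this in C++, and the helper guards against platforms where SCHED_IDLE is unavailable):

```python
import os

def demote_to_idle(pid=0):
    """Move `pid` (0 = the calling process) to SCHED_IDLE so it only
    runs when no SCHED_OTHER task is runnable. Lowering one's own
    priority this way needs no special privileges on Linux. Returns
    the resulting policy, or None where SCHED_IDLE does not exist
    (e.g. non-Linux platforms)."""
    if not hasattr(os, "SCHED_IDLE"):
        return None
    # SCHED_IDLE requires a static priority of 0.
    os.sched_setscheduler(pid, os.SCHED_IDLE, os.sched_param(0))
    return os.sched_getscheduler(pid)
```

Note this demotes a single process; in the containerizer the equivalent call would have to be applied to every task in a revocable container, which is part of what made the cgroups interaction above surprising.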
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530993#comment-14530993 ] Ian Downes commented on MESOS-2652: --- IIUC that would require constantly updating cpu.shares values across all containers any time a container changed, was added, or was removed, to ensure the relative weights were preserved, unless you use very extreme values. I recall Ben relating that [~kozyraki] had observed strange behavior with CFS with very small values. The two-way split means the aggregate of the non-revocable containers dominates the revocable; then, within each subtree, there's a weighting (proportional to the cpus) between containers of the same type. We should investigate both options and determine how each behaves.
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530971#comment-14530971 ] Jie Yu commented on MESOS-2652: --- Instead of using a hierarchical structure, I am wondering if it's possible to still keep the existing flattened structure? We can set the cpu.shares of those revocable containers to be very small (or the shares of the normal containers to be very large) so that the revocable weight is still negligible even if there are many revocable containers.
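A quick arithmetic check of this flattened proposal (hypothetical values; 2 is often treated as the practical minimum for cpu.shares, and 1024 is the default):

```python
def revocable_aggregate(n_revocable, revocable_shares=2, normal_shares=1024):
    """Worst-case aggregate CPU fraction the revocable containers get in
    a flat hierarchy against one normal container, assuming everything
    is runnable (CFS shares are strictly proportional at one level)."""
    total = n_revocable * revocable_shares + normal_shares
    return n_revocable * revocable_shares / total

# With a handful of revocable containers the aggregate stays tiny:
# 8 containers -> 16 / 1040, under 2% of the CPU.
small = revocable_aggregate(8)

# But the aggregate grows linearly with the container count, which is
# the concern raised earlier about needing extreme values or constant
# share updates: 500 containers -> 1000 / 2024, nearly half the CPU.
large = revocable_aggregate(500)
```

So the flat layout is "negligible" only while the revocable container count stays far below normal_shares / revocable_shares, which is one way to frame the trade-off against the two-level split.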