[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427374#comment-13427374 ] Arun C Murthy commented on MAPREDUCE-4334: --
Maybe I haven't been able to communicate this clearly enough, so please let me try again: I'd strongly favor a model where platform-specific features (e.g. cgroups, setuid etc.) are supported via the native code and *build* system (autotools chain), so that, from the end-user perspective, we can deal with them automatically via a single controlling configuration knob, i.e. yarn.nodemanager.container-executor in this case. The alternative, various Java interfaces, is much worse, since you now have to configure yarn.nodemanager.container-executor, the resource enforcer, etc. It also invites configuration errors, such as a TasksetEnforcer on RHEL6 or a CgroupsEnforcer on RHEL5. The native code is, simply, the far simpler option: it puts the onus on us and takes the burden away from the end user or admin. Thoughts?

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Arun C Murthy
> Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch,
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch,
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch,
> MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre2.patch,
> MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-pre3.patch,
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch,
> mapreduce-4334-design-doc-v2.txt, mapreduce-4334-design-doc.txt
>
>
> Once we get in MAPREDUCE-4327, it will be important to actually enforce
> limits on CPU consumption of containers.
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
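A minimal sketch of the single-knob model, assuming the executor derives the CPU enforcement mechanism from what the host actually supports (probed at build or start time) instead of reading a second admin-facing setting. `CpuMechanismSelector`, its `Mechanism` enum, and the two probe flags are hypothetical; only the `yarn.nodemanager.container-executor` knob comes from the comment, and a plain `Map` stands in for Hadoop's `Configuration`.

```java
import java.util.Map;

/** Hypothetical sketch: one admin-facing knob, enforcement mechanism derived internally. */
class CpuMechanismSelector {
    enum Mechanism { CGROUPS, TASKSET, NONE }

    // The only setting the admin touches (name taken from the comment).
    static final String EXECUTOR_KEY = "yarn.nodemanager.container-executor";

    /**
     * Chooses the mechanism from the configured executor plus host capabilities,
     * so cgroups can never be selected on a host that lacks them (e.g. RHEL5).
     * The boolean flags stand in for autotools or runtime probes (assumption).
     */
    static Mechanism select(Map<String, String> conf,
                            boolean hostHasCgroups,
                            boolean hostHasTaskset) {
        String executor = conf.getOrDefault(EXECUTOR_KEY, "DefaultContainerExecutor");
        if (!executor.endsWith("LinuxContainerExecutor")) {
            return Mechanism.NONE;       // other executors do no CPU enforcement here
        }
        if (hostHasCgroups) {
            return Mechanism.CGROUPS;    // e.g. RHEL6+
        }
        return hostHasTaskset ? Mechanism.TASKSET : Mechanism.NONE; // e.g. RHEL5
    }
}
```

Because the mechanism is never configured directly, the RHEL5/RHEL6 mismatch cannot be expressed in configuration at all; the cost is that the probing logic has to live in, and be maintained with, the executor.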
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427366#comment-13427366 ] Arun C Murthy commented on MAPREDUCE-4334: --
Alejandro, LCE accomplishes 2 things:
# It serves as a 'root' tool with the setuid bit
# It serves as the home for Linux-specific container maintenance code

Now, for other platforms you have to add other ContainerExecutors anyway, e.g. branch-1-win has a WindowsTaskController which will be ported over to trunk as WindowsContainerExecutor. As a result, I would very much like to keep the Linux-specific bits in LCE.

Furthermore, with native code it is much, much easier to have platform-specific low-level code, i.e. we can use the autotools chain to resolve RHEL5 vs. RHEL6 etc. Doing that via Java plugins is very, very painful and leads to a proliferation of interfaces and configurations. The native code is something we can deal with very easily via Bigtop and other packaging projects.

Thoughts?
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427367#comment-13427367 ] Arun C Murthy commented on MAPREDUCE-4334: --
Also, I'll add that since cgroups is Linux-specific anyway, I don't see how it will be used on other platforms, i.e. Windows.
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427128#comment-13427128 ] Alejandro Abdelnur commented on MAPREDUCE-4334: ---
Arun, if somebody is willing to install cgrulesengd/cgexec on the nodes, then there is no need for super-user privileges; plus, any CE could be used (unmodified) with a ResourceEnforcer injecting cgexec into the launcher invocation. This also has the benefit that if we add more resource dimensions (last bullet above), CE implementations would not need to change, only the ResourceEnforcer. That means no code duplication: the cgroup configuration logic lives in one place, the ResourceEnforcer, as opposed to in every CE that wants to support cgroups.

Finally, I like the fact that with the ResourceEnforcer we get a clean separation of responsibilities between the ResourceEnforcer (configures) and the ContainerExecutor (executes). IMO this separation will make it easier to improve each of them without the risk of mixing these 2 responsibilities.
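The idea of a ResourceEnforcer injecting `cgexec` into any executor's launch command can be sketched as a pure command-rewriting step. The `cgexec -g <controllers>:<path> command [args]` form is the libcgroup tool's syntax; the class name, the `hadoop-yarn` hierarchy name, and the one-cgroup-per-container layout are assumptions. The sketch also assumes the cgroup already exists and that the launching user is allowed to move tasks into it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: prefix an unmodified executor's launch command with cgexec. */
class CgexecWrapper {
    private final String hierarchy;   // e.g. "hadoop-yarn" (assumed cgroup path prefix)

    CgexecWrapper(String hierarchy) {
        this.hierarchy = hierarchy;
    }

    /** Returns a new command that runs {@code command} inside the container's cpu cgroup. */
    List<String> wrap(String containerId, List<String> command) {
        List<String> wrapped = new ArrayList<>();
        wrapped.add("cgexec");
        wrapped.add("-g");
        wrapped.add("cpu:" + hierarchy + "/" + containerId);  // <controllers>:<path>
        wrapped.addAll(command);                              // original invocation, untouched
        return wrapped;
    }
}
```

Since the wrapped command is an ordinary argument list, the executor that runs it does not need to know about cgroups, which is the separation of responsibilities argued for in the comment.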
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427120#comment-13427120 ] Arun C Murthy commented on MAPREDUCE-4334: --
Alejandro - I'm thinking that since *only* LCE can use cgroups (due to the necessary super-user privileges etc.), it's simpler to make minimal changes to LCE so that it creates cgroups and encapsulates containers in them. Thoughts?
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426978#comment-13426978 ] Alejandro Abdelnur commented on MAPREDUCE-4334: ---
I'd like to introduce the ResourceEnforcer interface for the following reasons:
* It provides clean lifecycle hooks for initializing/configuring/cleaning up cgroups, leaving just the actual binding to the LCE.
* It will work with multiple container executors, as opposed to LCE only.
* It keeps the changes in the LCE minimal (IMO, the less logic we put in native code, the better).
* taskset could easily be implemented as a ResourceEnforcer.
* If we eventually want to control other resources via cgroups (such as memory/disk/network), only the ResourceEnforcer would require changes.

Fair enough?
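The claim that taskset could easily be implemented as a ResourceEnforcer can be illustrated with the same wrap-the-command idea. `taskset -c <cpu-list> command` is the util-linux syntax; `TasksetWrapper` and the way CPU ids would be assigned to containers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: taskset-based enforcement behind a command-wrapping hook. */
class TasksetWrapper {
    /** Pins the launched process (children inherit the affinity) to the given CPU ids. */
    static List<String> wrap(List<Integer> cpuIds, List<String> command) {
        if (cpuIds.isEmpty()) {
            throw new IllegalArgumentException("container must be assigned at least one CPU");
        }
        String cpuList = cpuIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));   // taskset -c accepts e.g. "0,2"
        List<String> wrapped = new ArrayList<>(List.of("taskset", "-c", cpuList));
        wrapped.addAll(command);
        return wrapped;
    }
}
```

Note the semantic difference: taskset pins a container to specific cores, while the cgroups `cpu` controller's `cpu.shares` divides CPU time proportionally, so the two are not drop-in equivalents even behind one interface.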
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426961#comment-13426961 ] Arun C Murthy commented on MAPREDUCE-4334: --
Thanks tucu, this is getting close. Please help me understand whether the following (simpler) proposal will work:
# NM calls LCE.launchContainer with the cpu-set.
# LCE creates the necessary cgroup if it does not already exist.
# LCE launches the process within the cgroup.

Pros: This way, we avoid new interfaces such as ResourceEnforcer, and we can also use taskset if necessary. Taskset should also work for DefaultContainerExecutor.

Thoughts?
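Steps 2 and 3 of this proposal can be sketched against a cgroup v1 `cpuset` hierarchy, assuming "the cpu-set" is passed as a cpuset-style list such as `0-1`. The real LCE is native code running as root; this Java rendering, the `CpusetLauncher` name, and the directory layout are hypothetical, and the hierarchy root is a constructor argument so the logic can be exercised on an ordinary temp directory. `cpuset.cpus`, `cpuset.mems`, and `tasks` are the standard cgroup v1 control files; `cpuset.mems` is set as well because a cpuset cgroup rejects tasks until both are populated.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical Java rendering of "create the cgroup if necessary, then launch inside it". */
class CpusetLauncher {
    private final Path cpusetRoot;   // assumed layout, e.g. /cgroup/cpuset/hadoop-yarn

    CpusetLauncher(Path cpusetRoot) {
        this.cpusetRoot = cpusetRoot;   // must already exist and be configured
    }

    /** Step 2: create the container's cgroup if it is missing and assign its CPUs. */
    Path ensureCgroup(String containerId, String cpus, String mems) {
        Path group = cpusetRoot.resolve(containerId);
        try {
            createIfMissing(group);
            write(group.resolve("cpuset.cpus"), cpus);
            write(group.resolve("cpuset.mems"), mems);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return group;
    }

    /** Step 3: attach a process by writing its PID to the group's tasks file. */
    void attach(Path group, long pid) {
        try {
            write(group.resolve("tasks"), Long.toString(pid));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void createIfMissing(Path dir) throws IOException {
        try {
            Files.createDirectory(dir);
        } catch (FileAlreadyExistsException e) {
            // an earlier launch already created it: nothing to do
        }
    }

    private static void write(Path file, String value) throws IOException {
        Files.write(file, value.getBytes(StandardCharsets.US_ASCII));
    }
}
```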
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426957#comment-13426957 ] Alejandro Abdelnur commented on MAPREDUCE-4334: ---
I was chatting offline with Arun about this JIRA. His key concern is that it should be possible to use cgroups without requiring the installation of additional packages or extra OS configuration. As the LinuxContainerExecutor already runs as root, we can leverage that to create the cgroup mounts. This means that the LinuxContainerExecutor is required to use cgroups with zero configuration. While the LinuxContainerExecutor is typically used in secure clusters, it can still be used in non-secure clusters, always running as the mapred user (which would be the equivalent of the DefaultContainerExecutor).

Given this, how about the following proposal? This approach will not depend on the cgexec binary being installed.
* The LinuxContainerExecutor would have 2 new options.
** --cgroupsinit : This option will be used for initialization. When invoked with this option, the LCE will create the cgroup mount point and give ownership of it to the yarn user. Then it will complete its execution.
** --cgroup : This option will be used for launching containers. When invoked with this option, the LCE will add the process to the cgroup specified by the parameter.
* The ResourceEnforcer will have the following methods (exactly as in the latest patch):
** init(): called when the RM is initialized.
** preExecute(containerId, Resource): called before launching the container.
** wrapCommand(containerId, command): augments the execution command line before launching.
** postExecute(containerId): called after launching the container.
* A default implementation of the ResourceEnforcer will do NOPs.
* The CgroupsResourceEnforcer implementation will do the following:
** init(): call LCE --cgroupsinit
** preExecute(containerId, Resource): configure the cgroup with the assigned cpu resources.
** wrapCommand(containerId, command): augments the regular LCE invocation with the --cgroup option.
** postExecute(containerId): any necessary cgroup clean up.
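The ResourceEnforcer lifecycle in this proposal can be sketched as follows. The method names come from the comment; `Resource` is simplified to an `int` of CPU units, the `--cgroup` flag is appended at the end of the LCE argument list (its real position is not specified), and this `CgroupsResourceEnforcer` only records what it would do rather than invoking the `container-executor` binary.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the proposed lifecycle hooks; signatures simplified (Resource -> int). */
interface ResourceEnforcer {
    void init();
    void preExecute(String containerId, int cpuUnits);
    List<String> wrapCommand(String containerId, List<String> command);
    void postExecute(String containerId);
}

/** Default implementation: every hook is a NOP and the command passes through unchanged. */
class NoOpResourceEnforcer implements ResourceEnforcer {
    public void init() { }
    public void preExecute(String containerId, int cpuUnits) { }
    public List<String> wrapCommand(String containerId, List<String> command) { return command; }
    public void postExecute(String containerId) { }
}

/** Hypothetical cgroups variant that logs its intended actions instead of performing them. */
class CgroupsResourceEnforcer implements ResourceEnforcer {
    final List<String> actions = new ArrayList<>();

    public void init() {
        actions.add("container-executor --cgroupsinit");   // one-time mount/ownership setup
    }

    public void preExecute(String containerId, int cpuUnits) {
        actions.add("configure " + containerId + " cpu=" + cpuUnits);
    }

    public List<String> wrapCommand(String containerId, List<String> command) {
        List<String> wrapped = new ArrayList<>(command);
        wrapped.add("--cgroup");                             // flag position is an assumption
        wrapped.add(containerId);
        return wrapped;
    }

    public void postExecute(String containerId) {
        actions.add("cleanup " + containerId);
    }
}
```

The NodeManager would call `init()` once, then `preExecute`, `wrapCommand`, the launch itself, and `postExecute` for each container; with the no-op default installed, the launch path is unchanged.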
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425383#comment-13425383 ] Karthik Kambatla commented on MAPREDUCE-4334: -
+1 on design - 2(b), and the patch looks good.
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425370#comment-13425370 ] Alejandro Abdelnur commented on MAPREDUCE-4334: ---
I like the current patch: it does not add complexity, and it will be trivial to wire it into MAPREDUCE-4327 once CPU units are part of resources.
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423323#comment-13423323 ] Arun C Murthy commented on MAPREDUCE-4334: --
Andrew - I'll ask again. Can you please provide a simple writeup? I'm confused by the new interfaces that pop up in every new patch. Thanks.
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423294#comment-13423294 ] Andrew Ferguson commented on MAPREDUCE-4334:
Hi,

bq. Why the ResourceEnforcer is bubble up all the way to the NodeManager instead just being instantiated & configured in the ContainerLauncher where it seems the use of before() & after() and then passed to the ContainerExecutor as a parameter in the launchContainer() method?

the reason is that I was trying to pattern-match how the ContainerExecutor works, and the ContainerExecutor is instantiated by the NodeManager. If you think it makes more sense to break with the pattern and keep the ResourceEnforcer localized to the ContainersLauncher, then I can certainly do that.

thanks! I will incorporate your other comments into the patch.

Andrew
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423151#comment-13423151 ] Alejandro Abdelnur commented on MAPREDUCE-4334: ---
I like the approach much better than the previous patch.
* Why is the ResourceEnforcer bubbled up all the way to the NodeManager, instead of just being instantiated & configured in the ContainerLauncher (where before() & after() seem to be used) and then passed to the ContainerExecutor as a parameter of the launchContainer() method?
* The method names in the ResourceEnforcer seem a bit off. How about the following alternative names: before() -> preLaunch(), after() -> postLaunch() & commandPrefix -> wrapLauncherCommand()
* Instead of having an init(Configuration conf) method in the ResourceEnforcer, why not make it implement Configurable and have an init() method? Then the configuration is set at instantiation by ReflectionUtils.newInstance().
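The last suggestion relies on Hadoop's convention that `ReflectionUtils.newInstance(Class, Configuration)` calls `setConf` on objects implementing `Configurable`. To keep the example self-contained, the sketch below uses hypothetical stand-ins (`SimpleConfigurable`, `SimpleReflection`, and a `Map` in place of `Configuration`) and an invented property name; it shows why `init()` no longer needs a configuration parameter.

```java
import java.util.Map;

/** Stand-in for Hadoop's Configurable; a plain Map replaces Configuration. */
interface SimpleConfigurable {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
}

final class SimpleReflection {
    private SimpleReflection() { }

    /** Mirrors the idea of ReflectionUtils.newInstance: construct, then inject the conf. */
    static <T> T newInstance(Class<T> type, Map<String, String> conf) {
        try {
            T obj = type.getDeclaredConstructor().newInstance();
            if (obj instanceof SimpleConfigurable) {
                ((SimpleConfigurable) obj).setConf(conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }
}

/** An enforcer that receives its configuration at construction time. */
class ConfiguredEnforcer implements SimpleConfigurable {
    private Map<String, String> conf;
    private String hierarchy;

    public void setConf(Map<String, String> conf) { this.conf = conf; }
    public Map<String, String> getConf() { return conf; }

    /** No Configuration parameter: it was already injected by newInstance. */
    void init() {
        hierarchy = conf.getOrDefault("example.cgroups.hierarchy", "/hadoop-yarn"); // hypothetical key
    }

    String hierarchy() { return hierarchy; }
}
```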
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422740#comment-13422740 ] Andrew Ferguson commented on MAPREDUCE-4334:
Forgot to mention that this version requires the "cgexec" binary, which, while not required for cgroups, is commonly available. If we choose not to introduce a dependency on cgexec, then we can return to modifying the C code in LinuxContainerExecutor, as the previous version of this patch did.

thanks,
Andrew
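Whether to use the cgexec-based path or fall back to the modified LinuxContainerExecutor C code could be decided at startup by looking for the binary. This is a hypothetical helper, not part of the patch; it only checks that an executable file with the given name exists on a PATH-style string.

```java
import java.io.File;
import java.util.Optional;

/** Hypothetical helper: is cgexec available, or should we fall back to the LCE C code? */
final class BinaryLocator {
    private BinaryLocator() { }

    /** Returns the first executable regular file called {@code name} on the given PATH string. */
    static Optional<File> findOnPath(String name, String pathVar) {
        if (pathVar == null) {
            return Optional.empty();
        }
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;                          // skip empty PATH entries
            }
            File candidate = new File(dir, name);
            if (candidate.isFile() && candidate.canExecute()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

A caller might use `BinaryLocator.findOnPath("cgexec", System.getenv("PATH")).isPresent()`. Presence on PATH does not prove the cgroup hierarchy is mounted, so such a check would complement, not replace, a mount check.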
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421908#comment-13421908 ] Andrew Ferguson commented on MAPREDUCE-4334:
Hi Alejandro, thanks very much for looking at the patch & for the feedback.

indeed, the patch should come with a no-op version which is enabled by default. (the current patch simply fails to find any cgroups if they are not configured, and then skips trying to use them.) I will update the patch tomorrow so it continues to have a lower impact on the codebase.

thanks,
Andrew
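The current behaviour, skipping cgroups when none are configured, amounts to looking for a mounted controller and falling back to a no-op when there is none. A hypothetical sketch of that lookup over `/proc/mounts`-style lines (`device mountpoint fstype options dump pass`), where cgroup v1 mounts list their controllers among the mount options:

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: find where a cgroup v1 controller is mounted, if anywhere. */
final class CgroupMountFinder {
    private CgroupMountFinder() { }

    /**
     * @param mountsLines lines in /proc/mounts format
     * @param controller  controller name to look for, e.g. "cpu"
     * @return the mount point, or empty so the caller can fall back to a no-op
     */
    static Optional<String> findMount(List<String> mountsLines, String controller) {
        for (String line : mountsLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 4 || !f[2].equals("cgroup")) {
                continue;                              // not a cgroup v1 mount
            }
            for (String opt : f[3].split(",")) {
                if (opt.equals(controller)) {          // exact match: "cpuacct" is not "cpu"
                    return Optional.of(f[1]);
                }
            }
        }
        return Optional.empty();
    }
}
```

On a real node the input would come from `Files.readAllLines(Paths.get("/proc/mounts"))`. Unified cgroup v2 mounts (fstype `cgroup2`) are deliberately not matched, and mount points containing spaces (octal-escaped in `/proc/mounts`) are not decoded.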
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421840#comment-13421840 ] Alejandro Abdelnur commented on MAPREDUCE-4334: ---
The patch has TAB characters; it should not. Indentation should be 2 spaces.
* ContainerExecutor.java
Instead of having 2 different ConcurrentMaps, why not have one holding a data structure for pidFiles and cgroupFiles? Why do we need read/write locks when accessing a ConcurrentMap?
* DefaultContainerExecutor.java
The for loop adding the process ID to the cgroup should be within { }, even if it is a single line.
* CgroupsCreator.java
Shouldn't it, at initialization, enable/disable itself based on a config property that indicates whether cgroups are enabled? And if disabled, all methods would be NOPs?
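The first review question, one map holding a small per-container record instead of two parallel `ConcurrentMap`s, might look like the sketch below. `ContainerFiles` and `ContainerRegistry` are hypothetical names. Each single-key operation on a `ConcurrentHashMap` (`putIfAbsent`, `get`, `remove`) is already atomic, which is why an extra read/write lock adds little here; a lock would still be needed if an update had to span several keys atomically.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Per-container bookkeeping kept together instead of in two parallel maps. */
final class ContainerFiles {
    final Path pidFile;
    final Path cgroupDir;   // null when cgroups are disabled

    ContainerFiles(Path pidFile, Path cgroupDir) {
        this.pidFile = pidFile;
        this.cgroupDir = cgroupDir;
    }
}

/** Hypothetical registry: the map's atomic operations replace the read/write lock. */
class ContainerRegistry {
    private final ConcurrentMap<String, ContainerFiles> files = new ConcurrentHashMap<>();

    /** Registers a container; returns false if it was already registered. */
    boolean register(String containerId, Path pidFile, Path cgroupDir) {
        return files.putIfAbsent(containerId, new ContainerFiles(pidFile, cgroupDir)) == null;
    }

    ContainerFiles lookup(String containerId) {
        return files.get(containerId);
    }

    /** Atomically removes and returns the entry, so cleanup runs at most once. */
    ContainerFiles deregister(String containerId) {
        return files.remove(containerId);
    }
}
```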
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420933#comment-13420933 ] Andrew Ferguson commented on MAPREDUCE-4334:
Hi Hari,

In my experiments, there are usually 200-400ms between starting to create the cgroups and having the process completely inside them. This number is likely an upper bound, as the experiments are in pseudo-distributed mode on a VM.

Note that in the design represented by this patch, I move the process into the cgroup asynchronously, so the latency is not incurred while starting the process. However, in my reading of Arun's comments, he would prefer that the cgroups be created synchronously while starting the job. I am currently in the process of making this change. While I suspect the cost may not be as high as 200-400ms, it will of course be non-zero. :-)

cheers,
Andrew
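The trade-off between the asynchronous placement in this patch and the synchronous placement Arun prefers is mainly one of ordering. The hypothetical sketch below records events instead of touching real cgroups, and makes no claim about the latencies measured above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch contrasting the two orderings; "place" = create cgroup + move process. */
class PlacementOrdering {
    final List<String> events = new CopyOnWriteArrayList<>();   // thread-safe event log

    private void place(String id) { events.add("place " + id); }
    private void start(String id) { events.add("start " + id); }

    /** Synchronous: the cgroup is ready before the process runs; launch pays the setup cost. */
    void launchSync(String id) {
        place(id);
        start(id);
    }

    /** Asynchronous: the process starts at once and is moved later, running unconfined meanwhile. */
    CompletableFuture<Void> launchAsync(String id) {
        start(id);
        return CompletableFuture.runAsync(() -> place(id));
    }
}
```

In the synchronous version the container never runs outside its cgroup, but its start is delayed by the setup cost; in the asynchronous version the start is immediate, but the process runs unconfined until the placement completes.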
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420366#comment-13420366 ] Hari Mankude commented on MAPREDUCE-4334: - Relevant information would be the performance impact of running maps and reduces in cgroups in terms of latency. Overall, this would be a very useful feature since it is possible to add fencing around cpu/io resources in addition to memory usage for MR tasks.
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420363#comment-13420363 ] Bikas Saha commented on MAPREDUCE-4334: --- Aside from a design proposal, I would be really interested in seeing how exactly cgroups work in the context of our typical workload. Say, take a bunch of typical mappers and reducers and run them in isolation. Then run them in isolation within cgroups. Is there a difference? Now run them concurrently with and without cgroups. What are the observations? These experiments may lead to expected or unexpected results and would be a great addition to the design pros and cons. Perhaps you have already run those experiments. If so, would you care to share the results?
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420347#comment-13420347 ] Andrew Ferguson commented on MAPREDUCE-4334: Hi Arun, I feel like we've been discussing pros & cons for the length of this JIRA. :-) I think, perhaps, I proposed too large a change across this issue and MAPREDUCE-4351: cgroups for CPU, cgroups for memory, a code refactoring, etc. Instead, I would like to make a smaller change, with just cgroups for CPUs, placed in each launcher's code as you requested above. Perhaps a better refactoring than the one I suggested with the ContainersMonitor will become clear afterwards. How does this sound to you? I was planning to finish it up on Monday. best, Andrew
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419952#comment-13419952 ] Arun C Murthy commented on MAPREDUCE-4334: -- bq. I disagree. Andrew, it seems we are stuck in the weeds debating minutiae of the code. Let's take a step back. Can you please start by providing a writeup about your approach(es) and their pros/cons? Thanks.
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415528#comment-13415528 ] Andrew Ferguson commented on MAPREDUCE-4334: bq. Now, it seems like we should enhance the container-launch via LCE to just set the requisite cgroups or sched_affinity prior-to or right-after the container launch, rather than make them apis. That would be the safest, no? I disagree. That was the first approach I took for implementing this, but I found it to be unsatisfactory for several reasons. See: https://issues.apache.org/jira/browse/MAPREDUCE-4334?focusedCommentId=13413913&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13413913 starting at "My first design for this..."
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415524#comment-13415524 ] Robert Joseph Evans commented on MAPREDUCE-4334: I agree with Bikas and Arun to a point. I can see some situations, like running a multi-tenant Hadoop cloud, where you do want strict isolation, so that the people who are paying a premium to get consistent results from their part of the cluster never have to worry about someone else doing something really bad on another part of the cluster. Is this enough of a concern to make it the default? I would say no. Is it enough of a concern to make it an option that comes with, and is maintained by, Hadoop? That is TBD. I don't plan on running my clusters that way, but I am not the only Hadoop customer. Arun, didn't you mention something at Hadoop Summit about discussions you had with people who want full VMs to run their containers in, specifically for isolation purposes? As for memory spikes, at least on Linux I thought you could configure swap for containers so that if a container goes over its budget, i.e. spikes, it swaps to disk instead of triggering the OOM killer. I could be wrong; I have not dug into it very much.
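Robert's recollection roughly matches the cgroup v1 memory controller: `memory.limit_in_bytes` caps a group's RAM, while `memory.memsw.limit_in_bytes` caps RAM plus swap, so (where swap accounting is enabled) a group that exceeds the first but not the second is pushed into swap instead of being OOM-killed. A minimal sketch of computing both values from a container's physical-memory size and a ratio such as `yarn.nodemanager.vmem-pmem-ratio`; the class and method names are hypothetical, and treating that ratio as a memory-plus-swap multiplier is an assumption (the existing NodeManager monitor applies it to virtual address-space size, which is a different quantity).

```java
// Hypothetical translation of a container's memory request into cgroup v1 values.
class MemoryCgroupLimits {
  static final long BYTES_PER_MB = 1024L * 1024L;

  // Value for memory.limit_in_bytes: caps the group's RAM. When it is reached,
  // the kernel first reclaims the group's pages (swapping them out if allowed).
  static long ramLimitBytes(long pmemMb) {
    return pmemMb * BYTES_PER_MB;
  }

  // Value for memory.memsw.limit_in_bytes: caps RAM + swap together. If usage
  // cannot be kept under the limits, the memory controller's OOM handling runs.
  static long ramPlusSwapLimitBytes(long pmemMb, double vmemPmemRatio) {
    if (vmemPmemRatio < 1.0) {
      throw new IllegalArgumentException("memsw limit must be >= the RAM limit");
    }
    return (long) (ramLimitBytes(pmemMb) * vmemPmemRatio);
  }

  // How much a container may spill into swap before hitting the combined limit.
  static long swapHeadroomBytes(long pmemMb, double vmemPmemRatio) {
    return ramPlusSwapLimitBytes(pmemMb, vmemPmemRatio) - ramLimitBytes(pmemMb);
  }
}
```

With the 1 GB physical limit and 2.1 ratio used as an example elsewhere in this thread, the sketch yields a RAM limit of 1073741824 bytes and a RAM-plus-swap limit of 2254857830 bytes.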
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415470#comment-13415470 ] Arun C Murthy commented on MAPREDUCE-4334: -- {quote} So, concretely, this is my proposal: recognize the LCE binary as the "hadoop root tool" the LCE will have two new functionalities: 1) sched_setaffinity and 2) creating cgroups in addition to the patch above, I will create 1) another pluggable ContainersMonitor which can use these new functions (sched_setaffinity) and 2) adapt the one above to optionally use the (creating cgroups) functionality of the "hadoop root tool" {quote} Thanks, looks like we finally are on the same page - it's what I've been proposing for a while now. Now, it seems like we should enhance the container-launch via LCE to just set the requisite cgroups or sched_affinity prior-to or right-after the container launch, rather than make them apis. That would be the safest, no?
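For the `sched_affinity` half of that suggestion, one possible shape is to pin the container's process with the util-linux `taskset` tool. A minimal sketch; `CpuPinning` and the contiguous core assignment are hypothetical. CPU affinity is inherited across `fork`/`exec`, so setting it prior to launch covers every thread the container later creates, whereas `taskset -p` on an already running PID changes only that thread unless `-a` is also given.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for pinning a launched container to a block of cores.
class CpuPinning {
  // Formats cores [firstCore, firstCore + count) as a taskset CPU list, e.g. "2-3".
  static String cpuList(int firstCore, int count) {
    if (firstCore < 0 || count < 1) {
      throw new IllegalArgumentException("invalid core range");
    }
    return count == 1
        ? Integer.toString(firstCore)
        : firstCore + "-" + (firstCore + count - 1);
  }

  // "taskset -pc LIST PID": -p operates on an existing PID, -c takes a CPU list.
  static List<String> tasksetCommand(long pid, String cpuList) {
    return Arrays.asList("taskset", "-pc", cpuList, Long.toString(pid));
  }
}
```

The resulting list could be handed to `new ProcessBuilder(...)` right after launch; doing the equivalent `sched_setaffinity` call inside the native launcher before `exec` would be the "prior-to" variant.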
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415468#comment-13415468 ] Arun C Murthy commented on MAPREDUCE-4334: -- Good points, Bikas; I tend to agree with them. In the past we used OS limits (via ulimit) and had several issues with temporary spikes (particularly with Java processes forking), and hence we moved away from OS limits to a custom-built monitor which ignores spikes, etc.
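The "ignore spikes" behaviour of a user-space monitor can be sketched as a check that tolerates a bounded number of consecutive over-limit samples but reacts immediately above a hard ceiling. This is a hypothetical policy with made-up thresholds, not the NodeManager's actual algorithm; it only illustrates why a monitor can be more lenient than an OS-enforced limit, e.g. letting a 2 GB container briefly touch 2.1 GB.

```java
// Hypothetical spike-tolerant memory check; thresholds are illustrative only.
class SpikeTolerantLimit {
  private final long limitBytes;       // the container's nominal allocation
  private final long hardCeilingBytes; // usage above this is never tolerated
  private final int graceSamples;      // consecutive over-limit samples allowed
  private int consecutiveOver = 0;

  SpikeTolerantLimit(long limitBytes, long hardCeilingBytes, int graceSamples) {
    if (hardCeilingBytes < limitBytes || graceSamples < 0) {
      throw new IllegalArgumentException("invalid thresholds");
    }
    this.limitBytes = limitBytes;
    this.hardCeilingBytes = hardCeilingBytes;
    this.graceSamples = graceSamples;
  }

  /** Records one usage sample and returns true if the container should be killed. */
  boolean sample(long usageBytes) {
    if (usageBytes > hardCeilingBytes) {
      return true;
    }
    if (usageBytes > limitBytes) {
      consecutiveOver++;
      return consecutiveOver > graceSamples;
    }
    consecutiveOver = 0; // back under the limit: forget the spike
    return false;
  }
}
```

A monitor built this way can also log a precise reason ("over its allocation for N consecutive samples") before killing, which addresses the debuggability concern raised in this thread.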
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415452#comment-13415452 ] Andrew Ferguson commented on MAPREDUCE-4334: Hi Bikas, thanks for thinking about this! Comments inline: bq. Somewhere in this thread it was mentioned controlling memory via OS. In my experience this is not an optimal choice because bq. 1) makes it hard to debug task failures due to memory issues. Abrupt OS termination or denial of more memory resulting in NPE/bad pointers etc. It's better to just monitor the memory and then enforce limits with clear error message saying - task was terminated because it used more memory than allotted. On Linux, enforcing memory limits via Cgroups feels a bit like simply running a process on a machine with less memory installed. When memory allocation reaches the threshold, the Linux OOM killer destroys the task. The patch above detects that the process has been killed and logs an error message indicating that the task was killed for consuming too many resources. bq. 2) due to different scenarios, tasks may have memory spikes or temporary increases. The OS will enforce tight limits but NodeManager monitoring can be more flexible and not terminate a task because it shot to 2.1GB instead of staying under 2. I would argue that the strict enforcement of Cgroups is exactly the behavior we want because it provides isolation. If two containers are running on a node with 4 GB of RAM, each using 2 GB, and one happens to spike to 3 GB momentarily, the spiking container should suffer -- if we continue monitoring the memory as done today, then the well-behaved container might suffer by being swapped out to make room for the spiking container. I believe the spiking concern is mitigated by the fact that Cgroups allow you to set both a physical memory limit and a virtual memory limit (which my patch above makes use of).
For example, I set the physical memory limit to, say, 1 GB of RAM, and the virtual memory limit to 2.1 GB. When a process momentarily spikes above its 1 GB of RAM, it will be allocated memory from swap without a problem. This is configurable via the existing "yarn.nodemanager.vmem-pmem-ratio" setting. bq. Disk scheduling and monitoring would be a hard to achieve goal with multiple writers to disk spinning things their own way and expecting something that will likely not happen. Sure, it is tricky, and the feasibility depends on the semantics YARN promises applications. However, the Linux Completely Fair Queuing I/O scheduler has semantics which are quite similar to the semantics I'm proposing we promise for CPUs (proportional weights). The blkio Cgroup subsystem already provides both proportional sharing and throttling: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Subsystems_and_Tunable_Parameters.html#sec-blkio bq. Network scheduling and monitoring shares choke points at multiple levels beyond the machines and trying to optimally and proportionally use the network tends to be a problem that's better served globally. YARN is a global scheduler. Linux traffic controls [1], in combination with the network controller for Cgroups, can be used to implement the results of Seawall [2], FairCloud [3], and similar projects. There are many datacenter designs these days; some will be a perfect match for end-host-only bandwidth control, and others an imperfect match. While end-host-only bandwidth control is not a magic bullet, I strongly believe that it is both useful enough, and easy enough to implement, to warrant pursuit. bq. My 2 cents would be to limit this to just CPU for now. It is. However, I believe the patch above is easily extensible to other resources (you can see for yourself that there is a small difference between the memory-only patch and the memory+cpu patch). bq.
Based on the comments above, I would agree that we need to make sure platform specific stuff should not leak into the code so that other platforms (imminently Windows) can support this stuff. Totally agree. That's why I proposed making it pluggable with MAPREDUCE-4351. bq. An alternative to pluggable ContainersMonitor would be to make CPU management a pluggable component of ContainersManager. My POV is that ContainersManager manages the resources of containers and has logic that will be common across platforms. The tools it uses will change. Eg. ProcfsBaseProcessTree is the tool used to monitor and manage memory. I can see that being changed to a MemoryMonitor interface with platform specific implementations. That's what's happening on the Windows port in branch 1. I can see a CPUMonitor interface for CPU. Or maybe a ResourceMonitor that has methods for both memory and CPU. I'm afraid I'm a bit confused by your suggestion here -- ContainersMonitor is a
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414567#comment-13414567 ] Bikas Saha commented on MAPREDUCE-4334: --- Somewhere in this thread it was mentioned controlling memory via OS. In my experience this is not an optimal choice because 1) makes it hard to debug task failures due to memory issues. Abrupt OS termination or denial of more memory resulting in NPE/bad pointers etc. It's better to just monitor the memory and then enforce limits with clear error message saying - task was terminated because it used more memory than allotted. 2) due to different scenarios, tasks may have memory spikes or temporary increases. The OS will enforce tight limits but NodeManager monitoring can be more flexible and not terminate a task because it shot to 2.1GB instead of staying under 2. Disk scheduling and monitoring would be a hard to achieve goal with multiple writers to disk spinning things their own way and expecting something that will likely not happen. Network scheduling and monitoring shares choke points at multiple levels beyond the machines and trying to optimally and proportionally use the network tends to be a problem that's better served globally. My 2 cents would be to limit this to just CPU for now. Based on the comments above, I would agree that we need to make sure platform specific stuff should not leak into the code so that other platforms (imminently Windows) can support this stuff. An alternative to pluggable ContainersMonitor would be to make CPU management a pluggable component of ContainersManager. My POV is that ContainersManager manages the resources of containers and has logic that will be common across platforms. The tools it uses will change. Eg. ProcfsBaseProcessTree is the tool used to monitor and manage memory. I can see that being changed to a MemoryMonitor interface with platform specific implementations.
That's what's happening on the Windows port in branch 1. I can see a CPUMonitor interface for CPU. Or maybe a ResourceMonitor that has methods for both memory and CPU.
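Bikas's suggestion, keeping common policy logic in the ContainersManager while hiding platform-specific measurement behind an interface, can be sketched as below. `ResourceMonitor` is the name floated in the comment, but its methods, and the `ContainerResourcePolicy` and `FixedResourceMonitor` classes, are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical platform-neutral interface for per-container measurements.
interface ResourceMonitor {
  long memoryUsageBytes(String containerId);
  double cpuUsage(String containerId); // in cores, e.g. 1.5
}

// Platform-independent policy logic that only talks to the interface.
class ContainerResourcePolicy {
  private final ResourceMonitor monitor;

  ContainerResourcePolicy(ResourceMonitor monitor) {
    this.monitor = monitor;
  }

  boolean overMemoryLimit(String containerId, long limitBytes) {
    return monitor.memoryUsageBytes(containerId) > limitBytes;
  }

  boolean overCpuLimit(String containerId, double cpuLimit) {
    return monitor.cpuUsage(containerId) > cpuLimit;
  }
}

// Canned implementation for tests. A Linux version might read /proc; a Windows
// version would use that platform's own process APIs.
class FixedResourceMonitor implements ResourceMonitor {
  private final Map<String, Long> memory = new HashMap<>();
  private final Map<String, Double> cpu = new HashMap<>();

  void set(String containerId, long memoryBytes, double cpuUsage) {
    memory.put(containerId, memoryBytes);
    cpu.put(containerId, cpuUsage);
  }

  @Override
  public long memoryUsageBytes(String containerId) {
    return memory.getOrDefault(containerId, 0L);
  }

  @Override
  public double cpuUsage(String containerId) {
    return cpu.getOrDefault(containerId, 0.0);
  }
}
```

Only the `ResourceMonitor` implementation changes per platform; the decision logic in `ContainerResourcePolicy` stays shared, which is the split the comment argues for.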
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414108#comment-13414108 ] Andrew Ferguson commented on MAPREDUCE-4334: Arun -- I think we might be talking past each other, as we agree that both cgroups and taskset should be available. BTW, it turns out the sched_setaffinity() syscall does not require root if it is applied to a process you own. Therefore, if you are running with the DefaultContainerExecutor, you can still use sched_setaffinity, which is excellent. I think this is the matrix of possible use cases: 1) launch container as user & use sched_setaffinity / taskset / CPU pinning 2) launch container as user & use cgroups completely managed by Hadoop 3) launch container as user & use cgroups managed by the cluster operator 4) launch container as Hadoop & use sched_setaffinity / taskset / CPU pinning 5) launch container as Hadoop & use cgroups completely managed by Hadoop 6) launch container as Hadoop & use cgroups managed by the cluster operator. Cases 1, 2, 3 and 5 require root privs. Cases 3 and 6 are covered by the patch above. I'm happy to expand the LCE into a "hadoop root tool" which can be used in cases 1, 2, 3, and 5. In my mind, the design question is how to cover all six cases with the greatest amount of code reuse. Today, we have two important ContainerManager subsystems: the Launcher and the Monitor. Today, resource enforcement is done entirely within the Monitor. The question is, where should new resource enforcement be done? I think the answer is still "in the Monitor" even though, in some use cases, it needs access to root privs. To get access to those privs, it can call the LCE binary (aka the "hadoop root tool"), just as the Java side of the LCE does today.
So, concretely, this is my proposal: - recognize the LCE binary as the "hadoop root tool" - the LCE will have two new functionalities: 1) sched_setaffinity and 2) creating cgroups - in addition to the patch above, I will create 1) another pluggable ContainersMonitor which can use these new functions (sched_setaffinity) and 2) adapt the one above to optionally use the (creating cgroups) functionality of the "hadoop root tool". How does that sound?
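The six-case matrix reduces to a simple rule, which a minimal sketch makes explicit: root is needed whenever the container runs as the submitting user, or whenever Hadoop itself must mount and create the cgroups. The enum and method names are hypothetical; the rule is derived only from the six cases and the stated root requirements (cases 1, 2, 3 and 5).

```java
// Hypothetical encoding of the six use cases.
enum LaunchAs { USER, HADOOP }

enum CpuMechanism { AFFINITY, CGROUPS_MANAGED_BY_HADOOP, CGROUPS_MANAGED_BY_OPERATOR }

class UseCaseMatrix {
  static boolean requiresRoot(LaunchAs launchAs, CpuMechanism mechanism) {
    // Launching as the submitting user needs a setuid binary, and so does
    // mounting/creating cgroups. Pinning a process the NM user owns with
    // sched_setaffinity, or writing into a cgroup the operator has already
    // delegated to the NM user, does not.
    return launchAs == LaunchAs.USER
        || mechanism == CpuMechanism.CGROUPS_MANAGED_BY_HADOOP;
  }
}
```

Evaluating the rule for all six combinations reproduces the list above: only cases 4 and 6 (launch as Hadoop, with either CPU pinning or operator-managed cgroups) can run without root.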
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414053#comment-13414053 ] Arun C Murthy commented on MAPREDUCE-4334: -- Andrew - please don't take this the wrong way: I certainly am *not* trying to debate taskset v/s cgroups. All I'm saying is 'we need both' for the dominant platforms: RHEL5 and RHEL6. I perfectly understand that you might not have the time or the inclination to do both, and I'm happy to help personally; supporting just RHEL6 isn't enough. Given that, we have two options: # Admin-setup cgroups (outside YARN) # YARN handles it on its own via LCE Now the pros of using LCE: # It already exists! Hence it doesn't impose any *new* operational requirements. # It's consistent for both technologies/platforms we need to support: taskset/RHEL5 and cgroups/RHEL6. # Even better, we can use the same for any platform in the future e.g. WindowsContainerExecutor (e.g. we already have WindowsTaskController in branch-1-win, which will need to be ported to branch-2 soon). # It's *much less* overhead on admins - they don't have to create cgroups upfront, and they don't have to arrange for them to be re-mounted so that they survive reboots, etc. Cons: # Need LCE for non-secure setups. We actually did support LTC without security in branch-1 at some point, happy to discuss. In the alternative (admin-setup cgroups) we will _still_ need LCE (or worse, *another* setuid script) to support taskset. To me that is a very bad choice. As a result, using LCE seems like a significantly superior alternative. Some other comments: bq. In my mind, the LCE is for starting processes, and should stick to doing that. Not true at all; we already use it for container cleanup etc. {quote} 4) For cgroups, we could have a second ContainersMonitor plugin which uses a setuid root binary to also mount & create cgroups, freeing the admin from managing them at all.
5) For taskset, we can implement a ContainersMonitor which uses a setuid root binary (potentially the LCE, but perhaps better if it's something else, just to keep the security footprint down) to pin processes to CPUs. This ContainersMonitor will also need the memory enforcement code from the current ContainersMonitorImpl {quote} As I said above, why have two ways to do the same thing when we can do it with one *existing* component? Using the LCE seems like a clear choice. I understand you might not have time to port your work to the LCE; I'm happy to either help or take up that work.
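Arun's point that a single component can serve both RHEL5 (taskset) and RHEL6 (cgroups) implies choosing the mechanism per node. One way to do that at runtime is to look for a cgroup v1 hierarchy carrying the `cpu` controller in `/proc/mounts`; the sketch below is a hypothetical illustration of that idea, not how the LCE actually decides.

```java
// Hypothetical per-node choice between cgroups and taskset.
class CpuEnforcementChooser {
  enum Mechanism { CGROUPS, TASKSET }

  // Returns the mount point of a cgroup (v1) hierarchy carrying the "cpu"
  // controller, or null if none is mounted. Input is the text of /proc/mounts,
  // whose fields are: device, mount point, fs type, options, dump, pass.
  static String findCpuCgroupMount(String procMounts) {
    for (String line : procMounts.split("\n")) {
      String[] fields = line.trim().split("\\s+");
      if (fields.length >= 4 && fields[2].equals("cgroup")) {
        for (String option : fields[3].split(",")) {
          if (option.equals("cpu")) { // exact match, so "cpuset" does not count
            return fields[1];
          }
        }
      }
    }
    return null;
  }

  static Mechanism choose(String procMounts) {
    return findCpuCgroupMount(procMounts) != null ? Mechanism.CGROUPS : Mechanism.TASKSET;
  }
}
```

On a node the input would come from reading `/proc/mounts`; keeping the parser a pure function of that text makes the decision easy to test without a real cgroup mount.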
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413913#comment-13413913 ] Andrew Ferguson commented on MAPREDUCE-4334: hi all, I think there are pros and cons to both approaches, which I will try to outline below. Cgroups: - they provide a coherent path for future resource management: network bandwidth, CPU upper- and lower-bounds, block I/O priorities and limits, etc. [1] - can be integrated with resource management for other applications, drawing upon a single resource budget for a group of users - cgroup hierarchies are key to this. In a taskset-only world, the NM would need to be given a fixed allocation of the node's CPUs to manage - cgroups are not persistent across reboots. This is unfortunate. However, 1) anyone using them needs to mount them on startup, so they will need to make a change to their startup process already, and 2) there are extensive, cross-distro tools to create and manage cgroups automatically on reboot (RHEL 6 has great docs on them [2]) - some clusters are already using Cgroups, without any support from Hadoop/YARN. For example, StumbleUpon [3] Taskset: - compatible with RHEL 5 - does not require changes to node startup - can be implemented with a SUID root binary, as LCE is today. My first design for this JIRA had the LCE create the cgroups. This turned out to be the wrong approach for several reasons: - What if I wanted to use the regular container executor with cgroups? An admin may not allow me to have a setuid root binary, but may be willing to create a cgroup hierarchy for me (after all, this is one advantage of the hierarchy: delegation) - Conversely, what if I wanted to use the LCE without cgroups? - There needs to be a part of the NM responsible for deleting unused cgroups, and for the other tasks of a ContainersManager I described in MAPREDUCE-4351.
Some of those are specific to how resource enforcement is being done; it seemed best to keep that code together in the ContainersManager, rather than spread it across a ContainersManager and the LCE. - Putting the resource enforcement "smarts" in the ContainersMonitor (which is already receiving events from the RM) allows it to dynamically adjust the resource enforcement. - On startup, the JVM can appear to be using twice as much memory as it actually is (see comment in ContainersMonitorImpl.java). By starting the JVM within the cgroup, rather than allowing it to start outside the cgroup and moving it into the cgroup with a ContainersMonitor as my patch above does, the kernel may kill the JVM inadvertently. I really like the flexibility of keeping the LCE and resource enforcement separate. In my mind, the LCE is for starting processes, and should stick to doing that. Resource enforcement is a separate job. My recommendation is the following: 1) Keep the LCE as it is. 2) Support pluggable ContainersMonitors (MAPREDUCE-4351) 3) For cgroups, we can start with the patch above. It is best for admins who already use cgroups on their nodes and want to have YARN take advantage of them. (This is the point of the yarn.nodemanger.cgroups.path config option I added.) 4) For cgroups, we could have a second ContainersMonitor plugin which uses a setuid root binary to also mount & create cgroups, freeing the admin from managing them at all. 5) For taskset, we can implement a ContainersMonitor which uses a setuid root binary (potentially the LCE, but perhaps better if it's something else, just to keep the security footprint down) to pin processes to CPUs. This ContainersMonitor will also need the memory enforcement code from the current ContainersMonitorImpl. I've done 1-3 (well, #1 is a freebie :-) ... and I can definitely do #5 as well. Arun, does this design appeal to you?
[1] http://www.linux-kongress.org/2010/slides/seyfried-cgroups-linux-kongress-2010-presentation.pdf
[2] https://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/index.html
[3] http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html
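Recommendation 3 relies on the admin pre-creating a delegated cgroup hierarchy that the NM then manages on its own. The Python sketch below illustrates that lifecycle under a cgroup v1 layout: create a child cgroup per container, move a process in by writing its PID to the child's `tasks` file, and remove the cgroup when the container ends. `CgroupManager`, its `simulated` flag and the example path are hypothetical, and the demo runs against a temporary directory rather than a real cgroupfs mount.

```python
import shutil
import tempfile
from pathlib import Path


class CgroupManager:
    """Per-container cgroup handling under a hierarchy the admin created once
    (e.g. with cgcreate) and delegated to the NM user. cgroup v1 layout assumed."""

    def __init__(self, hierarchy_root, simulated=False):
        # hierarchy_root: hypothetical delegated branch such as /cgroup/cpu/hadoop-yarn
        self.root = Path(hierarchy_root)
        # simulated=True means root is an ordinary directory standing in for cgroupfs
        self.simulated = simulated

    def path_for(self, container_id):
        return self.root / container_id

    def create(self, container_id):
        # On cgroupfs, mkdir creates the child cgroup and the kernel fills in
        # its control files; no root is needed once the branch is delegated.
        path = self.path_for(container_id)
        path.mkdir()
        return path

    def add_process(self, container_id, pid):
        # Writing a PID to the child's "tasks" file moves that process into it.
        with open(self.path_for(container_id) / "tasks", "w") as f:
            f.write(f"{pid}\n")

    def delete(self, container_id):
        path = self.path_for(container_id)
        if self.simulated:
            shutil.rmtree(path)  # plain directory: also removes the fake tasks file
        else:
            path.rmdir()  # cgroupfs: rmdir only; fails while tasks remain


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mgr = CgroupManager(d, simulated=True)
        cg = mgr.create("container_01")
        mgr.add_process("container_01", 12345)
        print(cg.name, (cg / "tasks").read_text().strip())
        mgr.delete("container_01")
```

Deletion is the step the thread assigns to the NM side ("deleting unused cgroups"), which is why it lives next to creation here rather than in the launcher.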
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413518#comment-13413518 ] Arun C Murthy commented on MAPREDUCE-4334: -- bq. Clearly, we need to support taskset for platforms on which cgroups isn't supported e.g. RHEL5. For taskset you need super-user privs - would you prefer packages to do it too? I meant to say: for taskset, we clearly need to go via the LCE at runtime.
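If pinning does go through the privileged launcher at runtime, as proposed here, the launcher ultimately has to exec something like `taskset -c <cpu-list> <command>`. The Python sketch below only builds that argv: it compresses a set of core IDs into the list/range syntax that `taskset -c` accepts (e.g. `0-2,5`). The function names are hypothetical, and how the LCE would actually receive the core list is not specified in this thread.

```python
def cpu_list(cores):
    """Format core IDs in the list/range syntax `taskset -c` accepts: {0,1,2,5} -> "0-2,5"."""
    ids = sorted(set(cores))
    if not ids:
        raise ValueError("at least one core is required")
    runs = []
    start = prev = ids[0]
    for core in ids[1:]:
        if core == prev + 1:  # extends the current run of consecutive cores
            prev = core
            continue
        runs.append((start, prev))
        start = prev = core
    runs.append((start, prev))
    return ",".join(f"{a}-{b}" if a != b else str(a) for a, b in runs)


def taskset_argv(cores, command):
    """argv a privileged launcher might exec to start `command` pinned to `cores`."""
    return ["taskset", "-c", cpu_list(cores), *command]


if __name__ == "__main__":
    print(taskset_argv({0, 1, 2, 5}, ["java", "-Xmx1g", "MyTask"]))
    # ['taskset', '-c', '0-2,5', 'java', '-Xmx1g', 'MyTask']
```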
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413517#comment-13413517 ] Arun C Murthy commented on MAPREDUCE-4334: -- Also, it looks like cgroups might not persist across reboots, which makes them even harder to deal with.
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413515#comment-13413515 ] Arun C Murthy commented on MAPREDUCE-4334: -- bq. How is it onerous? Packages could easily do this as part of the install on platforms where it's supported. This doesn't make sense. What if CPU isolation is disabled? Do you still want 'packages' to make it part of the install? Clearly, we need to support taskset for platforms on which cgroups isn't supported e.g. RHEL5. For taskset you need super-user privs - would you prefer packages to do it too? Yes, the LTC is a pain, but using it consistently (e.g. for both cgroups and taskset) seems better than forcing multiple steps on the admin (LCE + cgroups + taskset etc.).
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413499#comment-13413499 ] Todd Lipcon commented on MAPREDUCE-4334: bq. Preventing such onerous requirements on cluster setup is a key goal - something which initially led to creation of LinuxTaskController etc. How is it onerous? Packages could easily do this as part of the install on platforms where it's supported. It seems equivalent to installing the LTC itself, which requires root to make it setuid, right? Andrew: do the cgroups persist across reboots, or does that cgcreate command need to go in the startup scripts?
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413438#comment-13413438 ] Arun C Murthy commented on MAPREDUCE-4334: -- Andrew, thanks, I missed that comment. I'm concerned that asking admins to set up cgroups etc. via cgcreate before deploying Hadoop clusters, particularly on all nodes, is almost a non-starter. Preventing such onerous requirements on cluster setup is a key goal - something which initially led to creation of LinuxTaskController etc. I'd strongly urge that we implement this functionality via the LinuxContainerExecutor, thereby allowing us to write low-level platform-specific code (RHEL5 vs. RHEL6 etc.) in a single place rather than relying on tedious Java code for the same. Thoughts?
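The argument here is that the platform-specific choice (RHEL5 vs. RHEL6) should be made in one place rather than by the admin. As a rough illustration of what such a check could look like, the Python sketch below reads `/proc/cgroups` (present on kernels with cgroup support, absent on older ones such as RHEL5's) and picks cgroups when the `cpu` controller is enabled, falling back to taskset otherwise. The function names and the two-way decision are assumptions for illustration; the comment proposes doing this in native code, not Python.

```python
from pathlib import Path


def enabled_controllers(proc_cgroups_text):
    """Parse /proc/cgroups text into the set of enabled controller names."""
    enabled = set()
    for line in proc_cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip the "#subsys_name ..." header and blank lines
        fields = line.split()
        # columns: subsys_name, hierarchy, num_cgroups, enabled
        if len(fields) >= 4 and fields[3] == "1":
            enabled.add(fields[0])
    return enabled


def choose_cpu_enforcement(proc_cgroups_path="/proc/cgroups"):
    """Return "cgroups" if the cpu controller is available, else "taskset"."""
    try:
        text = Path(proc_cgroups_path).read_text()
    except FileNotFoundError:
        return "taskset"  # kernel without cgroup support
    return "cgroups" if "cpu" in enabled_controllers(text) else "taskset"


if __name__ == "__main__":
    print(choose_cpu_enforcement())
```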
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413396#comment-13413396 ] Andrew Ferguson commented on MAPREDUCE-4334: @Arun: no, the NM does not need superuser privs. In my comment above [1], the line "$ sudo cgcreate -a hadoop_user_name -g memory:hadoop-yarn" is run when installing Hadoop. This creates a branch of the memory hierarchy called "hadoop-yarn" which is owned by the user "hadoop_user_name" (the user running the NM). This allows the NM to create and move cgroups without superuser privs. The one complication is that only the superuser or the owner of a process may move that process into a cgroup. As the LinuxContainerExecutor runs processes under different user accounts, we will need to either augment it or use a similar tool to move such processes into a cgroup created by the NM user. Let me know if you'd like further clarification. [1] https://issues.apache.org/jira/browse/MAPREDUCE-4334?focusedCommentId=13399014&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13399014
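The ownership rule described above can be modelled in a few lines. The Python sketch below encodes the rule exactly as stated in the comment (only root or the process owner may move a process into a cgroup) and uses it to decide whether the NM can do the move itself or must hand it to a setuid helper such as an augmented LinuxContainerExecutor. It is a simplification: the real check also involves write permission on the target `tasks` file, and the names `Proc`, `can_move` and `who_moves` are hypothetical.

```python
from dataclasses import dataclass

ROOT_UID = 0


@dataclass(frozen=True)
class Proc:
    pid: int
    owner_uid: int


def can_move(mover_uid, proc):
    # Rule as stated in the comment: superuser or the process owner only.
    return mover_uid == ROOT_UID or mover_uid == proc.owner_uid


def who_moves(nm_uid, proc):
    # Containers running as the NM user can be moved by the NM directly;
    # containers launched as other users (the LCE case) need a privileged helper.
    return "nodemanager" if can_move(nm_uid, proc) else "setuid-helper"


if __name__ == "__main__":
    NM_UID, ALICE_UID = 1001, 1002  # example UIDs
    print(who_moves(NM_UID, Proc(pid=4242, owner_uid=NM_UID)))     # nodemanager
    print(who_moves(NM_UID, Proc(pid=4243, owner_uid=ALICE_UID)))  # setuid-helper
```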
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413380#comment-13413380 ] Arun C Murthy commented on MAPREDUCE-4334: -- Andrew, what are the security implications here? Does the NM need superuser privs to create/move cgroups?
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396261#comment-13396261 ] Andrew Ferguson commented on MAPREDUCE-4334: ok, putting all of this in the ContainerExecutor is not the way to go, as it precludes use of secure Hadoop's Linux container-executor. In my new design, ContainerMonitor will be a pluggable component, just as ContainerExecutor is now. Then we can provide a ContainerMonitor which uses cgroups to control resource usage, as an alternative to the existing ContainerMonitor (to be renamed "DefaultContainerMonitor"). This has several advantages:
1) It allows us to keep the existing ContainerMonitor for users who can't use cgroups (e.g., users without root access during Hadoop setup).
2) ContainerMonitor already receives an event when it's time to stop monitoring, which we can use as the notification to delete the container's cgroup.
3) ContainerMonitor already receives the resource limits; there is no need to calculate them from the configs.
4) A pluggable ContainerMonitor paves the way for ContainerMonitors on other platforms.
I will first open a sub-task to make ContainerMonitor pluggable. The only trouble spot with this design is that a non-root user cannot move another user's process into a cgroup. I plan to extend the secure container-executor to be able to make such a move. Please let me know if you have any feedback about this proposal. thank you, Andrew
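The pluggable design can be sketched roughly as follows. This is a Python rendering for illustration only (the real NodeManager is Java): a small monitor interface, a stand-in for the existing monitor, a cgroups-backed variant that merely records the actions it would take, and a loader that picks an implementation from configuration. The class names mirror the comment but are hypothetical here, and the configuration key `example.containers-monitor.impl` is invented, not a real YARN setting.

```python
from abc import ABC, abstractmethod


class ContainerMonitor(ABC):
    """Hypothetical interface: one implementation is chosen per NodeManager."""

    @abstractmethod
    def start_monitoring(self, container_id, pid, vcores, memory_mb):
        """Called when a container starts, together with its resource limits."""

    @abstractmethod
    def stop_monitoring(self, container_id):
        """Called when it is time to stop monitoring the container."""


class DefaultContainerMonitor(ContainerMonitor):
    """Stand-in for the existing (non-cgroups) monitor: tracks what it watches."""

    def __init__(self):
        self.watched = {}

    def start_monitoring(self, container_id, pid, vcores, memory_mb):
        self.watched[container_id] = (pid, vcores, memory_mb)

    def stop_monitoring(self, container_id):
        self.watched.pop(container_id, None)


class CgroupsContainerMonitor(ContainerMonitor):
    """Records the cgroup operations it would perform instead of touching the kernel."""

    def __init__(self):
        self.actions = []

    def start_monitoring(self, container_id, pid, vcores, memory_mb):
        # The limits arrive with the call, so nothing is recomputed from configs.
        self.actions.append(("create", container_id, vcores, memory_mb))
        self.actions.append(("move", container_id, pid))

    def stop_monitoring(self, container_id):
        # The existing stop notification doubles as the cue to delete the cgroup.
        self.actions.append(("delete", container_id))


MONITORS = {
    "default": DefaultContainerMonitor,
    "cgroups": CgroupsContainerMonitor,
}

# Invented key for illustration; not a real YARN configuration property.
MONITOR_KEY = "example.containers-monitor.impl"


def load_monitor(conf):
    return MONITORS[conf.get(MONITOR_KEY, "default")]()
```

Keeping the choice behind one interface is what makes advantages 2) and 3) cheap: the stop event and the limits already reach the monitor, so the cgroups variant needs no extra wiring.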
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296036#comment-13296036 ] Andrew Ferguson commented on MAPREDUCE-4334: Hi Arun, I've thought some more about implementing taskset since our chat at the YARN meet-up. One benefit of cgroups is that they're "set it and forget it" -- in the ContainerExecutor, we simply place the new task in the appropriate cgroup, and the kernel takes care of the rest. This would allow us to ditch the ContainersMonitor infrastructure. With taskset, on the other hand, we will need to do the CPU scheduling ourselves. Say I have two cores and start with two processes, A (requested 0.5 cores) and B (requested 0.5 cores). I can start by putting them both on core 1 for efficiency, or I can put them on separate cores for higher utilization. But if process C (requested 1 core) comes along, I will need to pin A & B to the same core. This is just a simple scenario; with more cores and processes, this will likely grow into a complicated CPU scheduler inside the NodeManager (ContainersMonitorImpl is probably the right place, since it already monitors container resource usage). tl;dr -- I believe cgroups require only local state when launching containers, while taskset requires us to maintain global state. thoughts? thanks! Andrew
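The two-core example above can be made concrete. The Python sketch below uses one possible placement policy, first-fit-decreasing bin packing of fractional core requests onto whole cores; nothing in the comment prescribes this algorithm, and `place` and `moved` are hypothetical helpers. It shows the global state taskset forces on the NodeManager: admitting C changes where an already-running container must run.

```python
def place(requests, num_cores):
    """First-fit-decreasing placement of fractional core requests onto whole cores.

    requests maps a container name to its requested cores (0 < r <= 1).
    Returns a dict of name -> core index (cores are numbered from 0).
    """
    free = [1.0] * num_cores
    placement = {}
    # Largest requests first; ties broken by name so the result is deterministic.
    for name, req in sorted(requests.items(), key=lambda kv: (-kv[1], kv[0])):
        for core, capacity in enumerate(free):
            if req <= capacity + 1e-9:  # small tolerance for float sums
                free[core] = capacity - req
                placement[name] = core
                break
        else:
            raise ValueError(f"cannot place {name} ({req} cores) on {num_cores} cores")
    return placement


def moved(old, new):
    """Names of already-running containers whose core assignment changes."""
    return sorted(name for name in old if old[name] != new.get(name))


if __name__ == "__main__":
    spread = {"A": 0, "B": 1}  # A and B on separate cores for higher utilization
    after_c = place({"A": 0.5, "B": 0.5, "C": 1.0}, num_cores=2)
    print(after_c)                 # {'C': 0, 'A': 1, 'B': 1}
    print(moved(spread, after_c))  # ['A']
```

Even this toy policy needs the full set of current requests to decide anything, which is the "global state" contrasted with cgroups above.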
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293777#comment-13293777 ] Andrew Ferguson commented on MAPREDUCE-4334: hi Arun, I've actually been looking into this recently myself, and would be happy to take the lead on it. So far, I've been focusing on cgroups, as they also provide memory containment and a path for managing future resources as well. Although taskset is available on RHEL5, it cannot isolate fractions of a CPU. While cgroups' memory support gives an upper bound on the amount of memory tasks can consume, the RHEL6 CPU support is actually a lower bound. Until CFS bandwidth control [1] is more widespread, we can place tasks judiciously to create guarantees, building on cgroups to ensure the lower bounds. best, Andrew [1] for a quick overview: http://lwn.net/Articles/428230/ ... more in-depth discussion here: http://www.kernel.org/doc/ols/2010/ols2010-pages-245-254.pdf
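For reference, the two CPU knobs mentioned here map onto the cgroup v1 `cpu` controller roughly as follows: `cpu.shares` is a relative weight, which is why it only acts as a lower bound under contention, while CFS bandwidth control caps usage through `cpu.cfs_quota_us` per `cpu.cfs_period_us`. The Python sketch below derives both for a container requesting a fractional number of cores, scaling the kernel's default weight of 1024 by the request and assuming the default 100 ms period. `cpu_cgroup_settings` is a hypothetical helper, and this mapping from cores to values is one reasonable choice, not something the comment specifies.

```python
DEFAULT_SHARES = 1024        # kernel default cpu.shares for a cgroup
DEFAULT_PERIOD_US = 100_000  # default cpu.cfs_period_us (100 ms)
MIN_SHARES = 2               # smallest cpu.shares the kernel accepts
MIN_QUOTA_US = 1_000         # kernel rejects CFS quotas below 1 ms


def cpu_cgroup_settings(requested_cores, period_us=DEFAULT_PERIOD_US):
    """Map a fractional core request onto cgroup v1 cpu controller values."""
    if requested_cores <= 0:
        raise ValueError("requested_cores must be positive")
    return {
        # relative weight: yields a proportional share only under contention
        "cpu.shares": max(MIN_SHARES, round(DEFAULT_SHARES * requested_cores)),
        "cpu.cfs_period_us": period_us,
        # hard cap: at most this much CPU time per period, summed across
        # all cores, so a 2-core request gets a quota of 2 * period
        "cpu.cfs_quota_us": max(MIN_QUOTA_US, round(requested_cores * period_us)),
    }


if __name__ == "__main__":
    print(cpu_cgroup_settings(0.5))
    # {'cpu.shares': 512, 'cpu.cfs_period_us': 100000, 'cpu.cfs_quota_us': 50000}
```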