[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-08-02 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427374#comment-13427374
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Maybe I haven't communicated this clearly enough, so please let me try 
again:

I'd strongly go for a model where platform-specific features (e.g. cgroups, 
setuid etc.) are supported via the native code and *build* system (autotools 
chain) so that we can, from the end-user perspective, automatically deal with 
them via a single controlling configuration knob, i.e. 
yarn.nodemanager.container-executor in this case. 

The alternative, various Java interfaces, is much worse since you now have to 
configure yarn.nodemanager.container-executor, the resource-enforcer etc. It 
also invites configuration errors such as TasksetEnforcer on RHEL6 or 
CgroupsEnforcer on RHEL5.

Native code is, simply, a far simpler option, which puts the onus on us and 
takes the burden away from the end-user or admin.

Thoughts?

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre2.patch, 
> MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-pre3.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, 
> mapreduce-4334-design-doc-v2.txt, mapreduce-4334-design-doc.txt
>
>
> Once we get in MAPREDUCE-4327, it will be important to actually enforce 
> limits on CPU consumption of containers. 
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-08-02 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427366#comment-13427366
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Alejandro, LCE accomplishes 2 things:
# It serves as a 'root' tool with the setuid bit
# It serves as the home for Linux-specific container maintenance code

Now, for other platforms you have to add other ContainerExecutors anyway; 
e.g. branch-1-win has a WindowsTaskController which will be ported over to 
trunk as WindowsContainerExecutor.

As a result, I'd very much like to keep the Linux-specific bits in LCE. 
Furthermore, with native code it is much, much easier to have 
platform-specific low-level code, i.e. we can use the autotools chain to 
resolve RHEL5 vs. RHEL6 etc. Doing that via Java plugins is very, very painful 
and leads to a proliferation of interfaces and configurations. The native code 
is something we can deal with very easily via Bigtop and other packaging 
projects.

Thoughts?





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-08-02 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427367#comment-13427367
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Also, I'll add that since cgroups is Linux-specific anyway, I don't see how it 
will be used on other platforms i.e. Windows.





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-08-01 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427128#comment-13427128
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4334:
---

Arun, if somebody is willing to install cgrulesengd/cgexec on the nodes then 
there is no need for super-user privileges. Plus, any CE could be used 
(unmodified) with a ResourceEnforcer injecting cgexec into the launcher 
invocation. This also has the benefit that if we add more resource dimensions 
(last bullet above), CE implementations would not need to change, only the 
ResourceEnforcer. That means no code duplication: the cgroup configuration 
logic lives once, in the ResourceEnforcer, as opposed to in every CE that wants 
to support cgroups. Finally, I like the fact that with the ResourceEnforcer we 
are doing a clean separation of responsibilities between the ResourceEnforcer 
(configures) and the ContainerExecutor (executes). IMO this separation will 
simplify making improvements in each one of them without risk of mixing these 2 
responsibilities.
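
A minimal sketch of the cgexec injection described above, under stated 
assumptions: the class CgexecWrapper, its wrap() method and the cgroup path 
hadoop-yarn/<containerId> are hypothetical names used only for illustration and 
do not come from any attached patch. Only the general 
"cgexec -g <controllers>:<path> <command>" form is taken from libcgroup.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of any MAPREDUCE-4334 patch. */
final class CgexecWrapper {
  private CgexecWrapper() {}

  /**
   * Returns a new command line that runs the original launcher command
   * under cgexec, inside the cpu cgroup assumed to exist for the container.
   */
  static List<String> wrap(String containerId, List<String> command) {
    List<String> wrapped = new ArrayList<>(command.size() + 3);
    wrapped.add("cgexec");
    wrapped.add("-g");
    // Assumed layout: one cgroup per container under "hadoop-yarn".
    wrapped.add("cpu:hadoop-yarn/" + containerId);
    wrapped.addAll(command);
    return wrapped;
  }
}
```

Because the original command is passed through untouched, the same wrapper 
could in principle sit in front of any ContainerExecutor.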





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-08-01 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427120#comment-13427120
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Alejandro - I'm thinking that since *only* LCE can use cgroups (due to 
necessary super-user privs etc.), it's simpler to do minimal changes to LCE to 
create/encapsulate into cgroups. Thoughts?





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-08-01 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426978#comment-13426978
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4334:
---

I'd like to introduce the ResourceEnforcer interface for the following reasons:

* It provides clean lifecycle hooks for initializing/configuring/cleaning up 
cgroups, leaving to the LCE just the actual binding.
* It will work with multiple container executors as opposed to LCE only.
* Makes the changes in the LCE minimal (IMO, the less logic we put in native 
code the better).
* taskset could easily be implemented as a ResourceEnforcer.
* If we eventually want to control other resources via cgroups (such as 
memory/disk/network), only the ResourceEnforcer would require changes.

Fair enough?






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-08-01 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426961#comment-13426961
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Thanks tucu, this is getting close.

Please help me understand if the following (simpler) proposal will work:

# NM calls LCE.launchContainer with the cpu-set.
# LCE will create the cgroup if necessary
# LCE will launch the process within the cgroup

Pros: This way, we avoid new interfaces such as ResourceEnforcer and we can 
also use taskset if necessary. Taskset should also work for 
DefaultContainerExecutor.
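
Steps 2 and 3 above can be sketched as follows. This is an illustration only: 
the real LCE is native code, and the class CgroupSketch, its method names and 
the one-directory-per-container layout are assumptions. The file names 
cpuset.cpus and tasks are the cgroup v1 control files; the sketch treats them 
as plain files so it can be tried against any directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of steps 2 and 3; the real LCE is native code. */
final class CgroupSketch {
  private CgroupSketch() {}

  /** Step 2: create the container's cgroup directory only if it is missing. */
  static Path ensureCgroup(Path cpusetRoot, String containerId, String cpuSet)
      throws IOException {
    Path group = cpusetRoot.resolve(containerId);
    Files.createDirectories(group); // no-op when the directory already exists
    // cgroup v1 cpuset controller: the allowed CPUs are listed in cpuset.cpus.
    Files.write(group.resolve("cpuset.cpus"),
        cpuSet.getBytes(StandardCharsets.UTF_8));
    return group;
  }

  /** Step 3: place a process in the cgroup by writing its PID to "tasks". */
  static void attach(Path group, long pid) throws IOException {
    Files.write(group.resolve("tasks"),
        Long.toString(pid).getBytes(StandardCharsets.UTF_8));
  }
}
```

On a real cpuset hierarchy, cpuset.mems must also be populated before a task 
can be attached, and writing to these files normally requires the privileges 
the LCE already has.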

Thoughts?









[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-08-01 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426957#comment-13426957
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4334:
---

I was chatting offline with Arun about this JIRA. His key concern is that it 
should be possible to use cgroups without requiring the installation of 
additional packages and extra OS configuration. As the LinuxContainerExecutor 
already runs as root, we can leverage that to create the cgroup mounts. This 
means that the LinuxContainerExecutor is required to use cgroups with zero 
configuration. While the LinuxContainerExecutor is typically used in secure 
clusters, it can still be used in a non-secure cluster, always running as the 
mapred user (which would be the equivalent of the DefaultContainerExecutor).

Given this, how about the following proposal?

This approach will not depend on cgexec binary being installed.

* The LinuxContainerExecutor would have 2 new options. 
** --cgroupsinit : This option will be used for initialization. When 
invoked with this option, the LCE will create the cgroup mount point and 
give ownership of it to the yarn user. Then it will complete its execution.
** --cgroup : This option will be used for launching containers. When 
invoked with this option, the LCE will add the process to the cgroup specified 
by the parameter.

* The ResourceEnforcer will have the following methods (exactly as in the 
latest patch):
** init(): called when the RM is initialized.
** preExecute(containerId, Resource): called before launching the container.
** wrapCommand(containerId, command): augments the execution command line 
before launching.
** postExecute(containerId): called after launching the container.

* A default implementation of the ResourceEnforcer will do NOPs.

* The CgroupsResourceEnforcer implementation will do the following:
** init(): call LCE --cgroupsinit
** preExecute(containerId, Resource): configure the cgroup with the assigned 
cpu resources.
** wrapCommand(containerId, command): augments the regular LCE invocation with 
the --cgroup option.
** postExecute(containerId): any necessary cgroup clean up.
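
A minimal sketch of the interface and the two implementations listed above, 
under stated assumptions: containerId is reduced to a String and Resource to an 
int core count so that the example compiles without Hadoop on the classpath, 
and the position of the --cgroup option in the LCE command line is a guess. 
Only wrapCommand() is filled in for the cgroups variant; the other hooks are 
left as no-ops.

```java
import java.util.ArrayList;
import java.util.List;

/** Lifecycle hooks as listed above; Resource is reduced to a core count. */
interface ResourceEnforcer {
  void init();
  void preExecute(String containerId, int cpuCores);
  List<String> wrapCommand(String containerId, List<String> command);
  void postExecute(String containerId);
}

/** Default implementation: every hook is a no-op. */
class NoOpResourceEnforcer implements ResourceEnforcer {
  @Override public void init() {}
  @Override public void preExecute(String containerId, int cpuCores) {}
  @Override public List<String> wrapCommand(String containerId,
      List<String> command) {
    return command; // command line is passed through unchanged
  }
  @Override public void postExecute(String containerId) {}
}

/** Cgroups variant: only the command wrapping is sketched. */
class CgroupsResourceEnforcer extends NoOpResourceEnforcer {
  @Override public List<String> wrapCommand(String containerId,
      List<String> command) {
    if (command.isEmpty()) {
      return command; // nothing to wrap
    }
    List<String> wrapped = new ArrayList<>(command);
    // Assumption: the LCE accepts "--cgroup <name>" right after the binary.
    wrapped.add(1, "--cgroup");
    wrapped.add(2, containerId);
    return wrapped;
  }
}
```

With this shape, a launcher would call preExecute(), run whatever 
wrapCommand() returns, and then call postExecute(), without needing to know 
which enforcer is configured.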





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425383#comment-13425383
 ] 

Karthik Kambatla commented on MAPREDUCE-4334:
-

+1 on design - 2(b), and the patch looks good.





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-30 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425370#comment-13425370
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4334:
---

I like the current patch: it does not add complexity, and it will be trivial to 
wire it with MAPREDUCE-4327 once CPU units are part of resources.





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-26 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423323#comment-13423323
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Andrew - I'll ask again.

Can you please provide a simple writeup? I'm confused seeing new interfaces 
pop up in every new patch. Thanks.

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre2.patch, 
> MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-pre3.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch
>
>
> Once we get in MAPREDUCE-4327, it will be important to actually enforce 
> limits on CPU consumption of containers. 
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-26 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423294#comment-13423294
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


Hi,

bq. Why is the ResourceEnforcer bubbled up all the way to the NodeManager 
instead of just being instantiated & configured in the ContainerLauncher, where 
before() & after() appear to be used, and then passed to the ContainerExecutor 
as a parameter in the launchContainer() method?

The reason is that I was trying to pattern-match how the ContainerExecutor 
works, and the ContainerExecutor is instantiated by the NodeManager. If you 
think it makes more sense to break with the pattern and keep the 
ResourceEnforcer localized to the ContainersLauncher, then I can certainly do 
that.

thanks! I will incorporate your other comments into the patch.

Andrew





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-26 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423151#comment-13423151
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4334:
---

I like the approach much better than the previous patch.

* Why is the ResourceEnforcer bubbled up all the way to the NodeManager instead 
of just being instantiated & configured in the ContainerLauncher, where 
before() & after() appear to be used, and then passed to the ContainerExecutor 
as a parameter in the launchContainer() method?

* The method names in the ResourceEnforcer seem a bit off. How about the 
following alternative names: before() -> preLaunch(), after() -> postLaunch() & 
commandPrefix -> wrapLauncherCommand()

* Instead of having an init(Configuration conf) method in the ResourceEnforcer, 
why not make it implement Configurable and have an init() method? Then the 
configuration is set at instantiation by ReflectionUtils.newInstance().






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-25 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422740#comment-13422740
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


Forgot to mention that this version requires the "cgexec" binary, which, while 
not required for cgroups, is commonly available. If we choose not to introduce 
a dependency on cgexec, then we can return to modifying the C code in 
LinuxContainerExecutor, as the previous version of this patch did.

thanks,
Andrew





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-24 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421908#comment-13421908
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


Hi Alejandro,

thanks very much for looking at the patch & for the feedback. indeed, the patch 
should come with a no-op version which is enabled by default. (the current 
patch simply fails to find any cgroups if they are not configured, and then 
skips trying to use them.)

I will update the patch tomorrow so it continues to have a lower impact on the 
codebase.


thanks,
Andrew

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-pre1.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch
>
>
> Once we get in MAPREDUCE-4327, it will be important to actually enforce 
> limits on CPU consumption of containers. 
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-24 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421840#comment-13421840
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4334:
---

Patch has TAB characters, it should not. Indentation should be 2 spaces.

* ContainerExecutor.java

Instead of having 2 different ConcurrentMaps, why not have one holding a data 
structure for pidFiles and cgroupFiles?

Why do we need read/write locks when accessing a ConcurrentMap? 

* DefaultContainerExecutor.java

The for loop adding the process ID to the cgroup should be within { }, even if 
it is a single line.

* CgroupsCreator.java

Shouldn't it, at initialization, enable/disable itself based on a config 
property that indicates whether cgroups are enabled? And if disabled, all 
methods would be NOPs?
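
The ContainerExecutor.java point about a single map can be sketched as follows. 
The class and method names here (ContainerFiles, ContainerFileRegistry, 
activate, deactivate) are hypothetical and are not taken from the patch; the 
sketch only shows one ConcurrentMap keyed by container ID, holding both paths 
in one value, with no extra read/write lock around single-key operations.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-container record holding both paths together. */
final class ContainerFiles {
  final Path pidFile;
  final Path cgroupFile;

  ContainerFiles(Path pidFile, Path cgroupFile) {
    this.pidFile = pidFile;
    this.cgroupFile = cgroupFile;
  }
}

/** Hypothetical registry; a single ConcurrentMap replaces the two maps. */
final class ContainerFileRegistry {
  private final ConcurrentMap<String, ContainerFiles> files =
      new ConcurrentHashMap<>();

  /** Atomic per key; no external read/write lock is needed. */
  void activate(String containerId, Path pidFile, Path cgroupFile) {
    files.put(containerId, new ContainerFiles(pidFile, cgroupFile));
  }

  /** Returns null if the container is unknown. */
  ContainerFiles get(String containerId) {
    return files.get(containerId);
  }

  /** Removes and returns the record, or null if it was never registered. */
  ContainerFiles deactivate(String containerId) {
    return files.remove(containerId);
  }
}
```

Because both paths travel in one immutable value, a reader never observes a 
pid file without its matching cgroup file, which two separate maps cannot 
guarantee without additional locking.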













[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-23 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420933#comment-13420933
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


Hi Hari,

In my experiments, there are usually 200-400ms between starting to create the 
cgroups and having the process completely inside them. This number is likely an 
upper-bound, as the experiments are in pseudo-distributed mode on a VM.

Note that in the design represented by this patch, I move the process into the 
cgroup asynchronously, so the latency is not incurred while starting the 
process. However, in my reading of Arun's comments, he would prefer that the 
cgroups be created synchronously while starting the job. I am currently in the 
process of making this change. While I suspect the cost may not be as high as 
200-400ms, it will of course be non-zero. :-)
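
For readers who want to time the synchronous path themselves, here is a small 
Python sketch. It assumes a cgroup v1 layout, where a group is a directory and 
a process joins it by having its pid written to the group's "tasks" file; 
place_in_cgroup and its parameters are hypothetical names. On a real cgroup 
mount this needs suitable privileges, so point cgroup_root at a scratch 
directory to try it out.

```python
import time
from pathlib import Path


def place_in_cgroup(cgroup_root: Path, name: str, pid: int,
                    cpu_shares: int = 1024) -> float:
    """Create a group, set its CPU weight, add a pid; return elapsed seconds."""
    start = time.monotonic()
    group = Path(cgroup_root) / name
    # Creating the directory is what creates the cgroup on a real mount.
    group.mkdir(parents=True, exist_ok=True)
    # Relative CPU weight of this group versus its siblings.
    (group / "cpu.shares").write_text(f"{cpu_shares}\n")
    # Writing a pid to "tasks" moves that process into the group.
    (group / "tasks").write_text(f"{pid}\n")
    return time.monotonic() - start
```

Calling this before starting the container process gives the synchronous 
variant; running it from a separate thread after the launch gives the 
asynchronous one described above.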

cheers,
Andrew






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-22 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420366#comment-13420366
 ] 

Hari Mankude commented on MAPREDUCE-4334:
-

Relevant information would be the performance impact of running maps and 
reduces in cgroups in terms of latency. 

Overall, this would be a very useful feature since it is possible to add 
fencing around cpu/io resources in addition to memory usage for MR tasks.






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-22 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420363#comment-13420363
 ] 

Bikas Saha commented on MAPREDUCE-4334:
---

Aside from a design proposal I would be really interested in seeing how exactly 
cgroups work in the context of our typical workload. Say, take a bunch of 
typical mappers and reducers and run them in isolation. Then run them in 
isolation within cgroups. Is there a difference? Now run them concurrently with 
and without cgroups. What are the observations? These experiments may lead to 
expected or unexpected results and would be a great addition to the design pros 
and cons. Perhaps you have already run those experiments. If so, would you 
mind sharing the results?






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-22 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420347#comment-13420347
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


Hi Arun,

I feel like we've been discussing pros & cons for the length of this JIRA. :-)  
I think, perhaps, I proposed too large of a change across this issue and 
MAPREDUCE-4351: cgroups for cpu, cgroups for memory, a code refactoring, etc.

Instead, I would like to make a smaller change, with just cgroups for CPUs and 
place them in each launcher's code, as you requested above. Perhaps a better 
re-factoring than I suggested with the ContainersMonitor will become clear 
afterwards.

How does this sound to you? I was planning to finish it up on Monday.

best,
Andrew






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-21 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419952#comment-13419952
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

bq. I disagree. 

Andrew, it seems we are stuck in the weeds debating minutiae of the code.

Let's take a step back. 

Can you please start by providing a writeup about your approach(es) and 
pros/cons? Thanks.








[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-16 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415528#comment-13415528
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


bq. Now, it seems like we should enhance the container-launch via LCE to just 
set the requisite cgroups or sched_affinity prior-to or right-after the 
container launch, rather than make them apis. That would be the safest, no?

I disagree. That was the first approach I took for implementing this, but I 
found it to be unsatisfactory for several reasons. See: 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?focusedCommentId=13413913&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13413913
 starting at "My first design for this..."






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-16 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415524#comment-13415524
 ] 

Robert Joseph Evans commented on MAPREDUCE-4334:


I agree with Bikas and Arun to a point.  I can see some situations, like 
running a multi-tenant Hadoop cloud, where you do want strict isolation, so 
that the people who are paying a premium to get consistent results from their 
part of the cluster never have to worry about someone else doing something 
really bad on another part of the cluster.  Is this enough of a concern to make 
it the default? I would say no.  Is it enough of a concern to make it an option 
that comes with and is maintained by Hadoop? That is TBD. I don't plan on 
running my clusters that way, but I am not the only Hadoop customer.  Arun, 
didn't you mention something at Hadoop Summit about some discussions you had 
with people who want full VMs to run their containers in specifically for 
isolation purposes?

As for memory spikes, I thought that, at least on Linux, you could configure 
swap for containers so that if a container goes over its budget, i.e. spikes, 
it swaps to disk instead of triggering the OOM killer. I could be wrong; I have 
not dug into it very much.






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-16 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415470#comment-13415470
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

{quote}
So, concretely, this is my proposal:
recognize the LCE binary as the "hadoop root tool"
the LCE will have two new functionalities: 1) sched_setaffinity and 2) creating 
cgroups
in addition to the patch above, I will create 1) another pluggable 
ContainersMonitor which can use these new functions (sched_setaffinity) and 2) 
adapt the one above to optionally use the (creating cgroups) functionality of 
the "hadoop root tool"
{quote}

Thanks, looks like we finally are on the same page - it's what I've been 
proposing for a while now.

Now, it seems like we should enhance the container-launch via LCE to just set 
the requisite cgroups or sched_affinity prior-to or right-after the container 
launch, rather than make them apis. That would be the safest, no?
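
To illustrate the "prior-to launch" option, here is a Python sketch that 
restricts a child's CPU affinity between fork and exec, so the launched program 
never runs unpinned. It is only an analogy for what the native LCE would do; 
launch_pinned and child_affinity are hypothetical helpers, and 
os.sched_setaffinity is available on Linux only.

```python
import json
import os
import subprocess
import sys

# Tiny child program: report the affinity mask it was started with.
CHILD = "import json, os; print(json.dumps(sorted(os.sched_getaffinity(0))))"


def launch_pinned(cmd, cpus):
    """Start cmd with its CPU affinity restricted to cpus before exec."""
    def set_affinity():
        # Runs in the child after fork and before exec; pid 0 is "this process".
        os.sched_setaffinity(0, cpus)
    return subprocess.Popen(cmd, preexec_fn=set_affinity,
                            stdout=subprocess.PIPE, text=True)


def child_affinity(cpus):
    """Launch the reporting child pinned to cpus and return what it observed."""
    child = launch_pinned([sys.executable, "-c", CHILD], cpus)
    out, _ = child.communicate()
    return json.loads(out)
```

Setting the mask before exec avoids the window in which a freshly started 
container could run outside its limits, which is the safety argument made 
above.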








[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-16 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415468#comment-13415468
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Good points Bikas, I tend to agree with them.

In the past we used OS limits (via ulimit) and had several issues with 
temporary spikes (particularly with Java processes forking), and hence we moved 
away from OS limits to a custom-built one which ignores spikes etc.






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-16 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415452#comment-13415452
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


Hi Bikas, thanks for thinking about this! Comments inline:

bq. Somewhere in this thread it was mentioned controlling memory via OS. In my 
experience this is not an optimal choice because
bq. 1) makes it hard to debug task failures due to memory issues. Abrupt OS 
termination or denial of more memory resulting in NPE/bad pointers etc. It's 
better to just monitor the memory and then enforce limits with a clear error 
message saying - task was terminated because it used more memory than allotted.

On Linux, enforcing memory limits via Cgroups feels a bit like simply running a 
process on a machine with less memory installed. When the memory allocation is 
pushing the threshold, the Linux OOM killer destroys the task. The patch above 
detects that the process has been killed and logs an error message indicating 
that the task was killed for consuming too many resources.

bq. 2) due to different scenarios, tasks may have memory spikes or temporary 
increases. The OS will enforce tight limits but NodeManager monitoring can be 
more flexible and not terminate a task because it shot to 2.1GB instead of 
staying under 2.

I would argue that the strict enforcement of Cgroups is exactly the behavior we 
want because it provides isolation. If two containers are running on a node 
with 4 GB of RAM, and each is using 2 GB, and one happens to spike to 3 GB 
momentarily, the spiking container should suffer -- if we continue monitoring 
the memory as done today, then the well-behaved container might suffer by being 
swapped-out to make room for the spiking container.

I believe the spiking concern is mitigated by the fact that Cgroups allows you 
to set both a physical memory limit, and a virtual memory limit (which my patch 
above makes use of). For example, I set the physical memory limit to say, 1 GB 
of RAM, and the virtual memory limit to 2.1 GB. When a process momentarily 
spikes above its 1 GB of RAM, it will be allocated memory from swap without a 
problem. This is configurable by the already extant 
"yarn.nodemanager.vmem-pmem-ratio" setting.

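The arithmetic in that example can be written out as a short Python sketch. It 
assumes the cgroup v1 memory controller, where memory.limit_in_bytes caps 
physical memory and memory.memsw.limit_in_bytes caps memory plus swap; mapping 
the "virtual memory limit" onto the latter is an assumption here, and 
cgroup_memory_limits is a hypothetical helper. The ratio of 2.1 is the value 
used in the example above.

```python
MB = 1024 * 1024


def cgroup_memory_limits(pmem_mb: int, vmem_pmem_ratio: float = 2.1) -> dict:
    """Return the two cgroup v1 memory limits, in bytes, for one container."""
    # Virtual limit = physical limit scaled by the vmem-pmem ratio.
    vmem_mb = int(pmem_mb * vmem_pmem_ratio)
    return {
        "memory.limit_in_bytes": pmem_mb * MB,          # physical memory cap
        "memory.memsw.limit_in_bytes": vmem_mb * MB,    # memory + swap cap
    }
```

For a 1024 MB container this yields a physical cap of 1 GB and a combined cap 
of 2150 MB, so a short spike above 1 GB is served from swap rather than ending 
in an OOM kill.
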
bq. Disk scheduling and monitoring would be a hard to achieve goal with 
multiple writers to disk spinning things their own way and expecting something 
that will likely not happen.

Sure, it is tricky, and the feasibility depends on the semantics YARN promises 
applications. However, the Linux Completely Fair Queuing I/O scheduler has 
semantics which are quite similar to the semantics I'm proposing we promise for 
CPUs (proportional weights). The blkio Cgroup subsystem already today provides 
both proportional sharing and throttling: 
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Subsystems_and_Tunable_Parameters.html#sec-blkio
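
The proportional-weight semantics can be made concrete with a few lines of 
Python. This is only the arithmetic, not kernel behaviour: it assumes every 
group is busy at the same time, and cpu_fractions is a hypothetical helper. The 
weights play the role of per-group shares such as cpu.shares in cgroup v1.

```python
def cpu_fractions(shares: dict) -> dict:
    """Fraction of a contended resource each group gets, by relative weight."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one group needs a positive weight")
    # Each group's slice is its weight divided by the sum of all weights.
    return {name: weight / total for name, weight in shares.items()}
```

Three containers weighted 1024, 2048 and 1024 would receive a quarter, a half 
and a quarter of the CPU under full contention; an idle container's slice is 
available to the others, which is what distinguishes weights from hard caps.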

bq. Network scheduling and monitoring shares choke points at multiple levels 
beyond the machines and trying to optimally and proportionally use the network 
tends to be a problem that's better served globally.

YARN is a global scheduler. Linux traffic controls [1], in combination with the 
network controller for Cgroups, can be used to implement the results of Seawall 
[2], FairCloud [3], and similar projects. There are many datacenter designs 
these days; some will be a perfect match for end-host-only bandwidth control, 
and others an imperfect match. While end-host-only bandwidth control is not a 
magic bullet, I strongly believe that it is both useful enough, and easy enough 
to implement, to warrant pursuit.


bq. My 2 cents would be to limit this to just CPU for now.

It is. However, I believe the patch above is easily extensible to other 
resources (you can see for yourself that there is a small difference between 
the memory-only patch, and the memory+cpu patch).

bq. Based on the comments above, I would agree that we need to make sure 
platform specific stuff should not leak into the code so that other platforms 
(imminently Windows) can support this stuff.

Totally agree. That's why I proposed making it pluggable with MAPREDUCE-4351.

bq. An alternative to pluggable ContainersMonitor would be to make CPU 
management a pluggable component of ContainersManager. My POV is that 
ContainersManager manages the resources of containers and has logic that will 
be common across platforms. The tools it uses will change. Eg. 
ProcfsBaseProcessTree is the tool used to monitor and manage memory. I can see 
that being changed to a MemoryMonitor interface with platform specific 
implementations. That's what's happening on the Windows port in branch 1. I can 
see a CPUMonitor interface for CPU. Or maybe a ResourceMonitor that has methods 
for both memory and CPU.

I'm afraid I'm a bit confused by your suggestion here -- ContainersMonitor is 
a

[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-14 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414567#comment-13414567
 ] 

Bikas Saha commented on MAPREDUCE-4334:
---

Somewhere in this thread it was mentioned controlling memory via OS. In my 
experience this is not an optimal choice because
1) makes it hard to debug task failures due to memory issues. Abrupt OS 
termination or denial of more memory resulting in NPE/bad pointers etc. It's 
better to just monitor the memory and then enforce limits with a clear error 
message saying - task was terminated because it used more memory than allotted.
2) due to different scenarios, tasks may have memory spikes or temporary 
increases. The OS will enforce tight limits but NodeManager monitoring can be 
more flexible and not terminate a task because it shot to 2.1GB instead of 
staying under 2. 
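
As a sketch of what "more flexible" monitoring could look like, the Python 
below only kills a task after it has stayed over its limit for several 
consecutive samples, and it produces the kind of explicit message asked for in 
point 1. The grace_samples policy and the should_kill name are assumptions for 
illustration; this is not the NodeManager's actual rule.

```python
def should_kill(samples_mb, limit_mb, grace_samples=3):
    """Scan memory samples in order; return (kill?, message)."""
    over = 0
    for used in samples_mb:
        # Count consecutive over-limit samples; any sample under resets the run.
        over = over + 1 if used > limit_mb else 0
        if over >= grace_samples:
            return True, (f"task was terminated because it used {used} MB, "
                          f"more than the {limit_mb} MB allotted")
    return False, ""
```

A task that briefly touches 2.1 GB and falls back survives, while one that 
stays above the limit is stopped with a message that names the reason.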

Disk scheduling and monitoring would be a hard to achieve goal with multiple 
writers to disk spinning things their own way and expecting something that will 
likely not happen. Network scheduling and monitoring shares choke points at 
multiple levels beyond the machines and trying to optimally and proportionally 
use the network tends to be a problem that's better served globally. 

My 2 cents would be to limit this to just CPU for now. Based on the comments 
above, I would agree that we need to make sure platform specific stuff should 
not leak into the code so that other platforms (imminently Windows) can support 
this stuff. 
An alternative to pluggable ContainersMonitor would be to make CPU management a 
pluggable component of ContainersManager. My POV is that ContainersManager 
manages the resources of containers and has logic that will be common across 
platforms. The tools it uses will change. Eg. ProcfsBaseProcessTree is the tool 
used to monitor and manage memory. I can see that being changed to a 
MemoryMonitor interface with platform specific implementations. That's what's 
happening on the Windows port in branch 1. I can see a CPUMonitor interface for 
CPU. Or maybe a ResourceMonitor that has methods for both memory and CPU.
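
A sketch of the combined interface, written as a Python abstract base class 
instead of a Java interface. ResourceMonitor is the name floated above; the 
method names, CannedResourceMonitor and over_memory_limit are hypothetical and 
exist only to show how platform-neutral logic would sit on top.

```python
from abc import ABC, abstractmethod


class ResourceMonitor(ABC):
    """Platform-neutral view of one container's resource usage."""

    @abstractmethod
    def memory_bytes(self, container_id: str) -> int:
        """Current memory use of the container's process tree."""

    @abstractmethod
    def cpu_millis(self, container_id: str) -> int:
        """Cumulative CPU time consumed by the container."""


class CannedResourceMonitor(ResourceMonitor):
    """Stand-in 'platform' that reports fixed numbers, for illustration."""

    def __init__(self, usage):
        self._usage = dict(usage)  # container id -> (memory bytes, cpu millis)

    def memory_bytes(self, container_id):
        return self._usage[container_id][0]

    def cpu_millis(self, container_id):
        return self._usage[container_id][1]


def over_memory_limit(monitor: ResourceMonitor, container_id: str,
                      limit_bytes: int) -> bool:
    """Common logic that works with any platform-specific monitor."""
    return monitor.memory_bytes(container_id) > limit_bytes
```

A procfs-based Linux implementation and a Windows one would each subclass the 
interface, while checks like over_memory_limit stay shared.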






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-13 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414108#comment-13414108
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


Arun -- I think we might be talking past each other, as we agree that both 
cgroups and taskset should be available.

BTW, it turns out the sched_setaffinity() syscall does not require root if it 
is applied to a process you own. Therefore, if you are running with the 
DefaultContainerExecutor, you can still use sched_setaffinity, which is 
excellent.
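
This is easy to check from an unprivileged shell. The Python sketch below pins 
the calling process to a single CPU through os.sched_setaffinity, which wraps 
the same syscall on Linux, and then restores the original mask. 
pin_self_to_one_cpu is a hypothetical helper name.

```python
import os


def pin_self_to_one_cpu():
    """Pin the calling process to one CPU, report the mask, then restore it."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # the call is not available on this platform
    original = os.sched_getaffinity(0)
    target = {min(original)}
    try:
        # pid 0 means the calling process, which we own, so no root is needed.
        os.sched_setaffinity(0, target)
        return os.sched_getaffinity(0)
    finally:
        # Put the original mask back so the caller is left unchanged.
        os.sched_setaffinity(0, original)
```

Run as a normal user, this returns a one-element set on Linux, which matches 
the observation that no setuid helper is needed for processes you own.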


I think this is the matrix of possible use cases:
1) launch container as user & use sched_setaffinity / taskset / CPU pinning
2) launch container as user & use cgroups completely managed by Hadoop
3) launch container as user & use cgroups managed by the cluster operator
4) launch container as Hadoop & use sched_setaffinity / taskset / CPU pinning
5) launch container as Hadoop & use cgroups completely managed by Hadoop
6) launch container as Hadoop & use cgroups managed by the cluster operator

Cases 1, 2, 3 and 5 require root privs.

Cases 3 and 6 are covered by the patch above.

I'm happy to expand the LCE into a "hadoop root tool" which can be used in 
cases 1, 2, 3, and 5.
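
The matrix can be generated mechanically, which makes the remaining gap easy to 
see. In the Python sketch below the case numbering, the root-privilege set and 
the set covered by the patch are copied from the comment above; the variable 
and function names are hypothetical.

```python
from itertools import product

LAUNCH_AS = ["user", "Hadoop"]
MECHANISM = [
    "sched_setaffinity / taskset / CPU pinning",
    "cgroups completely managed by Hadoop",
    "cgroups managed by the cluster operator",
]

# product() varies the last factor fastest, which reproduces the 1..6 numbering.
CASES = dict(enumerate(product(LAUNCH_AS, MECHANISM), start=1))

NEEDS_ROOT = {1, 2, 3, 5}      # as stated in the comment
COVERED_BY_PATCH = {3, 6}      # as stated in the comment


def remaining_without_root():
    """Cases that neither need root privs nor are covered by the patch."""
    return sorted(set(CASES) - NEEDS_ROOT - COVERED_BY_PATCH)
```

The only case left over is 4, launching as Hadoop with CPU pinning, which is 
the one that works without root because the process is owned by the caller.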

In my mind, the design question is how to cover all six cases with the most 
code re-use.

Today, we have two important ContainerManager subsystems: the Launcher and the 
Monitor. Today, resource enforcement is entirely done within the Monitor. The 
question is, where should new resource enforcement be done? I think the answer 
is still "in the Monitor" even though, in some use cases, it needs access to 
root privs. To get access to those privs, it can call the LCE binary (aka the 
"hadoop root tool"), just as the java-side of the LCE does today.

So, concretely, this is my proposal:
- recognize the LCE binary as the "hadoop root tool"
- the LCE will have two new functionalities: 1) sched_setaffinity and 2) 
creating cgroups
- in addition to the patch above, I will create 1) another pluggable 
ContainersMonitor which can use these new functions (sched_setaffinity) and 2) 
adapt the one above to optionally use the (creating cgroups) functionality of 
the "hadoop root tool"

how does that sound?









[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414053#comment-13414053
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Andrew - please don't take this the wrong way, I certainly am *not* trying to debate 
taskset v/s cgroups. All I'm saying is 'we need both' for the dominant 
platforms: RHEL5 and RHEL6. I perfectly understand that you might not have the 
time or the inclination to do both, and I'm happy to help, personally - 
supporting just RHEL6 isn't enough.

Given that, we have two options:
# Admin-setup cgroups (outside YARN) 
# YARN handles it on its own via LCE

Now the pros of using LCE:
# It already exists! Hence it doesn't require any *new* operational 
requirements. 
# It's consistent for both technologies/platforms we need to support: 
taskset/RHEL5 and cgroups/RHEL6. 
# Even better, we can use the same for any platform in the future, e.g. 
WindowsContainerExecutor (we already have WindowsTaskController in 
branch-1-win, which would need to be ported to branch-2 soon).
# It's *much less* overhead on admins - they don't have to create cgroups 
upfront, they don't have to mount them to get them to survive reboots etc.

Cons:
# Need LCE for non-secure setups. We actually did support LTC without security 
in branch-1 at some point, happy to discuss.

In the alternative (admin-setup cgroups) we will _still_ need LCE (or worse, 
*another* setuid script) to support taskset. To me that is a very bad choice.

As a result, using LCE seems like a significantly superior alternative.



Some other comments:

bq. In my mind, the LCE is for starting processes, and should stick to doing 
that. 

Not true at all, we already use it for container cleanup etc. 

{quote}
4) For cgroups, we could have a second ContainersMonitor plugin which uses a 
setuid root binary to also mount & create cgroups, freeing the admin from 
managing them at all.
5) For taskset, we can implement a ContainersMonitor which uses a setuid root 
binary (potentially the LCE, but perhaps better if it's something else, just to 
keep the security footprint down) to pin processes to CPUs. This 
ContainersMonitor will also need the memory enforcement code from the current 
ContainersMonitorImpl
{quote}

Like I said above, rather than having two ways to do the same thing, doing it 
with one *existing* component, i.e. LCE, seems like the clear choice.

I understand you might not have time to port your work via LCE, I'm happy to 
either help or take up that work.






[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-13 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413913#comment-13413913
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


hi all, I think there are pros and cons to both approaches, which I will try to 
outline below.

Cgroups:
- they provide a coherent path for future resource management: network 
bandwidth, CPU upper- and lower-bounds, block I/O priorities and limits, etc. 
[1]
- can be integrated with resource management for other applications, drawing 
upon a single resource budget for a group of users
- cgroup's hierarchies are key to this. in a taskset-only world, the NM would 
need to be given a fixed allocation of the node's CPUs to manage
- cgroups are not persistent across reboots. this is unfortunate. however, 1) 
anyone using them needs to mount them on startup, so they will need to make a 
change to their startup process already, and 2) there are extensive, 
cross-distro tools to create and manage cgroups automatically on reboot (RHEL 6 
has great docs on them [2])
- some clusters are already using Cgroups, without any support from 
Hadoop/YARN. for example, StumbleUpon [3]


Taskset:
- compatible with RHEL 5
- does not require changes to node startup
- can be implemented with a SUID root binary, as LCE is today


My first design for this JIRA had the LCE create the cgroups. This turned out 
to be the wrong approach for several reasons:
- What if I wanted to use the regular container executor with cgroups? An admin 
may not allow me to have a setuid root binary, but may be willing to create a 
cgroup hierarchy for me (after all, this is one advantage of the hierarchy: 
delegation)
- Conversely, what if I wanted to use the LCE without cgroups?
- There needs to be a part of the NM responsible for deleting unused cgroups, 
and the other tasks of a ContainersManager I described in MAPREDUCE-4351. Some 
of those are specific to how resource enforcement is being done; it seemed best 
to keep that code together in the ContainersManager, rather than spread across 
a ContainersManager and the LCE.
- Putting the resource enforcement "smarts" in the ContainersMonitor (which is 
already receiving events from the RM), allows it to dynamically adjust the 
resource enforcement
- On startup, the JVM can appear to be using twice as much memory as it 
actually is (see comment in ContainersMonitorImpl.java). If the JVM is started 
within the cgroup, rather than started outside the cgroup and then moved into 
it by a ContainersMonitor as my patch above does, the kernel may kill the JVM 
inadvertently.
 

I really like the flexibility of keeping the LCE and resource enforcement 
separate. In my mind, the LCE is for starting processes, and should stick to 
doing that. Resource enforcement is a separate job.



My recommendation is the following:
1) Keep the LCE as it is.
2) Support pluggable ContainersMonitors (MAPREDUCE-4351)
3) For cgroups, we can start with the patch above. It is best for admins who 
already use cgroups on their nodes and want to have YARN take advantage of 
them. (This is the point of the yarn.nodemanager.cgroups.path config option I 
added)
4) For cgroups, we could have a second ContainersMonitor plugin which uses a 
setuid root binary to also mount & create cgroups, freeing the admin from 
managing them at all.
5) For taskset, we can implement a ContainersMonitor which uses a setuid root 
binary (potentially the LCE, but perhaps better if it's something else, just to 
keep the security footprint down) to pin processes to CPUs. This 
ContainersMonitor will also need the memory enforcement code from the current 
ContainersMonitorImpl


I've done 1-3 (well, #1 is a freebie :-) ... and I can definitely do #5 as well.


Arun, does this design appeal to you?



[1] 
http://www.linux-kongress.org/2010/slides/seyfried-cgroups-linux-kongress-2010-presentation.pdf
[2] 
https://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/index.html
[3] http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html



[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-12 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413518#comment-13413518
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

bq. Clearly, we need to support taskset for platforms on which cgroups isn't 
supported e.g. RHEL5. For taskset you need super-user privs - would you prefer 
packages to do it too?

I meant to say: for taskset we clearly need to go via LCE at runtime.





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-12 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413517#comment-13413517
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Also, it does look like cgroups might not be persisted across reboots - just 
makes it much worse to deal with in that case.





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-12 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413515#comment-13413515
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

bq. How is it onerous? Packages could easily do this as part of the install on 
platforms where it's supported.

This doesn't make sense. What if CPU isolation is disabled? Do you still want 
'packages' to make it part of the install?

Clearly, we need to support taskset for platforms on which cgroups isn't 
supported e.g. RHEL5. For taskset you need super-user privs - would you prefer 
packages to do it too?

Yes, LTC is a pain, but using it consistently (e.g. for both cgroups and 
taskset) seems better than having multiple steps forced on the admin (LCE + 
cgroups + taskset etc.).





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-12 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413499#comment-13413499
 ] 

Todd Lipcon commented on MAPREDUCE-4334:


bq. Preventing such onerous requirements on cluster setup is a key goal - 
something which initially led to creation of LinuxTaskController etc.

How is it onerous? Packages could easily do this as part of the install on 
platforms where it's supported.

It seems equivalent to the installation of the LTC itself, which requires root 
to make it setuid, right?

Andrew: do the cgroups persist across reboots, or does that cgcreate command 
need to go into the startup scripts?





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-12 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413438#comment-13413438
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Andrew, thanks, I missed that comment.

I'm concerned that asking admins to set up cgroups etc. via cgcreate prior to 
deploying Hadoop clusters, particularly on all nodes, is almost a 
non-starter.

Preventing such onerous requirements on cluster setup is a key goal - something 
which initially led to creation of LinuxTaskController etc.

I'd strongly urge that we implement this functionality via 
LinuxContainerExecutor, thereby allowing us to write low-level, 
platform-specific code (RHEL5 vs. RHEL6 etc.) in a single place and not rely on 
tedious Java code for the same.

Thoughts?





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-12 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413396#comment-13413396
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


@Arun:  no, the NM does not need superuser privs. in my comment above [1], the 
line "$ sudo cgcreate -a hadoop_user_name -g memory:hadoop-yarn" is run when 
installing Hadoop. This creates a branch of the memory hierarchy called 
"hadoop-yarn" which is owned by the user "hadoop_user_name" (which would be the 
user running the NM). This allows the NM to create and move cgroups without 
superuser privs.

The one complication is that only the superuser or the owner of a process may 
move that process into a cgroup. As the LinuxContainerExecutor runs processes 
under different user accounts, we will need to either augment it, or use a 
similar tool to move such processes into a cgroup created by the NM user. 
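
A minimal sketch of what the delegation buys, assuming a cgroup-v1 memory hierarchy: once the "hadoop-yarn" branch is owned by the NM user, creating a per-container cgroup is an ordinary mkdir and setting a limit is an ordinary file write. The function names are hypothetical and are not taken from the patch; `memory.limit_in_bytes` and `tasks` are the standard cgroup-v1 control files. The root directory is a parameter because the mount point differs between distros.

```python
import os


def create_container_cgroup(yarn_root, container_id, memory_limit_bytes):
    """Create <yarn_root>/<container_id> and set its memory limit.

    On a real node yarn_root would be the delegated branch, i.e.
    <memory mount point>/hadoop-yarn (the mount point is an assumption).
    """
    path = os.path.join(yarn_root, container_id)
    os.mkdir(path)  # needs only write access to the delegated branch
    with open(os.path.join(path, "memory.limit_in_bytes"), "w") as f:
        f.write(str(memory_limit_bytes))
    return path


def move_into_cgroup(cgroup_path, pid):
    """Attach a process by writing its pid to the cgroup's tasks file.

    On a real kernel this write is refused unless the caller is root or
    owns the target process, which is the complication described above.
    """
    with open(os.path.join(cgroup_path, "tasks"), "a") as f:
        f.write("%d\n" % pid)
```

Pointing `yarn_root` at a temporary directory is enough to exercise the logic without root; on a real cgroup filesystem the kernel creates the control files itself when the directory is made.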

Let me know if you'd like further clarification.



[1] 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?focusedCommentId=13399014&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13399014





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-12 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413380#comment-13413380
 ] 

Arun C Murthy commented on MAPREDUCE-4334:
--

Andrew, what are the security implications here? Does the NM need superuser 
privs to create/move cgroups?





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-06-18 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396261#comment-13396261
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


ok, putting all of this in the ContainerExecutor is not the way to go, as it 
precludes use of secure Hadoop's Linux container-executor.

In my new design, ContainerMonitor will be a pluggable component, just as 
ContainerExecutor is now. Then, we can provide a ContainerMonitor which uses 
cgroups to control resource usage, rather than the existing ContainerMonitor 
(to be renamed as "DefaultContainerMonitor"). This has several advantages:
1) allows us to keep existing ContainerMonitor for users who can't use cgroups 
(eg, users without root access during Hadoop setup)
2) ContainerMonitor already receives an event when it's time to stop 
monitoring, which we can use as notification to delete the container's cgroup
3) ContainerMonitor receives the resource limits already; no need to calculate 
them based on the configs
4) A pluggable ContainerMonitor paves the way for ContainerMonitors on other 
platforms
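
The pluggable design can be illustrated with a small sketch. It is written in Python for brevity rather than in the NodeManager's Java, and every name in it (the class names, the registry, the `container-monitor.class` key) is hypothetical. It only shows the shape of the idea: the monitor is chosen by configuration, it receives the limits directly, and the stop event is the hook for cgroup cleanup.

```python
class DefaultContainerMonitor:
    """Stands in for the existing polling monitor (no cgroups needed)."""

    def __init__(self):
        self.limits = {}

    def start_monitoring(self, container_id, memory_limit_bytes):
        # The limit arrives with the event; nothing is derived from configs.
        self.limits[container_id] = memory_limit_bytes

    def stop_monitoring(self, container_id):
        self.limits.pop(container_id, None)


class CgroupsContainerMonitor(DefaultContainerMonitor):
    """Hypothetical cgroups variant: tracks the cgroups it would manage."""

    def __init__(self):
        super().__init__()
        self.cgroups = set()

    def start_monitoring(self, container_id, memory_limit_bytes):
        super().start_monitoring(container_id, memory_limit_bytes)
        self.cgroups.add(container_id)      # real code: create the cgroup

    def stop_monitoring(self, container_id):
        super().stop_monitoring(container_id)
        self.cgroups.discard(container_id)  # real code: delete the cgroup


MONITORS = {
    "default": DefaultContainerMonitor,
    "cgroups": CgroupsContainerMonitor,
}


def load_monitor(conf):
    """Pick the monitor named in the config, falling back to the default."""
    name = conf.get("container-monitor.class", "default")  # hypothetical key
    return MONITORS[name]()
```

Users who cannot use cgroups simply leave the key unset and keep the default monitor.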

I will first open a sub-task to make ContainerMonitor pluggable.

The only trouble spot with this design is that it's not possible to move 
another non-root user's process into a cgroup. I plan to extend the secure 
container-executor to be able to make such a move.

Please let me know if you have any feedback about this proposal.


thank you,
Andrew

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>
> Once we get in MAPREDUCE-4327, it will be important to actually enforce 
> limits on CPU consumption of containers. 
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-06-15 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296036#comment-13296036
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


Hi Arun,

I've thought some more about implementing taskset since our chat at the YARN 
meet-up.

One benefit of cgroups is they're "set it and forget it" -- in the 
ContainerExecutor, we simply place the new task in the appropriate cgroup, and 
the kernel will take care of the rest. This would allow us to ditch the 
ContainersMonitor infrastructure.

On the other hand, with taskset, we will need to do the CPU scheduling 
ourselves. Say I have two cores and start with two processes, A (requested 0.5 
cores) and B (requested 0.5 cores). I can start by putting them both on core 1 
for efficiency, or I can put them on separate cores for higher utilization. But 
if process C (requested 1 core) comes along, I will need to set A & B to the 
same core. This is just a simple scenario, but more cores and processes will 
likely grow a complicated CPU scheduler inside the NodeManager 
(ContainersMonitorImpl is probably the right place, since it is already 
monitoring container resource usage).


tl;dr -- I believe cgroups requires only local state when launching containers, 
while taskset requires us to maintain global state.
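
To make the global-state point concrete, here is a rough sketch of the kind of planner taskset would force on the NodeManager. It is an illustration only: first-fit-decreasing packing is one possible policy, not something proposed in this JIRA, the function names are hypothetical, and it handles only containers that fit on a single core. Requests are in thousandths of a core to avoid floating-point rounding.

```python
def plan_pinning(requests_millicores, num_cores):
    """First-fit-decreasing placement of containers onto cores.

    requests_millicores maps a container name to its request in
    thousandths of a core (500 == 0.5 cores). Returns {name: core}.
    Raises ValueError if the requests do not fit.
    """
    free = [1000] * num_cores
    placement = {}
    # Largest requests first; ties broken by name for a stable result.
    for name, need in sorted(requests_millicores.items(),
                             key=lambda kv: (-kv[1], kv[0])):
        for core in range(num_cores):
            if free[core] >= need:
                free[core] -= need
                placement[name] = core
                break
        else:
            raise ValueError("no core has room for %s" % name)
    return placement


def taskset_command(pid, core):
    """Command line that pins an already running pid to one core."""
    return ["taskset", "-cp", str(core), str(pid)]
```

With two cores, A and B at 0.5 cores each both land on core 0. Adding C at 1 core puts C on core 0 and moves A and B to core 1, so the plan has to be recomputed over all running containers and both existing processes re-pinned. That recomputation is the global state referred to above.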

thoughts?

thanks!
Andrew





[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-06-12 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293777#comment-13293777
 ] 

Andrew Ferguson commented on MAPREDUCE-4334:


hi Arun,

I've actually been looking into this recently myself, and would be happy to 
take the lead on it. So far, I've been focusing on cgroups as they also provide 
memory containment, and provide a path for managing future resources as well. 
although taskset is available on RHEL5, it's not capable of isolating fractions 
of a CPU.

while cgroups' memory support gives an upper-bound on the amount of memory 
tasks can consume, the RHEL6 cpu support is actually a lower-bound. until CFS 
bandwidth control [1] is more widespread, we can place tasks judiciously to 
create guarantees, building on cgroups to ensure the lower-bounds.
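
A small worked example of the lower-bound behaviour, assuming the cgroup-v1 `cpu.shares` control: when every container is busy, each one receives CPU time in proportion to its shares, and any idle time is free for the others to use. The mapping of one requested core to 1024 shares and the function names are assumptions for illustration, not part of any patch here.

```python
def cpu_shares(requested_cores, shares_per_core=1024):
    """Translate a core request into a cpu.shares value (assumed mapping)."""
    if requested_cores <= 0:
        raise ValueError("request must be positive")
    return int(requested_cores * shares_per_core)


def guaranteed_cores(shares_by_container, num_cores):
    """CPU each container gets when all of them are runnable at once.

    This is a floor, not a cap: a container may use more whenever its
    neighbours are idle, which is why shares alone give no upper bound.
    """
    total = sum(shares_by_container.values())
    return {name: num_cores * shares / total
            for name, shares in shares_by_container.items()}
```

On a two-core node with requests of 0.5, 0.5 and 1 core, the shares are 512, 512 and 1024, and the guarantees come out as exactly the requested amounts. If the node is only partly booked, each guarantee is larger than the request, so placement has to keep the sum of requests in line with the node's capacity for the numbers to mean what users expect.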


best,
Andrew


[1] for a quick overview: http://lwn.net/Articles/428230/ ... more in-depth 
discussion here: http://www.kernel.org/doc/ols/2010/ols2010-pages-245-254.pdf
