Re: Review Request 71944: Set container process's OOM score adjust.

2020-03-18 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219991
---


Ship it!




Ship It!

- Greg Mann


On March 16, 2020, 9:10 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated March 16, 2020, 9:10 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
> bf2a4d8d587969568f451f63e4f619e9c49f3642 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
> a311ab4495f71bedacd2e99c84c765f0e5fe99d3 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp 
> dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp 
> c62deec4b1cd749dba5fe71b901e0353806a0805 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp 
> ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 
> 06531072f445d4ec978ebaf5ec5e4a2475517d05 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp 
> ec2ce67e54387f26aa11c00d4c7f85f0807a127b 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 
> 2c865aca35084a5db567b5f95c8c57bb6e1d5634 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 
> 180afc936798c2fa4de0deef080276cf7cc94199 
>   src/slave/containerizer/mesos/utils.hpp 
> bfd07e28c78fc140e395ffccd11d65545bf007fc 
>   src/slave/containerizer/mesos/utils.cpp 
> d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/8/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-03-16 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/
---

(Updated March 16, 2020, 5:10 p.m.)


Review request for mesos, Andrei Budnik and Greg Mann.


Changes
---

Rebased.


Bugs: MESOS-10048
https://issues.apache.org/jira/browse/MESOS-10048


Repository: mesos


Description
---

Set container process's OOM score adjust.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
bf2a4d8d587969568f451f63e4f619e9c49f3642 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
a311ab4495f71bedacd2e99c84c765f0e5fe99d3 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp 
dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp 
c62deec4b1cd749dba5fe71b901e0353806a0805 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp 
ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
27d88e91fb784179effd54781f84000fe85c13eb 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
0896d37761a11f55ba4b866d235c3bd2b79dcfba 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 
06531072f445d4ec978ebaf5ec5e4a2475517d05 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp 
ec2ce67e54387f26aa11c00d4c7f85f0807a127b 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 
2c865aca35084a5db567b5f95c8c57bb6e1d5634 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 
180afc936798c2fa4de0deef080276cf7cc94199 
  src/slave/containerizer/mesos/utils.hpp 
bfd07e28c78fc140e395ffccd11d65545bf007fc 
  src/slave/containerizer/mesos/utils.cpp 
d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 


Diff: https://reviews.apache.org/r/71944/diff/8/

Changes: https://reviews.apache.org/r/71944/diff/7-8/


Testing
---

sudo make check


Thanks,

Qian Zhang



Re: Review Request 71944: Set container process's OOM score adjust.

2020-03-16 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/
---

(Updated March 16, 2020, 3:17 p.m.)


Review request for mesos, Andrei Budnik and Greg Mann.


Changes
---

Updated some comments in the code.


Bugs: MESOS-10048
https://issues.apache.org/jira/browse/MESOS-10048


Repository: mesos


Description
---

Set container process's OOM score adjust.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
bf2a4d8d587969568f451f63e4f619e9c49f3642 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
a311ab4495f71bedacd2e99c84c765f0e5fe99d3 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp 
dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp 
c62deec4b1cd749dba5fe71b901e0353806a0805 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp 
ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
27d88e91fb784179effd54781f84000fe85c13eb 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
0896d37761a11f55ba4b866d235c3bd2b79dcfba 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 
06531072f445d4ec978ebaf5ec5e4a2475517d05 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp 
ec2ce67e54387f26aa11c00d4c7f85f0807a127b 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 
2c865aca35084a5db567b5f95c8c57bb6e1d5634 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 
180afc936798c2fa4de0deef080276cf7cc94199 
  src/slave/containerizer/mesos/utils.hpp 
bfd07e28c78fc140e395ffccd11d65545bf007fc 
  src/slave/containerizer/mesos/utils.cpp 
d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 


Diff: https://reviews.apache.org/r/71944/diff/7/

Changes: https://reviews.apache.org/r/71944/diff/6-7/


Testing
---

sudo make check


Thanks,

Qian Zhang



Re: Review Request 71944: Set container process's OOM score adjust.

2020-03-16 Thread Qian Zhang


> On March 2, 2020, 9:50 p.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 222 (patched)
> > 
> >
> > Why don't we want to set the OOM score adjustment for custom executor 
> > containers?

For custom executor (actually same to default executor), if its soft limit < 
hard limit, that means it must launch some nested containers whose 
`share_cgroups` is false and soft limit < hard limit, in which case we will set 
OOM score adjustment for each nested container, and we assume the customer 
executor itself will not consume much memory, so we just leave its OOM score 
adjustment as 0.

If the tasks launched by custom executor have `share_cgroups` as true, then 
custom executor's soft limit must be equal to its hard limit (since no resource 
limits can be specified for those tasks), so here we will still leave custom 
executor's OOM score adjustment as 0, and this ensures backward compatibility.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219718
---


On March 11, 2020, 5:40 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated March 11, 2020, 5:40 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
> b12b73d8e0161d448075378765e77867521de04e 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
> a311ab4495f71bedacd2e99c84c765f0e5fe99d3 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp 
> dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp 
> c62deec4b1cd749dba5fe71b901e0353806a0805 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp 
> ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 
> 06531072f445d4ec978ebaf5ec5e4a2475517d05 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp 
> ec2ce67e54387f26aa11c00d4c7f85f0807a127b 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 
> 2c865aca35084a5db567b5f95c8c57bb6e1d5634 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 
> 180afc936798c2fa4de0deef080276cf7cc94199 
>   src/slave/containerizer/mesos/utils.hpp 
> bfd07e28c78fc140e395ffccd11d65545bf007fc 
>   src/slave/containerizer/mesos/utils.cpp 
> d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/6/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-03-11 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/
---

(Updated March 11, 2020, 5:40 p.m.)


Review request for mesos, Andrei Budnik and Greg Mann.


Changes
---

Addressed review comments.


Bugs: MESOS-10048
https://issues.apache.org/jira/browse/MESOS-10048


Repository: mesos


Description
---

Set container process's OOM score adjust.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
b12b73d8e0161d448075378765e77867521de04e 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
a311ab4495f71bedacd2e99c84c765f0e5fe99d3 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp 
dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp 
c62deec4b1cd749dba5fe71b901e0353806a0805 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp 
ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
27d88e91fb784179effd54781f84000fe85c13eb 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
0896d37761a11f55ba4b866d235c3bd2b79dcfba 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 
06531072f445d4ec978ebaf5ec5e4a2475517d05 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp 
ec2ce67e54387f26aa11c00d4c7f85f0807a127b 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 
2c865aca35084a5db567b5f95c8c57bb6e1d5634 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 
180afc936798c2fa4de0deef080276cf7cc94199 
  src/slave/containerizer/mesos/utils.hpp 
bfd07e28c78fc140e395ffccd11d65545bf007fc 
  src/slave/containerizer/mesos/utils.cpp 
d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 


Diff: https://reviews.apache.org/r/71944/diff/6/

Changes: https://reviews.apache.org/r/71944/diff/5-6/


Testing
---

sudo make check


Thanks,

Qian Zhang



Re: Review Request 71944: Set container process's OOM score adjust.

2020-03-02 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219718
---




src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
Lines 222 (patched)


Why don't we want to set the OOM score adjustment for custom executor 
containers?



src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
Lines 233-244 (patched)


It looks like you could get rid of this conditional for `if 
(oomScoreAdj.isSome())`, make `oomScoreAdj` a raw integer, and do all of this 
work within the conditional directly above this?


- Greg Mann


On Feb. 26, 2020, 12:19 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Feb. 26, 2020, 12:19 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
> b12b73d8e0161d448075378765e77867521de04e 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
> a311ab4495f71bedacd2e99c84c765f0e5fe99d3 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp 
> dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp 
> c62deec4b1cd749dba5fe71b901e0353806a0805 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp 
> ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 
> 06531072f445d4ec978ebaf5ec5e4a2475517d05 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp 
> ec2ce67e54387f26aa11c00d4c7f85f0807a127b 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 
> 2c865aca35084a5db567b5f95c8c57bb6e1d5634 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 
> 180afc936798c2fa4de0deef080276cf7cc94199 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/5/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-02-26 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/
---

(Updated Feb. 26, 2020, 8:19 p.m.)


Review request for mesos, Andrei Budnik and Greg Mann.


Changes
---

Changed `totalMem` from member varaible to a static variable.


Bugs: MESOS-10048
https://issues.apache.org/jira/browse/MESOS-10048


Repository: mesos


Description
---

Set container process's OOM score adjust.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
b12b73d8e0161d448075378765e77867521de04e 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
a311ab4495f71bedacd2e99c84c765f0e5fe99d3 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp 
dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp 
c62deec4b1cd749dba5fe71b901e0353806a0805 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp 
ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
27d88e91fb784179effd54781f84000fe85c13eb 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
0896d37761a11f55ba4b866d235c3bd2b79dcfba 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 
06531072f445d4ec978ebaf5ec5e4a2475517d05 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp 
ec2ce67e54387f26aa11c00d4c7f85f0807a127b 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 
2c865aca35084a5db567b5f95c8c57bb6e1d5634 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 
180afc936798c2fa4de0deef080276cf7cc94199 


Diff: https://reviews.apache.org/r/71944/diff/5/

Changes: https://reviews.apache.org/r/71944/diff/4-5/


Testing
---

sudo make check


Thanks,

Qian Zhang



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-15 Thread Qian Zhang


> On Jan. 8, 2020, 7:07 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > 
> >
> > Do we really want to do this? My concern is that this will make any 
> > non-Mesos-task processes on the node (networking and security components, 
> > for example) more likely to be OOM-killed than Mesos tasks. Perhaps we 
> > should only set the OOM score adjustment for burstable tasks. What do you 
> > think?
> 
> Qian Zhang wrote:
> I think it depends on which one is in higher priority and more important, 
> guaranteed task or non-Mesos-task processes? In Kubernetes implementation 
> (https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53),
>  the OOM score adjust of guaranteed container is set to -998, and kubelet's 
> OOM score adjust is set to -998 too, I think we should do the same to protect 
> guaranteed containers and Mesos agent, what do you think?
> 
> Greg Mann wrote:
> One significant difference in the Kubernetes case is that they have 
> user-space code which kills pod processes to reclaim memory when necessary. 
> Consequently, there will be less impact if the OOM killer shows a strong 
> preference against killing guaranteed tasks.
> 
> My intuition is that we should not set the OOM score adjustment for 
> non-bursting processes. Even if we leave it at zero, guaranteed tasks will 
> still be treated preferentially with respect to bursting tasks, since all 
> bursting tasks will have an adjustment greater than zero.
> 
> Qian Zhang wrote:
> I agree that guaranteed tasks will be treated preferentially with respect 
> to bursting tasks, but I am thinking about the guaranteed tasks v.s. the 
> non-Mesos-tasks, let's say two guaranteed tasks running on a node, and each 
> of them's memory request/limit is half of the node's memory, and both of them 
> has almost used all of its memory request/limit, so their OOM score will be 
> very high (like 490+). Now if a non-mesos-task (e.g., a system component or 
> even Mesos agent itself) tries to use a lot of memory suddenly, the node will 
> be short of memory, and then OOM killer will definitely kill one of the two 
> guaranteed tasks since their OOM score are the top 2 in the node. But I do 
> not think K8s will have this issue since the guaranteed containers OOM score 
> adjust is -998.
> 
> Qian Zhang wrote:
> And even we think about the case guaranteed tasks v.s. burstable tasks, I 
> think it is also a bit risky if we leave guaranteed task's OOM score adjust 
> to 0. For example, one guaranteed task (T1) and one burstable task (T2) 
> running on a node, each of them's memory request is half of the node's 
> memory. T1 has almost used all of its memory request/limit, so its OOM score 
> will be something like 490+. T2 uses very little memory, so its OOM score 
> will be a bit beyond 500 (like 510). I think in this case the OOM scores of 
> T1 and T2 are too close, the actual OOM score is calculated in a more complex 
> way, so I am afraid there will be a moment that the OOM score of T1 is even 
> higher than T2, that's why I think it is a bit risky.
> 
> Qian Zhang wrote:
> Just want to add one point, it seems a small amount (30) will be 
> subtracted from the OOM score of root-owned processes, so in the above 
> example, if T2 is owned by root but T1 is owned by a normal user, it might be 
> possible T2 get a smaller OOM score that T1.
> 
> Greg Mann wrote:
> In the case above of tasks T1 and T2, I don't think we need to provide 
> any guarantee of which process is killed first in this case. If neither task 
> is above its memory request, then I think it's OK for the OOM killer to 
> decide which one is killed first. The resource limits feature doesn't add a 
> notion of priority, like "guaranteed" vs. "burstable", I think we just want 
> to make sure that tasks which have exceeded their memory request are killed 
> preferentially. So I think it's OK to leave the OOM score adjustment of 
> non-burstable tasks at zero.

Second thought, I agree with you that we leave the OOM score adjustment of 
non-burstable tasks at zero for backward compatibility.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
---


On Jan. 15, 2020, 10:20 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 15, 2020, 10:20 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 

Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-15 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/
---

(Updated Jan. 15, 2020, 10:20 p.m.)


Review request for mesos, Andrei Budnik and Greg Mann.


Changes
---

Addressed review comments.


Bugs: MESOS-10048
https://issues.apache.org/jira/browse/MESOS-10048


Repository: mesos


Description
---

Set container process's OOM score adjust.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
b12b73d8e0161d448075378765e77867521de04e 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
a311ab4495f71bedacd2e99c84c765f0e5fe99d3 
  src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp 
dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp 
c62deec4b1cd749dba5fe71b901e0353806a0805 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp 
ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
27d88e91fb784179effd54781f84000fe85c13eb 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
0896d37761a11f55ba4b866d235c3bd2b79dcfba 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 
06531072f445d4ec978ebaf5ec5e4a2475517d05 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp 
ec2ce67e54387f26aa11c00d4c7f85f0807a127b 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 
2c865aca35084a5db567b5f95c8c57bb6e1d5634 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 
180afc936798c2fa4de0deef080276cf7cc94199 


Diff: https://reviews.apache.org/r/71944/diff/4/

Changes: https://reviews.apache.org/r/71944/diff/3-4/


Testing
---

sudo make check


Thanks,

Qian Zhang



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-13 Thread Greg Mann


> On Jan. 7, 2020, 11:07 p.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > 
> >
> > Do we really want to do this? My concern is that this will make any 
> > non-Mesos-task processes on the node (networking and security components, 
> > for example) more likely to be OOM-killed than Mesos tasks. Perhaps we 
> > should only set the OOM score adjustment for burstable tasks. What do you 
> > think?
> 
> Qian Zhang wrote:
> I think it depends on which one is in higher priority and more important, 
> guaranteed task or non-Mesos-task processes? In Kubernetes implementation 
> (https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53),
>  the OOM score adjust of guaranteed container is set to -998, and kubelet's 
> OOM score adjust is set to -998 too, I think we should do the same to protect 
> guaranteed containers and Mesos agent, what do you think?
> 
> Greg Mann wrote:
> One significant difference in the Kubernetes case is that they have 
> user-space code which kills pod processes to reclaim memory when necessary. 
> Consequently, there will be less impact if the OOM killer shows a strong 
> preference against killing guaranteed tasks.
> 
> My intuition is that we should not set the OOM score adjustment for 
> non-bursting processes. Even if we leave it at zero, guaranteed tasks will 
> still be treated preferentially with respect to bursting tasks, since all 
> bursting tasks will have an adjustment greater than zero.
> 
> Qian Zhang wrote:
> I agree that guaranteed tasks will be treated preferentially with respect 
> to bursting tasks, but I am thinking about the guaranteed tasks v.s. the 
> non-Mesos-tasks, let's say two guaranteed tasks running on a node, and each 
> of them's memory request/limit is half of the node's memory, and both of them 
> has almost used all of its memory request/limit, so their OOM score will be 
> very high (like 490+). Now if a non-mesos-task (e.g., a system component or 
> even Mesos agent itself) tries to use a lot of memory suddenly, the node will 
> be short of memory, and then OOM killer will definitely kill one of the two 
> guaranteed tasks since their OOM score are the top 2 in the node. But I do 
> not think K8s will have this issue since the guaranteed containers OOM score 
> adjust is -998.
> 
> Qian Zhang wrote:
> And even we think about the case guaranteed tasks v.s. burstable tasks, I 
> think it is also a bit risky if we leave guaranteed task's OOM score adjust 
> to 0. For example, one guaranteed task (T1) and one burstable task (T2) 
> running on a node, each of them's memory request is half of the node's 
> memory. T1 has almost used all of its memory request/limit, so its OOM score 
> will be something like 490+. T2 uses very little memory, so its OOM score 
> will be a bit beyond 500 (like 510). I think in this case the OOM scores of 
> T1 and T2 are too close, the actual OOM score is calculated in a more complex 
> way, so I am afraid there will be a moment that the OOM score of T1 is even 
> higher than T2, that's why I think it is a bit risky.
> 
> Qian Zhang wrote:
> Just want to add one point, it seems a small amount (30) will be 
> subtracted from the OOM score of root-owned processes, so in the above 
> example, if T2 is owned by root but T1 is owned by a normal user, it might be 
> possible T2 get a smaller OOM score that T1.

In the case above of tasks T1 and T2, I don't think we need to provide any 
guarantee of which process is killed first in this case. If neither task is 
above its memory request, then I think it's OK for the OOM killer to decide 
which one is killed first. The resource limits feature doesn't add a notion of 
priority, like "guaranteed" vs. "burstable", I think we just want to make sure 
that tasks which have exceeded their memory request are killed preferentially. 
So I think it's OK to leave the OOM score adjustment of non-burstable tasks at 
zero.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
---


On Jan. 8, 2020, 3:28 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 8, 2020, 3:28 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/sub

Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-09 Thread Qian Zhang


> On Jan. 8, 2020, 7:07 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > 
> >
> > Do we really want to do this? My concern is that this will make any 
> > non-Mesos-task processes on the node (networking and security components, 
> > for example) more likely to be OOM-killed than Mesos tasks. Perhaps we 
> > should only set the OOM score adjustment for burstable tasks. What do you 
> > think?
> 
> Qian Zhang wrote:
> I think it depends on which one is in higher priority and more important, 
> guaranteed task or non-Mesos-task processes? In Kubernetes implementation 
> (https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53),
>  the OOM score adjust of guaranteed container is set to -998, and kubelet's 
> OOM score adjust is set to -998 too, I think we should do the same to protect 
> guaranteed containers and Mesos agent, what do you think?
> 
> Greg Mann wrote:
> One significant difference in the Kubernetes case is that they have 
> user-space code which kills pod processes to reclaim memory when necessary. 
> Consequently, there will be less impact if the OOM killer shows a strong 
> preference against killing guaranteed tasks.
> 
> My intuition is that we should not set the OOM score adjustment for 
> non-bursting processes. Even if we leave it at zero, guaranteed tasks will 
> still be treated preferentially with respect to bursting tasks, since all 
> bursting tasks will have an adjustment greater than zero.
> 
> Qian Zhang wrote:
> I agree that guaranteed tasks will be treated preferentially with respect 
> to bursting tasks, but I am thinking about the guaranteed tasks v.s. the 
> non-Mesos-tasks, let's say two guaranteed tasks running on a node, and each 
> of them's memory request/limit is half of the node's memory, and both of them 
> has almost used all of its memory request/limit, so their OOM score will be 
> very high (like 490+). Now if a non-mesos-task (e.g., a system component or 
> even Mesos agent itself) tries to use a lot of memory suddenly, the node will 
> be short of memory, and then OOM killer will definitely kill one of the two 
> guaranteed tasks since their OOM score are the top 2 in the node. But I do 
> not think K8s will have this issue since the guaranteed containers OOM score 
> adjust is -998.
> 
> Qian Zhang wrote:
> And even we think about the case guaranteed tasks v.s. burstable tasks, I 
> think it is also a bit risky if we leave guaranteed task's OOM score adjust 
> to 0. For example, one guaranteed task (T1) and one burstable task (T2) 
> running on a node, each of them's memory request is half of the node's 
> memory. T1 has almost used all of its memory request/limit, so its OOM score 
> will be something like 490+. T2 uses very little memory, so its OOM score 
> will be a bit beyond 500 (like 510). I think in this case the OOM scores of 
> T1 and T2 are too close, the actual OOM score is calculated in a more complex 
> way, so I am afraid there will be a moment that the OOM score of T1 is even 
> higher than T2, that's why I think it is a bit risky.

Just want to add one point, it seems a small amount (30) will be subtracted 
from the OOM score of root-owned processes, so in the above example, if T2 is 
owned by root but T1 is owned by a normal user, it might be possible T2 get a 
smaller OOM score that T1.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
---


On Jan. 8, 2020, 11:28 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 8, 2020, 11:28 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/3/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-09 Thread Qian Zhang


> On Jan. 8, 2020, 7:07 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > 
> >
> > Do we really want to do this? My concern is that this will make any 
> > non-Mesos-task processes on the node (networking and security components, 
> > for example) more likely to be OOM-killed than Mesos tasks. Perhaps we 
> > should only set the OOM score adjustment for burstable tasks. What do you 
> > think?
> 
> Qian Zhang wrote:
> I think it depends on which one is in higher priority and more important, 
> guaranteed task or non-Mesos-task processes? In Kubernetes implementation 
> (https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53),
>  the OOM score adjust of guaranteed container is set to -998, and kubelet's 
> OOM score adjust is set to -998 too, I think we should do the same to protect 
> guaranteed containers and Mesos agent, what do you think?
> 
> Greg Mann wrote:
> One significant difference in the Kubernetes case is that they have 
> user-space code which kills pod processes to reclaim memory when necessary. 
> Consequently, there will be less impact if the OOM killer shows a strong 
> preference against killing guaranteed tasks.
> 
> My intuition is that we should not set the OOM score adjustment for 
> non-bursting processes. Even if we leave it at zero, guaranteed tasks will 
> still be treated preferentially with respect to bursting tasks, since all 
> bursting tasks will have an adjustment greater than zero.
> 
> Qian Zhang wrote:
> I agree that guaranteed tasks will be treated preferentially with respect 
> to bursting tasks, but I am thinking about the guaranteed tasks v.s. the 
> non-Mesos-tasks, let's say two guaranteed tasks running on a node, and each 
> of them's memory request/limit is half of the node's memory, and both of them 
> has almost used all of its memory request/limit, so their OOM score will be 
> very high (like 490+). Now if a non-mesos-task (e.g., a system component or 
> even Mesos agent itself) tries to use a lot of memory suddenly, the node will 
> be short of memory, and then OOM killer will definitely kill one of the two 
> guaranteed tasks since their OOM score are the top 2 in the node. But I do 
> not think K8s will have this issue since the guaranteed containers OOM score 
> adjust is -998.

And even we think about the case guaranteed tasks v.s. burstable tasks, I think 
it is also a bit risky if we leave guaranteed task's OOM score adjust to 0. For 
example, one guaranteed task (T1) and one burstable task (T2) running on a 
node, each of them's memory request is half of the node's memory. T1 has almost 
used all of its memory request/limit, so its OOM score will be something like 
490+. T2 uses very little memory, so its OOM score will be a bit beyond 500 
(like 510). I think in this case the OOM scores of T1 and T2 are too close, the 
actual OOM score is calculated in a more complex way, so I am afraid there will 
be a moment that the OOM score of T1 is even higher than T2, that's why I think 
it is a bit risky.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
---


On Jan. 8, 2020, 11:28 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 8, 2020, 11:28 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/3/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-09 Thread Qian Zhang


> On Jan. 8, 2020, 7:07 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > 
> >
> > Do we really want to do this? My concern is that this will make any 
> > non-Mesos-task processes on the node (networking and security components, 
> > for example) more likely to be OOM-killed than Mesos tasks. Perhaps we 
> > should only set the OOM score adjustment for burstable tasks. What do you 
> > think?
> 
> Qian Zhang wrote:
> I think it depends on which one is in higher priority and more important, 
> guaranteed task or non-Mesos-task processes? In Kubernetes implementation 
> (https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53),
>  the OOM score adjust of guaranteed container is set to -998, and kubelet's 
> OOM score adjust is set to -998 too, I think we should do the same to protect 
> guaranteed containers and Mesos agent, what do you think?
> 
> Greg Mann wrote:
> One significant difference in the Kubernetes case is that they have 
> user-space code which kills pod processes to reclaim memory when necessary. 
> Consequently, there will be less impact if the OOM killer shows a strong 
> preference against killing guaranteed tasks.
> 
> My intuition is that we should not set the OOM score adjustment for 
> non-bursting processes. Even if we leave it at zero, guaranteed tasks will 
> still be treated preferentially with respect to bursting tasks, since all 
> bursting tasks will have an adjustment greater than zero.

I agree that guaranteed tasks will be treated preferentially with respect to 
bursting tasks, but I am thinking about the guaranteed tasks v.s. the 
non-Mesos-tasks, let's say two guaranteed tasks running on a node, and each of 
them's memory request/limit is half of the node's memory, and both of them has 
almost used all of its memory request/limit, so their OOM score will be very 
high (like 490+). Now if a non-mesos-task (e.g., a system component or even 
Mesos agent itself) tries to use a lot of memory suddenly, the node will be 
short of memory, and then OOM killer will definitely kill one of the two 
guaranteed tasks since their OOM score are the top 2 in the node. But I do not 
think K8s will have this issue since the guaranteed containers OOM score adjust 
is -998.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
---


On Jan. 8, 2020, 11:28 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 8, 2020, 11:28 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/3/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-09 Thread Greg Mann


> On Jan. 7, 2020, 11:07 p.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > 
> >
> > Do we really want to do this? My concern is that this will make any 
> > non-Mesos-task processes on the node (networking and security components, 
> > for example) more likely to be OOM-killed than Mesos tasks. Perhaps we 
> > should only set the OOM score adjustment for burstable tasks. What do you 
> > think?
> 
> Qian Zhang wrote:
> I think it depends on which one is in higher priority and more important, 
> guaranteed task or non-Mesos-task processes? In Kubernetes implementation 
> (https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53),
>  the OOM score adjust of guaranteed container is set to -998, and kubelet's 
> OOM score adjust is set to -998 too, I think we should do the same to protect 
> guaranteed containers and Mesos agent, what do you think?

One significant difference in the Kubernetes case is that they have user-space 
code which kills pod processes to reclaim memory when necessary. Consequently, 
there will be less impact if the OOM killer shows a strong preference against 
killing guaranteed tasks.

My intuition is that we should not set the OOM score adjustment for 
non-bursting processes. Even if we leave it at zero, guaranteed tasks will 
still be treated preferentially with respect to bursting tasks, since all 
bursting tasks will have an adjustment greater than zero.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
---


On Jan. 8, 2020, 3:28 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 8, 2020, 3:28 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/3/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-08 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219189
---




src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
Lines 202-207 (patched)


Unfortunately, this may set the OOM score adjust of default executor to a 
very big value (e.g., 999) if the task to be launched by the default executor 
has a small memory request. This is too risky, we'd better to set default 
executor's OOM score adjust to something like -998 to protect it.


- Qian Zhang


On Jan. 8, 2020, 11:28 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 8, 2020, 11:28 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/3/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-08 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/
---

(Updated Jan. 8, 2020, 11:28 p.m.)


Review request for mesos, Andrei Budnik and Greg Mann.


Changes
---

Addressed review comments.


Bugs: MESOS-10048
https://issues.apache.org/jira/browse/MESOS-10048


Repository: mesos


Description
---

Set container process's OOM score adjust.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
27d88e91fb784179effd54781f84000fe85c13eb 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
0896d37761a11f55ba4b866d235c3bd2b79dcfba 


Diff: https://reviews.apache.org/r/71944/diff/3/

Changes: https://reviews.apache.org/r/71944/diff/2-3/


Testing
---

sudo make check


Thanks,

Qian Zhang



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-08 Thread Qian Zhang


> On Jan. 8, 2020, 7:07 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > 
> >
> > Do we really want to do this? My concern is that this will make any 
> > non-Mesos-task processes on the node (networking and security components, 
> > for example) more likely to be OOM-killed than Mesos tasks. Perhaps we 
> > should only set the OOM score adjustment for burstable tasks. What do you 
> > think?

I think it depends on which one is in higher priority and more important, 
guaranteed task or non-Mesos-task processes? In Kubernetes implementation 
(https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53),
 the OOM score adjust of guaranteed container is set to -998, and kubelet's OOM 
score adjust is set to -998 too, I think we should do the same to protect 
guaranteed containers and Mesos agent, what do you think?


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
---


On Jan. 5, 2020, 10:06 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 5, 2020, 10:06 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/2/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-07 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
---




src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
Lines 192-195 (patched)


This comment simplifies the OOM score a bit; adjustments are also made 
based on how long the process has been running and which capabilities it 
posesses, for example. Could you update the comment to explain?

Maybe something like "While the OOM score of a process is a complex 
function of the process state and configuration, a decent approximation of the 
OOM score is 10 x percent of memory used..."



src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
Lines 199 (patched)


Do we really want to do this? My concern is that this will make any 
non-Mesos-task processes on the node (networking and security components, for 
example) more likely to be OOM-killed than Mesos tasks. Perhaps we should only 
set the OOM score adjustment for burstable tasks. What do you think?



src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
Lines 201-204 (patched)


I might recommend:

"We would like burstable containers which consume more memory than their 
memory request to be preferentially OOM-killed first. To accomplish this, we 
set their OOM score adjustment as shown below, which attempts to ensure that 
the container which consumes more memory than its memory request will have an 
OOM score of 1000."


- Greg Mann


On Jan. 5, 2020, 2:06 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 5, 2020, 2:06 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/2/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-05 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/
---

(Updated Jan. 5, 2020, 10:06 p.m.)


Review request for mesos, Andrei Budnik and Greg Mann.


Changes
---

Added a log message.


Bugs: MESOS-10048
https://issues.apache.org/jira/browse/MESOS-10048


Repository: mesos


Description
---

Set container process's OOM score adjust.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
27d88e91fb784179effd54781f84000fe85c13eb 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
0896d37761a11f55ba4b866d235c3bd2b79dcfba 


Diff: https://reviews.apache.org/r/71944/diff/2/

Changes: https://reviews.apache.org/r/71944/diff/1-2/


Testing
---

sudo make check


Thanks,

Qian Zhang



Re: Review Request 71944: Set container process's OOM score adjust.

2020-01-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219110
---



Patch looks great!

Reviews applied: [71855, 71931, 71856, 71858, 71884, 71885, 71886, 71943, 71944]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Jan. 2, 2020, 1:15 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> ---
> 
> (Updated Jan. 2, 2020, 1:15 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
> https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
> 27d88e91fb784179effd54781f84000fe85c13eb 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
> 0896d37761a11f55ba4b866d235c3bd2b79dcfba 
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Review Request 71944: Set container process's OOM score adjust.

2020-01-01 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/
---

Review request for mesos, Andrei Budnik and Greg Mann.


Bugs: MESOS-10048
https://issues.apache.org/jira/browse/MESOS-10048


Repository: mesos


Description
---

Set container process's OOM score adjust.


Diffs
-

  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 
27d88e91fb784179effd54781f84000fe85c13eb 
  src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 
0896d37761a11f55ba4b866d235c3bd2b79dcfba 


Diff: https://reviews.apache.org/r/71944/diff/1/


Testing
---

sudo make check


Thanks,

Qian Zhang