Re: [PATCH 4/4] oom: split out forced OOM killer
On Thu, 9 Jul 2015, Michal Hocko wrote:

> > > The forced OOM killing is currently wired into out_of_memory() call
> > > even though their objective is different which makes the code ugly
> > > and harder to follow. Generic out_of_memory path has to deal with
> > > configuration settings and heuristics which are completely irrelevant
> > > to the forced OOM killer (e.g. sysctl_oom_kill_allocating_task or
> > > OOM killer prevention for already dying tasks). All of them are
> > > either relying on explicit force_kill check or indirectly by checking
> > > current->mm which is always NULL for sysrq+f. This is not nice, hard
> > > to follow and error prone.
> > >
> > > Let's pull forced OOM killer code out into a separate function
> > > (force_out_of_memory) which is really trivial now.
> > > As a bonus we can clearly state that this is a forced OOM killer
> > > in the OOM message which is helpful to distinguish it from the
> > > regular OOM killer.
> > >
> > > Signed-off-by: Michal Hocko
> >
> > It's really absurd that we have to go through this over and over and that
> > your patches are actually being merged into -mm just because you don't get
> > the point.
> >
> > We have no need for a force_out_of_memory() function. None whatsoever.
>
> The reasons are explained in the changelog and I do not see a single
> argument against any of them.
>

We have a large number of checks in the oom killer to handle various
circumstances. Those include different sysctl behaviors, different oom
contexts (system, mempolicy, cpuset, memcg) to handle, behavior on
concurrent exiting or killed processes, different calling context, etc.

That doesn't mean they are deserving of individual functions that
duplicate logic that add more and more lines of code.

> > Keeping oc->force_kill around is just more pointless space on a very deep
> > stack and I'm tired of fixing stack overflows.
>
> This just doesn't make any sense. oc->force_kill vs oc->order =
> -1 replacement is completely independent on this patch and can be
> implemented on top of it if you really insist.
>

You are introducing a separate function that duplicates logic to avoid
adding two checks to existing conditionals. That's what I disagree with.

> > I'm certainly not going to
> > introduce others because you think it looks cleaner in the code when
> > memory compaction does the exact same thing by using cc->order == -1 to
> > mean explicit compaction.
> >
> > This is turning into a complete waste of time.
>
> You know what? I am tired of your complete immunity to any arguments and
> the way how you are pushing more hacks into an already cluttered code.
>

I'm going to make one final comment on these constant reiterations of the
same patchset and then move on. I simply don't have the time to continue
to discuss stylistic differences: in this case, I disagree with you
introducing a new function that duplicates logic elsewhere to avoid adding
two checks in existing conditions.

If we look at memory compaction, I see cc->order == -1 checks in four
places. cc->order == -1 means compaction was triggered explicitly from
the command line, just as oc->order == -1 in my patchset means the oom
killer was triggered explicitly from sysrq.

__compact_finished():

	/*
	 * order == -1 is expected when compacting via
	 * /proc/sys/vm/compact_memory
	 */
	if (cc->order == -1)
		return COMPACT_CONTINUE;

__compaction_suitable():

	/*
	 * order == -1 is expected when compacting via
	 * /proc/sys/vm/compact_memory
	 */
	if (order == -1)
		return COMPACT_CONTINUE;

__compact_pgdat():

		/*
		 * When called via /proc/sys/vm/compact_memory
		 * this makes sure we compact the whole zone regardless of
		 * cached scanner positions.
		 */
		if (cc->order == -1)
			__reset_isolation_suitable(zone);

		if (cc->order == -1 || !compaction_deferred(zone, cc->order))
			compact_zone(zone, cc);

We don't implement separate memory compaction scanners when triggered by
the command line. We simply check, where appropriate, if this is a full
compaction scan or not. In that case, cc->order doesn't matter since we
aren't trying to allocate a page; this is the exact same as in my patchset
since oc->order doesn't matter since we aren't concerned with the order of
the failed page allocation. I have never had trouble following Mel's code
when it comes to the Linux VM.

I recognized this as an opportunity to remove data on the stack, which is
always important for the page allocator and oom killer because it can be
very deep, by doing the exact same thing.

check_panic_on_oom():

	/* Do not panic for oom kills triggered by sysrq */
	if (oc->order == -1)
		return;

and two changes to existing conditions to determine if we should panic or
if we have an eligible victim. What we don't do is what
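For readers following the argument above, the sentinel approach can be sketched in plain userspace C. This is only an illustration under stated assumptions, not kernel code: struct demo_oom_context, DEMO_ORDER_FORCED, demo_should_panic() and demo_self_check() are hypothetical names, and the panic_on_oom field merely stands in for the sysctl of that name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct oom_context. */
struct demo_oom_context {
	int order;		/* allocation order, or -1 for an explicit request */
	bool panic_on_oom;	/* stand-in for the panic_on_oom sysctl */
};

/* Sentinel: no page allocation failed, the kill was requested explicitly. */
#define DEMO_ORDER_FORCED (-1)

/*
 * Returns true when the simulated system would panic instead of killing.
 * The forced case is handled by one extra check in the existing
 * conditional rather than by a separate entry point.
 */
bool demo_should_panic(const struct demo_oom_context *oc)
{
	/* Do not panic for kills triggered explicitly. */
	if (oc->order == DEMO_ORDER_FORCED)
		return false;
	return oc->panic_on_oom;
}

/* Exercises both the allocation-failure case and the forced case. */
void demo_self_check(void)
{
	struct demo_oom_context alloc_failure = { .order = 0, .panic_on_oom = true };
	struct demo_oom_context forced = { .order = DEMO_ORDER_FORCED, .panic_on_oom = true };

	assert(demo_should_panic(&alloc_failure));
	assert(!demo_should_panic(&forced));
}
```

The trade-off is the one debated in this thread: no extra field or function, at the cost of every relevant conditional having to know about the sentinel value.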
Re: [PATCH 4/4] oom: split out forced OOM killer
On Wed 08-07-15 16:41:23, David Rientjes wrote:
> On Wed, 8 Jul 2015, Michal Hocko wrote:
>
> > From: Michal Hocko
> >
> > The forced OOM killing is currently wired into out_of_memory() call
> > even though their objective is different which makes the code ugly
> > and harder to follow. Generic out_of_memory path has to deal with
> > configuration settings and heuristics which are completely irrelevant
> > to the forced OOM killer (e.g. sysctl_oom_kill_allocating_task or
> > OOM killer prevention for already dying tasks). All of them are
> > either relying on explicit force_kill check or indirectly by checking
> > current->mm which is always NULL for sysrq+f. This is not nice, hard
> > to follow and error prone.
> >
> > Let's pull forced OOM killer code out into a separate function
> > (force_out_of_memory) which is really trivial now.
> > As a bonus we can clearly state that this is a forced OOM killer
> > in the OOM message which is helpful to distinguish it from the
> > regular OOM killer.
> >
> > Signed-off-by: Michal Hocko
>
> It's really absurd that we have to go through this over and over and that
> your patches are actually being merged into -mm just because you don't get
> the point.
>
> We have no need for a force_out_of_memory() function. None whatsoever.

The reasons are explained in the changelog and I do not see a single
argument against any of them.

> Keeping oc->force_kill around is just more pointless space on a very deep
> stack and I'm tired of fixing stack overflows.

This just doesn't make any sense. oc->force_kill vs oc->order =
-1 replacement is completely independent on this patch and can be
implemented on top of it if you really insist.

> I'm certainly not going to
> introduce others because you think it looks cleaner in the code when
> memory compaction does the exact same thing by using cc->order == -1 to
> mean explicit compaction.
>
> This is turning into a complete waste of time.

You know what? I am tired of your complete immunity to any arguments and
the way how you are pushing more hacks into an already cluttered code.

out_of_memory is a giant mess wrt. to force killing and you can see at
least two different bugs being there just because of the code
obfuscation. If this is the state that you want to keep, I do not care.
I wanted to fix real issues and do a clean up on top. You seem to do
anything to block that. I just give up.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] oom: split out forced OOM killer
On Wed, 8 Jul 2015, Michal Hocko wrote:

> From: Michal Hocko
>
> The forced OOM killing is currently wired into out_of_memory() call
> even though their objective is different which makes the code ugly
> and harder to follow. Generic out_of_memory path has to deal with
> configuration settings and heuristics which are completely irrelevant
> to the forced OOM killer (e.g. sysctl_oom_kill_allocating_task or
> OOM killer prevention for already dying tasks). All of them are
> either relying on explicit force_kill check or indirectly by checking
> current->mm which is always NULL for sysrq+f. This is not nice, hard
> to follow and error prone.
>
> Let's pull forced OOM killer code out into a separate function
> (force_out_of_memory) which is really trivial now.
> As a bonus we can clearly state that this is a forced OOM killer
> in the OOM message which is helpful to distinguish it from the
> regular OOM killer.
>
> Signed-off-by: Michal Hocko

It's really absurd that we have to go through this over and over and that
your patches are actually being merged into -mm just because you don't get
the point.

We have no need for a force_out_of_memory() function. None whatsoever.
Keeping oc->force_kill around is just more pointless space on a very deep
stack and I'm tired of fixing stack overflows. I'm certainly not going to
introduce others because you think it looks cleaner in the code when
memory compaction does the exact same thing by using cc->order == -1 to
mean explicit compaction.

This is turning into a complete waste of time.
[PATCH 4/4] oom: split out forced OOM killer
From: Michal Hocko

The forced OOM killing is currently wired into out_of_memory() call
even though their objective is different which makes the code ugly
and harder to follow. Generic out_of_memory path has to deal with
configuration settings and heuristics which are completely irrelevant
to the forced OOM killer (e.g. sysctl_oom_kill_allocating_task or
OOM killer prevention for already dying tasks). All of them are
either relying on explicit force_kill check or indirectly by checking
current->mm which is always NULL for sysrq+f. This is not nice, hard
to follow and error prone.

Let's pull forced OOM killer code out into a separate function
(force_out_of_memory) which is really trivial now.
As a bonus we can clearly state that this is a forced OOM killer
in the OOM message which is helpful to distinguish it from the
regular OOM killer.

Signed-off-by: Michal Hocko
---
 drivers/tty/sysrq.c |  9 +
 include/linux/oom.h |  1 +
 mm/oom_kill.c       | 57 -
 3 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 865b837a9aee..6a3def693ded 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -356,15 +356,8 @@ static struct sysrq_key_op sysrq_term_op = {
 
 static void moom_callback(struct work_struct *ignored)
 {
-	const gfp_t gfp_mask = GFP_KERNEL;
-	struct oom_context oc = {
-		.zonelist = node_zonelist(first_memory_node, gfp_mask),
-		.gfp_mask = gfp_mask,
-		.force_kill = true,
-	};
-
 	mutex_lock(&oom_lock);
-	if (!out_of_memory(&oc))
+	if (!force_out_of_memory())
 		pr_info("OOM request ignored because killer is disabled\n");
 	mutex_unlock(&oom_lock);
 }
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 094407cb2d2e..6af2d12d6134 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -77,6 +77,7 @@ extern enum oom_scan_t oom_scan_process_thread(struct oom_context *oc,
 		struct task_struct *task, unsigned long totalpages);
 
 extern bool out_of_memory(struct oom_context *oc);
+extern bool force_out_of_memory(void);
 
 extern void exit_oom_victim(void);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 01aa4cb86857..6a0b09296236 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -627,6 +627,38 @@ int unregister_oom_notifier(struct notifier_block *nb)
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
 /**
+ * force_out_of_memory - forces OOM killer to kill a process
+ *
+ * Explicitly trigger the OOM killer. The system doesn't have to be under
+ * OOM condition (e.g. sysrq+f).
+ */
+bool force_out_of_memory(void)
+{
+	struct task_struct *p;
+	unsigned long totalpages;
+	unsigned int points;
+	const gfp_t gfp_mask = GFP_KERNEL;
+	struct oom_context oc = {
+		.zonelist = node_zonelist(first_memory_node, gfp_mask),
+		.gfp_mask = gfp_mask,
+		.force_kill = true,
+	};
+
+	if (oom_killer_disabled)
+		return false;
+
+	constrained_alloc(&oc, &totalpages);
+	p = select_bad_process(&oc, &points, totalpages);
+	if (p != (void *)-1UL)
+		oom_kill_process(&oc, p, points, totalpages, NULL,
+				"Forced out of memory killer");
+	else
+		pr_warn("Sysrq triggered out of memory. No killable task found...\n");
+
+	return true;
+}
+
+/**
  * out_of_memory - kill the "best" process when we run out of memory
  * @oc: pointer to struct oom_context
  *
@@ -647,12 +679,10 @@ bool out_of_memory(struct oom_context *oc)
 	if (oom_killer_disabled)
 		return false;
 
-	if (!oc->force_kill) {
-		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
-		if (freed > 0)
-			/* Got some memory back in the last second. */
-			goto out;
-	}
+	blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
+	if (freed > 0)
+		/* Got some memory back in the last second. */
+		goto out;
 
 	/*
 	 * If current has a pending SIGKILL or is exiting, then automatically
@@ -675,13 +705,8 @@ bool out_of_memory(struct oom_context *oc)
 	constraint = constrained_alloc(oc, &totalpages);
 	if (constraint != CONSTRAINT_MEMORY_POLICY)
 		oc->nodemask = NULL;
-	if (!oc->force_kill)
-		check_panic_on_oom(oc, constraint, NULL);
+	check_panic_on_oom(oc, constraint, NULL);
 
-	/*
-	 * not affecting force_kill because sysrq triggered OOM killer runs from
-	 * the workqueue context so current->mm will be NULL
-	 */
 	if (sysctl_oom_kill_allocating_task && current->mm &&
 	    !oom_unkillable_task(current, NULL, oc->nodemask) &&
 	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
@@ -694,12 +719,8 @@ bool out_of_memory(struct oom_context *oc)
 	p =
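The shape of the split proposed in this patch can be sketched in plain userspace C. This is a rough illustration under stated assumptions, not the kernel implementation: struct demo_task and every demo_* function are hypothetical, and a single "freed by notifiers" counter stands in for the heuristics that only the generic path has to consult.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record; badness 0 means "not killable". */
struct demo_task {
	const char *name;
	unsigned int badness;
};

/* Shared helper: pick the killable task with the highest badness, or NULL. */
const struct demo_task *demo_select_victim(const struct demo_task *tasks, size_t n)
{
	const struct demo_task *victim = NULL;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].badness == 0)
			continue;
		if (!victim || tasks[i].badness > victim->badness)
			victim = &tasks[i];
	}
	return victim;
}

/* Generic path: a heuristic may decide that no kill is needed at all. */
const struct demo_task *demo_out_of_memory(const struct demo_task *tasks, size_t n,
					   unsigned long freed_by_notifiers)
{
	if (freed_by_notifiers > 0)
		return NULL;	/* got some memory back, skip the kill */
	return demo_select_victim(tasks, n);
}

/* Forced path: no heuristics, go straight to victim selection. */
const struct demo_task *demo_force_out_of_memory(const struct demo_task *tasks, size_t n)
{
	return demo_select_victim(tasks, n);
}
```

In this sketch the generic path no longer needs a force flag in its conditionals, while both entry points still share the selection helper; whether that is worth a second function is exactly the disagreement in the replies.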