Re: process hangs on do_exit when oom happens
On Fri, 2012-10-26 at 10:03 -0700, Mike Galbraith wrote:
> The bug is in the patch that used sched_setscheduler_nocheck(). Plain
> sched_setscheduler() would have replied -EGOAWAY.

sched_setscheduler_nocheck() should say go away too, methinks. This isn't
about permissions, it's about not being stupid in general.

sched: fix __sched_setscheduler() RT_GROUP_SCHED conditionals

Remove the 'user' and rt_bandwidth_enabled() RT_GROUP_SCHED conditionals
in __sched_setscheduler(). The end result of the kernel OR the user
promoting a task in a group with zero rt_runtime allocated is the same bad
thing, and the throttle switch position matters little. It's safer to just
say no solely based upon bandwidth existence; that may save the user a
nasty surprise if he later flips the throttle switch to 'on'.

The commit below came about because sched_setscheduler_nocheck() allowed a
task in a task group with zero rt_runtime allocated to be promoted by the
kernel oom logic, thus marooning it forever.

commit 341aea2bc48bf652777fb015cc2b3dfa9a451817
Author: KOSAKI Motohiro
Date:   Thu Apr 14 15:22:13 2011 -0700

    oom-kill: remove boost_dying_task_prio()

    This is an almost-revert of commit 93b43fa ("oom: give the dying task
    a higher priority"). That commit dramatically improved oom killer
    behavior when a fork-bomb occurs, but I've found it has a nasty corner
    case: the cpu cgroup has a strange default RT runtime -- it's 0! That
    is, if a process under a cpu cgroup is promoted to the RT scheduling
    class, the process never runs at all.

Signed-off-by: Mike Galbraith

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..d3a35f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3810,17 +3810,14 @@ recheck:
 	}
 #ifdef CONFIG_RT_GROUP_SCHED
-	if (user) {
-		/*
-		 * Do not allow realtime tasks into groups that have no runtime
-		 * assigned.
-		 */
-		if (rt_bandwidth_enabled() && rt_policy(policy) &&
-				task_group(p)->rt_bandwidth.rt_runtime == 0 &&
-				!task_group_is_autogroup(task_group(p))) {
-			task_rq_unlock(rq, p, &flags);
-			return -EPERM;
-		}
+	/*
+	 * Do not allow realtime tasks into groups that have no runtime
+	 * assigned.
+	 */
+	if (rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0 &&
+	    !task_group_is_autogroup(task_group(p))) {
+		task_rq_unlock(rq, p, &flags);
+		return -EPERM;
 	}
 #endif
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
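[The same refusal can be observed from userspace, since chrt(1) goes through __sched_setscheduler() too. A hedged sketch; the cgroup placement of the shell is an assumption, and the echoed message is ours, not chrt's real output:]

```shell
# Run from inside a cgroup whose cpu.rt_runtime_us is 0, and ask for
# SCHED_FIFO priority 50 on the current shell. On a kernel with the check
# above, the promotion is refused with EPERM; without the check, a
# kernel-internal sched_setscheduler_nocheck() promotion would succeed
# and the task would then be throttled forever.
chrt -f -p 50 $$ || echo "RT promotion refused: no rt_runtime allocated (or no privilege)"
```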
Re: process hangs on do_exit when oom happens
On Fri, 2012-10-26 at 10:42 +0800, Qiang Gao wrote:
> On Thu, Oct 25, 2012 at 5:57 PM, Michal Hocko wrote:
>> On Wed 24-10-12 11:44:17, Qiang Gao wrote:
>>> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh wrote:
>>>> On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko wrote:
>>>>> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>>>>>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote:
>>>>>>> On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>>>>>>>> This process was moved to the RT-priority queue when the global
>>>>>>>> oom-killer happened, to boost the recovery of the system,
>>>>>>>
>>>>>>> Who did that? The oom killer doesn't boost the priority
>>>>>>> (scheduling class) AFAIK.
>>>>>>>
>>>>>>>> but it wasn't properly dealt with. I still have no idea where
>>>>>>>> the problem is.
>>>>>>>
>>>>>>> Well, your configuration says that there is no runtime reserved
>>>>>>> for the group. Please refer to
>>>>>>> Documentation/scheduler/sched-rt-group.txt for more information.
>>>>>>>
>>>>> [...]
>>>>>> Maybe this is not an upstream-kernel bug. The centos/redhat kernel
>>>>>> would boost the process to RT prio when the process was selected
>>>>>> by the oom-killer.
>>>>>
>>>>> This still looks like your cpu controller is misconfigured, even if
>>>>> the task is promoted to realtime.
>>>>
>>>> Precisely! You need to have rt bandwidth enabled for RT tasks to run.
>>>> As a workaround, please give the groups some RT bandwidth, and then
>>>> work out the migration to RT and what the defaults should be on the
>>>> distro.
>>>>
>>>> Balbir
>>>
>>> see https://patchwork.kernel.org/patch/719411/
>>
>> The patch surely "fixes" your problem, but the primary fault here is
>> the mis-configured cpu cgroup. If the value for the bandwidth is zero
>> by default then all realtime processes in the group are screwed. The
>> value should be set to something more reasonable.
>> I am not familiar with the cpu controller, but it seems that
>> alloc_rt_sched_group needs some treatment. Care to look into it and
>> send a patch to the cpu controller and cgroup maintainers, please?
>>
>> --
>> Michal Hocko
>> SUSE Labs
>
> I'm trying to fix the problem, but no substantive progress yet.

The throttle tracks a finite resource for an arbitrary number of groups,
so there's no sane rt_runtime default other than zero. Most folks only
want the top-level throttle warm fuzzy, so a complete runtime
RT_GROUP_SCHED on/off switch defaulting to off, i.e. rt tasks cannot be
moved into groups until it is switched on, would fix some annoying
"Oopsie, I forgot" allocation troubles. If you turn it on, shame on you if
you fail to allocate: you asked for it; you're not stuck with it just
because your distro enabled it in their config.

Or, perhaps just make zero rt_runtime always mean traverse up to the first
non-zero rt_runtime, i.e. zero-allocation children may consume parental
runtime as they see fit on a first come, first served basis; when it's
gone, tough, parent and children all wait for the refill. Or whatever, as
long as you don't bust distribution/tracking for those crazy people who
intentionally use RT_GROUP_SCHED ;-)

The bug is in the patch that used sched_setscheduler_nocheck(). Plain
sched_setscheduler() would have replied -EGOAWAY.

-Mike
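[The "top level throttle" Mike refers to is the global RT bandwidth limit, which exists independently of RT_GROUP_SCHED and is visible via sysctl. A minimal inspection sketch; the values in the comments are mainline defaults, not readings from the affected system:]

```shell
# Global RT throttle: RT tasks may consume at most sched_rt_runtime_us
# out of every sched_rt_period_us, system-wide.
cat /proc/sys/kernel/sched_rt_period_us    # mainline default: 1000000 (1s)
cat /proc/sys/kernel/sched_rt_runtime_us   # mainline default: 950000 (0.95s)

# Writing -1 disables the global throttle entirely (use with care):
# echo -1 > /proc/sys/kernel/sched_rt_runtime_us
```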
Re: process hangs on do_exit when oom happens
On Thu, Oct 25, 2012 at 5:57 PM, Michal Hocko wrote:
> On Wed 24-10-12 11:44:17, Qiang Gao wrote:
>> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh wrote:
>>> On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko wrote:
>>>> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>>>>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote:
>>>>>> On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>>>>>>> This process was moved to the RT-priority queue when the global
>>>>>>> oom-killer happened, to boost the recovery of the system,
>>>>>>
>>>>>> Who did that? The oom killer doesn't boost the priority
>>>>>> (scheduling class) AFAIK.
>>>>>>
>>>>>>> but it wasn't properly dealt with. I still have no idea where
>>>>>>> the problem is.
>>>>>>
>>>>>> Well, your configuration says that there is no runtime reserved
>>>>>> for the group. Please refer to
>>>>>> Documentation/scheduler/sched-rt-group.txt for more information.
>>>>>>
>>>> [...]
>>>>> Maybe this is not an upstream-kernel bug. The centos/redhat kernel
>>>>> would boost the process to RT prio when the process was selected
>>>>> by the oom-killer.
>>>>
>>>> This still looks like your cpu controller is misconfigured, even if
>>>> the task is promoted to realtime.
>>>
>>> Precisely! You need to have rt bandwidth enabled for RT tasks to run.
>>> As a workaround, please give the groups some RT bandwidth, and then
>>> work out the migration to RT and what the defaults should be on the
>>> distro.
>>>
>>> Balbir
>>
>> see https://patchwork.kernel.org/patch/719411/
>
> The patch surely "fixes" your problem, but the primary fault here is the
> mis-configured cpu cgroup. If the value for the bandwidth is zero by
> default then all realtime processes in the group are screwed. The value
> should be set to something more reasonable.
> I am not familiar with the cpu controller, but it seems that
> alloc_rt_sched_group needs some treatment. Care to look into it and send
> a patch to the cpu controller and cgroup maintainers, please?
>
> --
> Michal Hocko
> SUSE Labs

I'm trying to fix the problem, but no substantive progress yet.
Re: process hangs on do_exit when oom happens
On Wed 24-10-12 11:44:17, Qiang Gao wrote:
> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh wrote:
>> On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko wrote:
>>> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>>>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote:
>>>>> On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>>>>>> This process was moved to the RT-priority queue when the global
>>>>>> oom-killer happened, to boost the recovery of the system,
>>>>>
>>>>> Who did that? The oom killer doesn't boost the priority (scheduling
>>>>> class) AFAIK.
>>>>>
>>>>>> but it wasn't properly dealt with. I still have no idea where the
>>>>>> problem is.
>>>>>
>>>>> Well, your configuration says that there is no runtime reserved for
>>>>> the group. Please refer to
>>>>> Documentation/scheduler/sched-rt-group.txt for more information.
>>>>>
>>> [...]
>>>> Maybe this is not an upstream-kernel bug. The centos/redhat kernel
>>>> would boost the process to RT prio when the process was selected
>>>> by the oom-killer.
>>>
>>> This still looks like your cpu controller is misconfigured, even if
>>> the task is promoted to realtime.
>>
>> Precisely! You need to have rt bandwidth enabled for RT tasks to run.
>> As a workaround, please give the groups some RT bandwidth, and then
>> work out the migration to RT and what the defaults should be on the
>> distro.
>>
>> Balbir
>
> see https://patchwork.kernel.org/patch/719411/

The patch surely "fixes" your problem, but the primary fault here is the
mis-configured cpu cgroup. If the value for the bandwidth is zero by
default then all realtime processes in the group are screwed. The value
should be set to something more reasonable.

I am not familiar with the cpu controller, but it seems that
alloc_rt_sched_group needs some treatment. Care to look into it and send a
patch to the cpu controller and cgroup maintainers, please?

--
Michal Hocko
SUSE Labs
Re: process hangs on do_exit when oom happens
On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh wrote:
> On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko wrote:
>> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote:
>>>> On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>>>>> This process was moved to the RT-priority queue when the global
>>>>> oom-killer happened, to boost the recovery of the system,
>>>>
>>>> Who did that? The oom killer doesn't boost the priority (scheduling
>>>> class) AFAIK.
>>>>
>>>>> but it wasn't properly dealt with. I still have no idea where the
>>>>> problem is.
>>>>
>>>> Well, your configuration says that there is no runtime reserved for
>>>> the group. Please refer to
>>>> Documentation/scheduler/sched-rt-group.txt for more information.
>>>>
>> [...]
>>> Maybe this is not an upstream-kernel bug. The centos/redhat kernel
>>> would boost the process to RT prio when the process was selected
>>> by the oom-killer.
>>
>> This still looks like your cpu controller is misconfigured, even if
>> the task is promoted to realtime.
>
> Precisely! You need to have rt bandwidth enabled for RT tasks to run.
> As a workaround, please give the groups some RT bandwidth, and then
> work out the migration to RT and what the defaults should be on the
> distro.
>
> Balbir

see https://patchwork.kernel.org/patch/719411/
Re: process hangs on do_exit when oom happens
On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko wrote:
> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote:
>>> On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>>>> This process was moved to the RT-priority queue when the global
>>>> oom-killer happened, to boost the recovery of the system,
>>>
>>> Who did that? The oom killer doesn't boost the priority (scheduling
>>> class) AFAIK.
>>>
>>>> but it wasn't properly dealt with. I still have no idea where the
>>>> problem is.
>>>
>>> Well, your configuration says that there is no runtime reserved for
>>> the group. Please refer to
>>> Documentation/scheduler/sched-rt-group.txt for more information.
>>>
> [...]
>> Maybe this is not an upstream-kernel bug. The centos/redhat kernel
>> would boost the process to RT prio when the process was selected
>> by the oom-killer.
>
> This still looks like your cpu controller is misconfigured, even if the
> task is promoted to realtime.

Precisely! You need to have rt bandwidth enabled for RT tasks to run. As a
workaround, please give the groups some RT bandwidth, and then work out
the migration to RT and what the defaults should be on the distro.

Balbir
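[Balbir's workaround amounts to writing a non-zero value into the group's cpu.rt_runtime_us. A minimal sketch, assuming a cgroup v1 cpu controller mounted at /cgroup/cpu and the group name 1314 from the debug dump; both are assumptions, and the mount point varies by distro:]

```shell
# Hypothetical path to the affected group under the v1 cpu controller.
CG=/cgroup/cpu/1314

# Inspect the current allocation: a zero rt_runtime means RT tasks in
# this group are never scheduled at all once throttled.
cat "$CG/cpu.rt_period_us"
cat "$CG/cpu.rt_runtime_us"    # 0 on the affected system

# Grant the group some RT bandwidth, e.g. 100ms per period (assuming the
# default 1s period), so a task promoted to RT can actually run and exit.
echo 100000 > "$CG/cpu.rt_runtime_us"
```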
Re: process hangs on do_exit when oom happens
On Tue 23-10-12 18:10:33, Qiang Gao wrote:
> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote:
>> On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>>> This process was moved to the RT-priority queue when the global
>>> oom-killer happened, to boost the recovery of the system,
>>
>> Who did that? The oom killer doesn't boost the priority (scheduling
>> class) AFAIK.
>>
>>> but it wasn't properly dealt with. I still have no idea where the
>>> problem is.
>>
>> Well, your configuration says that there is no runtime reserved for
>> the group. Please refer to
>> Documentation/scheduler/sched-rt-group.txt for more information.
>>
[...]
> Maybe this is not an upstream-kernel bug. The centos/redhat kernel
> would boost the process to RT prio when the process was selected
> by the oom-killer.

This still looks like your cpu controller is misconfigured, even if the
task is promoted to realtime.

--
Michal Hocko
SUSE Labs
Re: process hangs on do_exit when oom happens
On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote:
> On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>> This process was moved to the RT-priority queue when the global
>> oom-killer happened, to boost the recovery of the system,
>
> Who did that? The oom killer doesn't boost the priority (scheduling
> class) AFAIK.
>
>> but it wasn't properly dealt with. I still have no idea where the
>> problem is.
>
> Well, your configuration says that there is no runtime reserved for the
> group. Please refer to Documentation/scheduler/sched-rt-group.txt for
> more information.
>
>> On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh wrote:
>>> On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao wrote:
>>>> information about the system is in the attached file "information.txt"
>>>>
>>>> I cannot reproduce it on the upstream 3.6.0 kernel..
>>>>
>>>> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote:
>>>>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>>>>> I found nothing useful with Google, so I'm here for help.
>>>>>>
>>>>>> When this happens: I use memcg to limit the memory use of a
>>>>>> process, and when the memcg cgroup was out of memory, the process
>>>>>> was oom-killed; however, it cannot really complete the exit. Here
>>>>>> is some information:
>>>>>
>>>>> How many tasks are in the group and what kind of memory do they use?
>>>>> Is it possible that you were hit by the same issue as described in
>>>>> 79dfdacc "memcg: make oom_lock 0 and 1 based rather than counter"?
>>>>>
>>>>>> OS version: centos6.2 2.6.32.220.7.1
>>>>>
>>>>> Your kernel is quite old and you should probably be asking your
>>>>> distribution to help you out. There were many fixes since 2.6.32.
>>>>> Are you able to reproduce the same issue with the current vanilla
>>>>> kernel?
>>>>>
>>>>>> /proc/pid/stack
>>>>>> ---
>>>>>> [] __cond_resched+0x2a/0x40
>>>>>> [] unmap_vmas+0xb49/0xb70
>>>>>> [] exit_mmap+0x7e/0x140
>>>>>> [] mmput+0x58/0x110
>>>>>> [] exit_mm+0x11d/0x160
>>>>>> [] do_exit+0x1ad/0x860
>>>>>> [] do_group_exit+0x41/0xb0
>>>>>> [] get_signal_to_deliver+0x1e8/0x430
>>>>>> [] do_notify_resume+0xf4/0x8b0
>>>>>> [] int_signal+0x12/0x17
>>>>>> [] 0x
>>>>>
>>>>> This looks strange because this is just the exit part, which
>>>>> shouldn't deadlock or anything. Is this stack stable? Have you
>>>>> tried to check it more times?
>>>
>>> Looking at information.txt, I found something interesting:
>>>
>>> rt_rq[0]:/1314
>>>   .rt_nr_running  : 1
>>>   .rt_throttled   : 1
>>>   .rt_time        : 0.856656
>>>   .rt_runtime     : 0.00
>>>
>>> cfs_rq[0]:/1314
>>>   .exec_clock     : 8738.133429
>>>   .MIN_vruntime   : 0.01
>>>   .min_vruntime   : 8739.371271
>>>   .max_vruntime   : 0.01
>>>   .spread         : 0.00
>>>   .spread0        : -9792.24
>>>   .nr_spread_over : 1
>>>   .nr_running     : 0
>>>   .load           : 0
>>>   .load_avg       : 7376.722880
>>>   .load_period    : 7.203830
>>>   .load_contrib   : 1023
>>>   .load_tg        : 1023
>>>   .se->exec_start : 282004.715064
>>>   .se->vruntime   : 18435.664560
>>>   .se->sum_exec_runtime : 8738.133429
>>>   .se->wait_start : 0.00
>>>   .se->sleep_start : 0.00
>>>   .se->block_start : 0.00
>>>   .se->sleep_max  : 0.00
>>>   .se->block_max  : 0.00
>>>   .se->exec_max   : 77.977054
>>>   .se->slice_max  : 0.00
>>>   .se->wait_max   : 2.664779
>>>   .se->wait_sum   : 29.970575
>>>   .se->wait_count : 102
>>>   .se->load.weight : 2
>>>
>>> So 1314 is a real-time process, and
>>>
>>> cpu.rt_period_us:
>>> 100
>>> --
>>> cpu.rt_runtime_us:
>>> 0
>>>
>>> When did tt move to being a Real Time process (hint: see nr_running
>>> and nr_throttled)?
>>>
>>> Balbir
>
> --
> Michal Hocko
> SUSE Labs

Maybe this is not an upstream-kernel bug. The centos/redhat kernel would
boost the process to RT prio when the process was selected by the
oom-killer. I think I should report
Re: process hangs on do_exit when oom happens
On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> This process was moved to the RT-priority queue when the global
> oom-killer happened, to boost the recovery of the system,

Who did that? The oom killer doesn't boost the priority (scheduling class)
AFAIK.

> but it wasn't properly dealt with. I still have no idea where the
> problem is.

Well, your configuration says that there is no runtime reserved for the
group. Please refer to Documentation/scheduler/sched-rt-group.txt for more
information.

> On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh wrote:
>> On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao wrote:
>>> information about the system is in the attached file "information.txt"
>>>
>>> I cannot reproduce it on the upstream 3.6.0 kernel..
>>>
>>> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote:
>>>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>>>> I found nothing useful with Google, so I'm here for help.
>>>>>
>>>>> When this happens: I use memcg to limit the memory use of a
>>>>> process, and when the memcg cgroup was out of memory, the process
>>>>> was oom-killed; however, it cannot really complete the exit. Here
>>>>> is some information:
>>>>
>>>> How many tasks are in the group and what kind of memory do they use?
>>>> Is it possible that you were hit by the same issue as described in
>>>> 79dfdacc "memcg: make oom_lock 0 and 1 based rather than counter"?
>>>>
>>>>> OS version: centos6.2 2.6.32.220.7.1
>>>>
>>>> Your kernel is quite old and you should probably be asking your
>>>> distribution to help you out. There were many fixes since 2.6.32.
>>>> Are you able to reproduce the same issue with the current vanilla
>>>> kernel?
>>>>
>>>>> /proc/pid/stack
>>>>> ---
>>>>> [] __cond_resched+0x2a/0x40
>>>>> [] unmap_vmas+0xb49/0xb70
>>>>> [] exit_mmap+0x7e/0x140
>>>>> [] mmput+0x58/0x110
>>>>> [] exit_mm+0x11d/0x160
>>>>> [] do_exit+0x1ad/0x860
>>>>> [] do_group_exit+0x41/0xb0
>>>>> [] get_signal_to_deliver+0x1e8/0x430
>>>>> [] do_notify_resume+0xf4/0x8b0
>>>>> [] int_signal+0x12/0x17
>>>>> [] 0x
>>>>
>>>> This looks strange because this is just the exit part, which
>>>> shouldn't deadlock or anything. Is this stack stable? Have you tried
>>>> to check it more times?
>>
>> Looking at information.txt, I found something interesting:
>>
>> rt_rq[0]:/1314
>>   .rt_nr_running  : 1
>>   .rt_throttled   : 1
>>   .rt_time        : 0.856656
>>   .rt_runtime     : 0.00
>>
>> cfs_rq[0]:/1314
>>   .exec_clock     : 8738.133429
>>   .MIN_vruntime   : 0.01
>>   .min_vruntime   : 8739.371271
>>   .max_vruntime   : 0.01
>>   .spread         : 0.00
>>   .spread0        : -9792.24
>>   .nr_spread_over : 1
>>   .nr_running     : 0
>>   .load           : 0
>>   .load_avg       : 7376.722880
>>   .load_period    : 7.203830
>>   .load_contrib   : 1023
>>   .load_tg        : 1023
>>   .se->exec_start : 282004.715064
>>   .se->vruntime   : 18435.664560
>>   .se->sum_exec_runtime : 8738.133429
>>   .se->wait_start : 0.00
>>   .se->sleep_start : 0.00
>>   .se->block_start : 0.00
>>   .se->sleep_max  : 0.00
>>   .se->block_max  : 0.00
>>   .se->exec_max   : 77.977054
>>   .se->slice_max  : 0.00
>>   .se->wait_max   : 2.664779
>>   .se->wait_sum   : 29.970575
>>   .se->wait_count : 102
>>   .se->load.weight : 2
>>
>> So 1314 is a real-time process, and
>>
>> cpu.rt_period_us:
>> 100
>> --
>> cpu.rt_runtime_us:
>> 0
>>
>> When did tt move to being a Real Time process (hint: see nr_running
>> and nr_throttled)?
>>
>> Balbir

--
Michal Hocko
SUSE Labs
Re: process hangs on do_exit when oom happens
On Tue 23-10-12 17:08:40, Qiang Gao wrote:
> This is just an example to show how to reproduce it. Actually, the first
> time I saw this situation was on a machine with 288G of RAM with many
> tasks running, where we limit 30G for each; but finally, none of them
> exceeded its limit and the system still hit a global oom.

Yes, but mentioning the memory controller then might be misleading... It
seems that the only factor in your load is the cpu controller.

And please stop top-posting. It makes the discussion messy.

--
Michal Hocko
SUSE Labs
Re: process hangs on do_exit when oom happens
The global oom is the right thing to do, but an oom-killed process hanging
in do_exit is not normal behavior.

On Tue, Oct 23, 2012 at 5:01 PM, Sha Zhengju wrote:
> On 10/23/2012 11:35 AM, Qiang Gao wrote:
>> information about the system is in the attached file "information.txt"
>>
>> I cannot reproduce it on the upstream 3.6.0 kernel..
>>
>> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote:
>>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>>> I found nothing useful with Google, so I'm here for help.
>>>>
>>>> When this happens: I use memcg to limit the memory use of a process,
>>>> and when the memcg cgroup was out of memory, the process was
>>>> oom-killed; however, it cannot really complete the exit. Here is
>>>> some information:
>>>
>>> How many tasks are in the group and what kind of memory do they use?
>>> Is it possible that you were hit by the same issue as described in
>>> 79dfdacc "memcg: make oom_lock 0 and 1 based rather than counter"?
>>>
>>>> OS version: centos6.2 2.6.32.220.7.1
>>>
>>> Your kernel is quite old and you should probably be asking your
>>> distribution to help you out. There were many fixes since 2.6.32.
>>> Are you able to reproduce the same issue with the current vanilla
>>> kernel?
>>>
>>>> /proc/pid/stack
>>>> ---
>>>> [] __cond_resched+0x2a/0x40
>>>> [] unmap_vmas+0xb49/0xb70
>>>> [] exit_mmap+0x7e/0x140
>>>> [] mmput+0x58/0x110
>>>> [] exit_mm+0x11d/0x160
>>>> [] do_exit+0x1ad/0x860
>>>> [] do_group_exit+0x41/0xb0
>>>> [] get_signal_to_deliver+0x1e8/0x430
>>>> [] do_notify_resume+0xf4/0x8b0
>>>> [] int_signal+0x12/0x17
>>>> [] 0x
>>>
>>> This looks strange because this is just the exit part, which
>>> shouldn't deadlock or anything. Is this stack stable? Have you tried
>>> to check it more times?
>
> Does the machine only have about 700M of memory? I also found something
> in the log file:
>
> Node 0 DMA free:2772kB min:72kB low:88kB high:108kB present:15312kB ..
> lowmem_reserve[]: 0 674 674 674
> Node 0 DMA32 free:*3172kB* min:3284kB low:4104kB high:4924kB
> present:690712kB ..
> lowmem_reserve[]: 0 0 0 0
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap  = 0kB
> Total swap = 0kB
> 179184 pages RAM ==> 179184 * 4 / 1024 = *700M*
> 6773 pages reserved
>
> Note that the free memory of DMA32 (*3172kB*) is lower than the min
> watermark, which means the machine is under global memory pressure now.
> What's more, swap is off, so the global oom is normal behavior.
>
> Thanks,
> Sha
Re: process hangs on do_exit when oom happens
This is just an example to show how to reproduce it. Actually, the first
time I saw this situation was on a machine with 288G of RAM with many
tasks running, where we limit 30G for each; but finally, none of them
exceeded its limit and the system still hit a global oom.

On Tue, Oct 23, 2012 at 4:35 PM, Michal Hocko wrote:
> On Tue 23-10-12 11:35:52, Qiang Gao wrote:
>> I'm sure this is a global oom, not a cgroup oom. [the dmesg output is
>> at the end]
>
> Yes, this is the global oom killer because:
>
>> cglimit -M 700M ./tt
>> then after the global oom, the process hangs..
>
>> 179184 pages RAM
>
> So you have ~700M of RAM, so the memcg limit is basically pointless as
> it cannot be reached...
> --
> Michal Hocko
> SUSE Labs
Re: process hangs on do_exit when oom happens
On 10/23/2012 11:35 AM, Qiang Gao wrote:
> information about the system is in the attached file "information.txt"
>
> I cannot reproduce it on the upstream 3.6.0 kernel..
>
> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote:
>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>> I found nothing useful with Google, so I'm here for help.
>>>
>>> When this happens: I use memcg to limit the memory use of a process,
>>> and when the memcg cgroup was out of memory, the process was
>>> oom-killed; however, it cannot really complete the exit. Here is
>>> some information:
>>
>> How many tasks are in the group and what kind of memory do they use?
>> Is it possible that you were hit by the same issue as described in
>> 79dfdacc "memcg: make oom_lock 0 and 1 based rather than counter"?
>>
>>> OS version: centos6.2 2.6.32.220.7.1
>>
>> Your kernel is quite old and you should probably be asking your
>> distribution to help you out. There were many fixes since 2.6.32.
>> Are you able to reproduce the same issue with the current vanilla
>> kernel?
>>
>>> /proc/pid/stack
>>> ---
>>> [] __cond_resched+0x2a/0x40
>>> [] unmap_vmas+0xb49/0xb70
>>> [] exit_mmap+0x7e/0x140
>>> [] mmput+0x58/0x110
>>> [] exit_mm+0x11d/0x160
>>> [] do_exit+0x1ad/0x860
>>> [] do_group_exit+0x41/0xb0
>>> [] get_signal_to_deliver+0x1e8/0x430
>>> [] do_notify_resume+0xf4/0x8b0
>>> [] int_signal+0x12/0x17
>>> [] 0x
>>
>> This looks strange because this is just the exit part, which shouldn't
>> deadlock or anything. Is this stack stable? Have you tried to check it
>> more times?

Does the machine only have about 700M of memory? I also found something in
the log file:

Node 0 DMA free:2772kB min:72kB low:88kB high:108kB present:15312kB ..
lowmem_reserve[]: 0 674 674 674
Node 0 DMA32 free:*3172kB* min:3284kB low:4104kB high:4924kB
present:690712kB ..
lowmem_reserve[]: 0 0 0 0
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
179184 pages RAM ==> 179184 * 4 / 1024 = *700M*
6773 pages reserved

Note that the free memory of DMA32 (*3172kB*) is lower than the min
watermark, which means the machine is under global memory pressure now.
What's more, swap is off, so the global oom is normal behavior.

Thanks,
Sha
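[Sha's RAM estimate above can be reproduced with shell arithmetic, assuming 4 KiB pages as on x86:]

```shell
# 179184 pages * 4 KiB/page, integer-divided down to mebibytes: ~700 MB.
pages=179184
echo "$(( pages * 4 / 1024 )) MB"    # prints: 699 MB
```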
Re: process hangs on do_exit when oom happens
On Tue 23-10-12 11:35:52, Qiang Gao wrote:
> I'm sure this is a global oom, not a cgroup oom. [the dmesg output is at
> the end]

Yes, this is the global oom killer because:

> cglimit -M 700M ./tt
> then after the global oom, the process hangs..

> 179184 pages RAM

So you have ~700M of RAM, so the memcg limit is basically pointless as it
cannot be reached...

--
Michal Hocko
SUSE Labs
Re: process hangs on do_exit when oom happens
This process was moved to RT-priority queue when global oom-killer happened to boost the recovery of the system.. but it wasn't get properily dealt with. I still have no idea why where the problem is .. On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh wrote: > On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao wrote: >> information about the system is in the attach file "information.txt" >> >> I can not reproduce it in the upstream 3.6.0 kernel.. >> >> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote: >>> On Wed 17-10-12 18:23:34, gaoqiang wrote: I looked up nothing useful with google,so I'm here for help.. when this happens: I use memcg to limit the memory use of a process,and when the memcg cgroup was out of memory, the process was oom-killed however,it cannot really complete the exiting. here is the some information >>> >>> How many tasks are in the group and what kind of memory do they use? >>> Is it possible that you were hit by the same issue as described in >>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter. >>> OS version: centos6.22.6.32.220.7.1 >>> >>> Your kernel is quite old and you should be probably asking your >>> distribution to help you out. There were many fixes since 2.6.32. >>> Are you able to reproduce the same issue with the current vanila kernel? >>> /proc/pid/stack --- [] __cond_resched+0x2a/0x40 [] unmap_vmas+0xb49/0xb70 [] exit_mmap+0x7e/0x140 [] mmput+0x58/0x110 [] exit_mm+0x11d/0x160 [] do_exit+0x1ad/0x860 [] do_group_exit+0x41/0xb0 [] get_signal_to_deliver+0x1e8/0x430 [] do_notify_resume+0xf4/0x8b0 [] int_signal+0x12/0x17 [] 0x >>> >>> This looks strange because this is just an exit part which shouldn't >>> deadlock or anything. Is this stack stable? Have you tried to take check >>> it more times? 
> Looking at information.txt, I found something interesting:
>
> rt_rq[0]:/1314
>   .rt_nr_running : 1
>   .rt_throttled  : 1
>   .rt_time       : 0.856656
>   .rt_runtime    : 0.00
>
> cfs_rq[0]:/1314
>   .exec_clock           : 8738.133429
>   .MIN_vruntime         : 0.01
>   .min_vruntime         : 8739.371271
>   .max_vruntime         : 0.01
>   .spread               : 0.00
>   .spread0              : -9792.24
>   .nr_spread_over       : 1
>   .nr_running           : 0
>   .load                 : 0
>   .load_avg             : 7376.722880
>   .load_period          : 7.203830
>   .load_contrib         : 1023
>   .load_tg              : 1023
>   .se->exec_start       : 282004.715064
>   .se->vruntime         : 18435.664560
>   .se->sum_exec_runtime : 8738.133429
>   .se->wait_start       : 0.00
>   .se->sleep_start      : 0.00
>   .se->block_start      : 0.00
>   .se->sleep_max        : 0.00
>   .se->block_max        : 0.00
>   .se->exec_max         : 77.977054
>   .se->slice_max        : 0.00
>   .se->wait_max         : 2.664779
>   .se->wait_sum         : 29.970575
>   .se->wait_count       : 102
>   .se->load.weight      : 2
>
> So 1314 is a real-time process, and
>
> cpu.rt_period_us:
> 100
> --
> cpu.rt_runtime_us:
> 0
>
> When did tt move to being a Real Time process (hint: see nr_running
> and nr_throttled)?
>
> Balbir
Re: process hangs on do_exit when oom happens
On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao wrote:
> information about the system is in the attached file "information.txt"
>
> I can not reproduce it in the upstream 3.6.0 kernel..
>
> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote:
>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>> I looked up nothing useful with google, so I'm here for help..
>>>
>>> when this happens: I use memcg to limit the memory use of a
>>> process, and when the memcg cgroup was out of memory,
>>> the process was oom-killed; however, it cannot really complete the
>>> exiting. here is some information
>>
>> How many tasks are in the group and what kind of memory do they use?
>> Is it possible that you were hit by the same issue as described in
>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>
>>> OS version: centos6.2 / 2.6.32-220.7.1
>>
>> Your kernel is quite old and you should probably be asking your
>> distribution to help you out. There have been many fixes since 2.6.32.
>> Are you able to reproduce the same issue with the current vanilla kernel?
>>
>>> /proc/pid/stack
>>> ---
>>>
>>> [] __cond_resched+0x2a/0x40
>>> [] unmap_vmas+0xb49/0xb70
>>> [] exit_mmap+0x7e/0x140
>>> [] mmput+0x58/0x110
>>> [] exit_mm+0x11d/0x160
>>> [] do_exit+0x1ad/0x860
>>> [] do_group_exit+0x41/0xb0
>>> [] get_signal_to_deliver+0x1e8/0x430
>>> [] do_notify_resume+0xf4/0x8b0
>>> [] int_signal+0x12/0x17
>>> [] 0x
>>
>> This looks strange because this is just the exit part, which shouldn't
>> deadlock or anything. Is this stack stable? Have you tried to check
>> it more times?
Looking at information.txt, I found something interesting:

rt_rq[0]:/1314
  .rt_nr_running : 1
  .rt_throttled  : 1
  .rt_time       : 0.856656
  .rt_runtime    : 0.00

cfs_rq[0]:/1314
  .exec_clock           : 8738.133429
  .MIN_vruntime         : 0.01
  .min_vruntime         : 8739.371271
  .max_vruntime         : 0.01
  .spread               : 0.00
  .spread0              : -9792.24
  .nr_spread_over       : 1
  .nr_running           : 0
  .load                 : 0
  .load_avg             : 7376.722880
  .load_period          : 7.203830
  .load_contrib         : 1023
  .load_tg              : 1023
  .se->exec_start       : 282004.715064
  .se->vruntime         : 18435.664560
  .se->sum_exec_runtime : 8738.133429
  .se->wait_start       : 0.00
  .se->sleep_start      : 0.00
  .se->block_start      : 0.00
  .se->sleep_max        : 0.00
  .se->block_max        : 0.00
  .se->exec_max         : 77.977054
  .se->slice_max        : 0.00
  .se->wait_max         : 2.664779
  .se->wait_sum         : 29.970575
  .se->wait_count       : 102
  .se->load.weight      : 2

So 1314 is a real-time process, and

cpu.rt_period_us:
100
--
cpu.rt_runtime_us:
0

When did tt move to being a Real Time process (hint: see nr_running
and nr_throttled)?

Balbir
Re: process hangs on do_exit when oom happens
On Mon 22-10-12 10:16:43, Qiang Gao wrote:
> I don't know whether the process will exit finally, but this stack lasts
> for hours, which is obviously abnormal.
> The situation: we use a command called "cglimit" to fork-and-exec the
> worker process, and "cglimit" will
> set some limitations on the worker with cgroup. For now, we limit the
> memory, and we also use the cpu cgroup, but with
> no limitation, so when the worker is running, the cgroup directory looks like
> the following:
>
> /cgroup/memory/worker : this directory limits the memory
> /cgroup/cpu/worker : with no limit, but the worker process is in it.
>
> for some reason (some other process we didn't consider), the worker process
> invoked the global oom-killer,

Are you sure that this is really a global oom? What was the limit for the group?

> not the cgroup-oom-killer. then the worker process hangs there.
>
> Actually, if we don't set the worker process into the cpu cgroup, this
> never happens.

Strange, and it smells like a misconfiguration. Could you provide the complete settings for both controllers?
grep . -r /cgroup/

> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote:
> >
> > On Wed 17-10-12 18:23:34, gaoqiang wrote:
> > > I looked up nothing useful with google, so I'm here for help..
> > >
> > > when this happens: I use memcg to limit the memory use of a
> > > process, and when the memcg cgroup was out of memory,
> > > the process was oom-killed; however, it cannot really complete the
> > > exiting. here is some information
> >
> > How many tasks are in the group and what kind of memory do they use?
> > Is it possible that you were hit by the same issue as described in
> > 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
> >
> > > OS version: centos6.2 / 2.6.32-220.7.1
> >
> > Your kernel is quite old and you should probably be asking your
> > distribution to help you out. There have been many fixes since 2.6.32.
> > Are you able to reproduce the same issue with the current vanilla kernel?
> >
> > > /proc/pid/stack
> > > ---
> > >
> > > [] __cond_resched+0x2a/0x40
> > > [] unmap_vmas+0xb49/0xb70
> > > [] exit_mmap+0x7e/0x140
> > > [] mmput+0x58/0x110
> > > [] exit_mm+0x11d/0x160
> > > [] do_exit+0x1ad/0x860
> > > [] do_group_exit+0x41/0xb0
> > > [] get_signal_to_deliver+0x1e8/0x430
> > > [] do_notify_resume+0xf4/0x8b0
> > > [] int_signal+0x12/0x17
> > > [] 0x
> >
> > This looks strange because this is just the exit part, which shouldn't
> > deadlock or anything. Is this stack stable? Have you tried to check
> > it more times?
> >
> > --
> > Michal Hocko
> > SUSE Labs

--
Michal Hocko
SUSE Labs
Re: process hangs on do_exit when oom happens
On Mon, Oct 22, 2012 at 7:46 AM, Qiang Gao wrote:
> I don't know whether the process will exit finally, but this stack lasts
> for hours, which is obviously abnormal.
> The situation: we use a command called "cglimit" to fork-and-exec the worker
> process, and "cglimit" will
> set some limitations on the worker with cgroup. For now, we limit the
> memory, and we also use the cpu cgroup, but with
> no limitation, so when the worker is running, the cgroup directory looks like
> the following:
>
> /cgroup/memory/worker : this directory limits the memory
> /cgroup/cpu/worker : with no limit, but the worker process is in it.
>
> for some reason (some other process we didn't consider), the worker process
> invoked the global oom-killer,
> not the cgroup-oom-killer. then the worker process hangs there.
>
> Actually, if we don't set the worker process into the cpu cgroup, this will
> never happen.

You said you don't use CPU limits, right? Can you also send in the output
of /proc/sched_debug? Can you also send in your /etc/cgconfig.conf?

If the OOM is not caused by the cgroup memory limit and the global system is
under pressure in 2.6.32, it can trigger an OOM. Also:

1. Have you turned off swapping (seems like it), right?
2. Do you have a NUMA policy set up for this task?

Can you also share the .config (not sure if any special patches are being
used) in the version you've mentioned.

Balbir
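[Editor's note: for readers unfamiliar with the /etc/cgconfig.conf that Balbir asks for, a libcgroup fragment matching the layout Qiang describes might look like the sketch below. The group name and the 700M limit come from the thread; everything else is illustrative, and the cpu block shows the dangerous default that bites here:]

```
group worker {
    memory {
        # the memcg limit that cglimit sets (700M in this thread)
        memory.limit_in_bytes = 700M;
    }
    cpu {
        # the cpu controller's default for child groups is 0: any task
        # promoted to an RT class in this group is throttled forever
        cpu.rt_runtime_us = 0;
    }
}
```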
Re: process hangs on do_exit when oom happens
I don't know whether the process will exit finally, but this stack lasts for hours, which is obviously abnormal.
The situation: we use a command called "cglimit" to fork-and-exec the worker process, and "cglimit" will set some limitations on the worker with cgroup. For now, we limit the memory, and we also use the cpu cgroup, but with no limitation, so when the worker is running, the cgroup directory looks like the following:

/cgroup/memory/worker : this directory limits the memory
/cgroup/cpu/worker : with no limit, but the worker process is in it.

for some reason (some other process we didn't consider), the worker process invoked the global oom-killer, not the cgroup-oom-killer. then the worker process hangs there.

Actually, if we don't set the worker process into the cpu cgroup, this never happens.

On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote:
>
> On Wed 17-10-12 18:23:34, gaoqiang wrote:
> > I looked up nothing useful with google, so I'm here for help..
> >
> > when this happens: I use memcg to limit the memory use of a
> > process, and when the memcg cgroup was out of memory,
> > the process was oom-killed; however, it cannot really complete the
> > exiting. here is some information
>
> How many tasks are in the group and what kind of memory do they use?
> Is it possible that you were hit by the same issue as described in
> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>
> > OS version: centos6.2 / 2.6.32-220.7.1
>
> Your kernel is quite old and you should probably be asking your
> distribution to help you out. There have been many fixes since 2.6.32.
> Are you able to reproduce the same issue with the current vanilla kernel?
>
> > /proc/pid/stack
> > ---
> >
> > [] __cond_resched+0x2a/0x40
> > [] unmap_vmas+0xb49/0xb70
> > [] exit_mmap+0x7e/0x140
> > [] mmput+0x58/0x110
> > [] exit_mm+0x11d/0x160
> > [] do_exit+0x1ad/0x860
> > [] do_group_exit+0x41/0xb0
> > [] get_signal_to_deliver+0x1e8/0x430
> > [] do_notify_resume+0xf4/0x8b0
> > [] int_signal+0x12/0x17
> > [] 0x
>
> This looks strange because this is just the exit part, which shouldn't
> deadlock or anything. Is this stack stable? Have you tried to check
> it more times?
>
> --
> Michal Hocko
> SUSE Labs
Re: process hangs on do_exit when oom happens
On Wed 17-10-12 18:23:34, gaoqiang wrote:
> I looked up nothing useful with google, so I'm here for help..
>
> when this happens: I use memcg to limit the memory use of a
> process, and when the memcg cgroup was out of memory,
> the process was oom-killed; however, it cannot really complete the
> exiting. here is some information

How many tasks are in the group and what kind of memory do they use?
Is it possible that you were hit by the same issue as described in
79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.

> OS version: centos6.2 / 2.6.32-220.7.1

Your kernel is quite old and you should probably be asking your
distribution to help you out. There have been many fixes since 2.6.32.
Are you able to reproduce the same issue with the current vanilla kernel?

> /proc/pid/stack
> ---
>
> [] __cond_resched+0x2a/0x40
> [] unmap_vmas+0xb49/0xb70
> [] exit_mmap+0x7e/0x140
> [] mmput+0x58/0x110
> [] exit_mm+0x11d/0x160
> [] do_exit+0x1ad/0x860
> [] do_group_exit+0x41/0xb0
> [] get_signal_to_deliver+0x1e8/0x430
> [] do_notify_resume+0xf4/0x8b0
> [] int_signal+0x12/0x17
> [] 0x

This looks strange because this is just the exit part, which shouldn't
deadlock or anything. Is this stack stable? Have you tried to check
it more times?

--
Michal Hocko
SUSE Labs
process hangs on do_exit when oom happens
I looked up nothing useful with google, so I'm here for help..

when this happens: I use memcg to limit the memory use of a process, and when the memcg cgroup was out of memory, the process was oom-killed; however, it cannot really complete the exiting. here is some information

OS version: centos6.2 / 2.6.32-220.7.1

/proc/pid/stack
---
[] __cond_resched+0x2a/0x40
[] unmap_vmas+0xb49/0xb70
[] exit_mmap+0x7e/0x140
[] mmput+0x58/0x110
[] exit_mm+0x11d/0x160
[] do_exit+0x1ad/0x860
[] do_group_exit+0x41/0xb0
[] get_signal_to_deliver+0x1e8/0x430
[] do_notify_resume+0xf4/0x8b0
[] int_signal+0x12/0x17
[] 0x

/proc/pid/stat
---
11337 (CF_user_based) R 1 11314 11314 0 -1 4203524 7753602 0 0 0 622 1806 0 0 -2 0 1 0 324381340 0 0 18446744073709551615 0 0 0 0 0 0 0 0 66784 0 0 0 17 3 1 1 0 0 0

/proc/pid/status
---
Name: CF_user_based
State: R (running)
Tgid: 11337
Pid: 11337
PPid: 1
TracerPid: 0
Uid: 32114 32114 32114 32114
Gid: 32114 32114 32114 32114
Utrace: 0
FDSize: 128
Groups: 32114
Threads: 1
SigQ: 2/2325005
SigPnd:
ShdPnd: 4100
SigBlk:
SigIgn:
SigCgt: 0001800104e0
CapInh:
CapPrm:
CapEff:
CapBnd:
Cpus_allowed:
Cpus_allowed_list: 0-31
Mems_allowed: ,0003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 4300
nonvoluntary_ctxt_switches: 77

/var/log/messages
---
Oct 17 15:22:19 hpc16 kernel: CF_user_based invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Oct 17 15:22:19 hpc16 kernel: CF_user_based cpuset=/ mems_allowed=0-1
Oct 17 15:22:19 hpc16 kernel: Pid: 3909, comm: CF_user_based Not tainted 2.6.32-2.0.0.1 #4
Oct 17 15:22:19 hpc16 kernel: Call Trace:
Oct 17 15:22:19 hpc16 kernel: [] ? dump_header+0x85/0x1a0
Oct 17 15:22:19 hpc16 kernel: [] ? oom_kill_process+0x25e/0x2a0
Oct 17 15:22:19 hpc16 kernel: [] ? select_bad_process+0xce/0x110
Oct 17 15:22:19 hpc16 kernel: [] ? out_of_memory+0x1a8/0x390
Oct 17 15:22:19 hpc16 kernel: [] ? __alloc_pages_nodemask+0x73a/0x750
Oct 17 15:22:19 hpc16 kernel: [] ?
__mem_cgroup_commit_charge+0x45/0x90
Oct 17 15:22:19 hpc16 kernel: [] ? alloc_pages_vma+0x9a/0x190
Oct 17 15:22:19 hpc16 kernel: [] ? handle_pte_fault+0x4cc/0xa90
Oct 17 15:22:19 hpc16 kernel: [] ? alloc_pages_current+0xab/0x110
Oct 17 15:22:19 hpc16 kernel: [] ? invalidate_interrupt5+0xe/0x20
Oct 17 15:22:19 hpc16 kernel: [] ? handle_mm_fault+0x12a/0x1b0
Oct 17 15:22:19 hpc16 kernel: [] ? do_page_fault+0x199/0x550
Oct 17 15:22:19 hpc16 kernel: [] ? call_rwsem_wake+0x18/0x30
Oct 17 15:22:19 hpc16 kernel: [] ? invalidate_interrupt5+0xe/0x20
Oct 17 15:22:19 hpc16 kernel: [] ? page_fault+0x25/0x30
Oct 17 15:22:19 hpc16 kernel: Mem-Info:
Oct 17 15:22:19 hpc16 kernel: Node 0 Normal per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 16: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 17: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 18: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 19: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 21: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 22: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 23: hi: 186, btch: 31 usd: 18
Oct 17 15:22:19 hpc16 kernel: Node 1 DMA per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 2: hi