Re: process hangs on do_exit when oom happens

2012-10-26 Thread Mike Galbraith
On Fri, 2012-10-26 at 10:03 -0700, Mike Galbraith wrote:

> The bug is in the patch that used sched_setscheduler_nocheck().  Plain
> sched_setscheduler() would have replied -EGOAWAY.

sched_setscheduler_nocheck() should say go away too methinks.  This
isn't about permissions, it's about not being stupid in general.

sched: fix __sched_setscheduler() RT_GROUP_SCHED conditionals

Remove the user and rt_bandwidth_enabled() RT_GROUP_SCHED conditionals in
__sched_setscheduler().  The end result of the kernel OR the user promoting
a task in a group with zero rt_runtime allocated is the same bad thing,
and the throttle switch position matters little.  It's safer to just say
no based solely upon bandwidth existence, which may save the user a nasty
surprise if he later flips the throttle switch to 'on'.

The commit below came about due to sched_setscheduler_nocheck()
allowing a task in a task group with zero rt_runtime allocated to
be promoted by the kernel oom logic, thus marooning it forever.


commit 341aea2bc48bf652777fb015cc2b3dfa9a451817
Author: KOSAKI Motohiro 
Date:   Thu Apr 14 15:22:13 2011 -0700

oom-kill: remove boost_dying_task_prio()

This is an almost-revert of commit 93b43fa ("oom: give the dying task a
higher priority").

That commit dramatically improved the oom killer logic when a fork-bomb
occurs.  But I've found that it has a nasty corner case.  The cpu cgroup has
a strange default RT runtime: it's 0!  That is, if a process under a cpu
cgroup is promoted to the RT scheduling class, the process never runs at all.


Signed-off-by: Mike Galbraith 

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..d3a35f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3810,17 +3810,14 @@ recheck:
}
 
 #ifdef CONFIG_RT_GROUP_SCHED
-   if (user) {
-   /*
-* Do not allow realtime tasks into groups that have no runtime
-* assigned.
-*/
-   if (rt_bandwidth_enabled() && rt_policy(policy) &&
-   task_group(p)->rt_bandwidth.rt_runtime == 0 &&
-   !task_group_is_autogroup(task_group(p))) {
-   task_rq_unlock(rq, p, &flags);
-   return -EPERM;
-   }
+   /*
+* Do not allow realtime tasks into groups that have no runtime
+* assigned.
+*/
+   if (rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0 &&
+   !task_group_is_autogroup(task_group(p))) {
+   task_rq_unlock(rq, p, &flags);
+   return -EPERM;
}
 #endif
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: process hangs on do_exit when oom happens

2012-10-26 Thread Mike Galbraith
On Fri, 2012-10-26 at 10:42 +0800, Qiang Gao wrote: 
> On Thu, Oct 25, 2012 at 5:57 PM, Michal Hocko  wrote:
> > On Wed 24-10-12 11:44:17, Qiang Gao wrote:
> >> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh  
> >> wrote:
> >> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko  wrote:
> >> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
> >> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko  wrote:
> >> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> >> >>> >> This process was moved to RT-priority queue when global oom-killer
> >> >>> >> happened to boost the recovery of the system..
> >> >>> >
> >> >>> > Who did that? oom killer doesn't boost the priority (scheduling 
> >> >>> > class)
> >> >>> > AFAIK.
> >> >>> >
> >> >>> >> but it wasn't get properily dealt with. I still have no idea why 
> >> >>> >> where
> >> >>> >> the problem is ..
> >> >>> >
> >> >>> > Well your configuration says that there is no runtime reserved for 
> >> >>> > the
> >> >>> > group.
> >> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
> >> >>> > information.
> >> >>> >
> >> >> [...]
> >> >>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
> >> >>> would boost the process to RT prio when the process was selected
> >> >>> by oom-killer.
> >> >>
> >> >> This still looks like your cpu controller is misconfigured. Even if the
> >> >> task is promoted to be realtime.
> >> >
> >> >
> >> > Precisely! You need to have rt bandwidth enabled for RT tasks to run,
> >> > as a workaround please give the groups some RT bandwidth and then work
> >> > out the migration to RT and what should be the defaults on the distro.
> >> >
> >> > Balbir
> >>
> >>
> >> see https://patchwork.kernel.org/patch/719411/
> >
> > The patch surely "fixes" your problem but the primary fault here is the
> > mis-configured cpu cgroup. If the value for the bandwidth is zero by
> > default then all realtime processes in the group a screwed. The value
> > should be set to something more reasonable.
> > I am not familiar with the cpu controller but it seems that
> > alloc_rt_sched_group needs some treat. Care to look into it and send a
> > patch to the cpu controller and cgroup maintainers, please?
> >
> > --
> > Michal Hocko
> > SUSE Labs
> 
> I'm trying to fix the problem. but no substantive progress yet.

The throttle tracks a finite resource for an arbitrary number of groups,
so there's no sane rt_runtime default other than zero.

Most folks only want the warm fuzzy of the top-level throttle, so a complete
runtime RT_GROUP_SCHED on/off switch, defaulting to off (ie rt tasks cannot
be moved into groups until it's switched on), would fix some annoying
"Oopsie, I forgot" allocation troubles.  If you turn it on, shame on you if
you then fail to allocate: you asked for it, you're not just stuck with it
because your distro enabled it in their config.

Or, perhaps just make zero rt_runtime always mean traverse up to the first
non-zero rt_runtime, ie zero-allocation children may consume parental
runtime as they see fit on a first come, first served basis; when it's
gone, tough, parent and children all wait for the refill.

Or whatever, as long as you don't bust distribution/tracking for those
crazy people who intentionally use RT_GROUP_SCHED ;-)

The bug is in the patch that used sched_setscheduler_nocheck().  Plain
sched_setscheduler() would have replied -EGOAWAY.

-Mike



Re: process hangs on do_exit when oom happens

2012-10-25 Thread Qiang Gao
On Thu, Oct 25, 2012 at 5:57 PM, Michal Hocko  wrote:
> On Wed 24-10-12 11:44:17, Qiang Gao wrote:
>> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh  wrote:
>> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko  wrote:
>> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko  wrote:
>> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>> >>> >> This process was moved to RT-priority queue when global oom-killer
>> >>> >> happened to boost the recovery of the system..
>> >>> >
>> >>> > Who did that? oom killer doesn't boost the priority (scheduling class)
>> >>> > AFAIK.
>> >>> >
>> >>> >> but it wasn't get properily dealt with. I still have no idea why where
>> >>> >> the problem is ..
>> >>> >
>> >>> > Well your configuration says that there is no runtime reserved for the
>> >>> > group.
>> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
>> >>> > information.
>> >>> >
>> >> [...]
>> >>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
>> >>> would boost the process to RT prio when the process was selected
>> >>> by oom-killer.
>> >>
>> >> This still looks like your cpu controller is misconfigured. Even if the
>> >> task is promoted to be realtime.
>> >
>> >
>> > Precisely! You need to have rt bandwidth enabled for RT tasks to run,
>> > as a workaround please give the groups some RT bandwidth and then work
>> > out the migration to RT and what should be the defaults on the distro.
>> >
>> > Balbir
>>
>>
>> see https://patchwork.kernel.org/patch/719411/
>
> The patch surely "fixes" your problem but the primary fault here is the
> mis-configured cpu cgroup. If the value for the bandwidth is zero by
> default then all realtime processes in the group a screwed. The value
> should be set to something more reasonable.
> I am not familiar with the cpu controller but it seems that
> alloc_rt_sched_group needs some treat. Care to look into it and send a
> patch to the cpu controller and cgroup maintainers, please?
>
> --
> Michal Hocko
> SUSE Labs

I'm trying to fix the problem. but no substantive progress yet.


Re: process hangs on do_exit when oom happens

2012-10-25 Thread Michal Hocko
On Wed 24-10-12 11:44:17, Qiang Gao wrote:
> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh  wrote:
> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko  wrote:
> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko  wrote:
> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> >>> >> This process was moved to RT-priority queue when global oom-killer
> >>> >> happened to boost the recovery of the system..
> >>> >
> >>> > Who did that? oom killer doesn't boost the priority (scheduling class)
> >>> > AFAIK.
> >>> >
> >>> >> but it wasn't get properily dealt with. I still have no idea why where
> >>> >> the problem is ..
> >>> >
> >>> > Well your configuration says that there is no runtime reserved for the
> >>> > group.
> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
> >>> > information.
> >>> >
> >> [...]
> >>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
> >>> would boost the process to RT prio when the process was selected
> >>> by oom-killer.
> >>
> >> This still looks like your cpu controller is misconfigured. Even if the
> >> task is promoted to be realtime.
> >
> >
> > Precisely! You need to have rt bandwidth enabled for RT tasks to run,
> > as a workaround please give the groups some RT bandwidth and then work
> > out the migration to RT and what should be the defaults on the distro.
> >
> > Balbir
> 
> 
> see https://patchwork.kernel.org/patch/719411/

The patch surely "fixes" your problem but the primary fault here is the
mis-configured cpu cgroup. If the value for the bandwidth is zero by
default then all realtime processes in the group are screwed. The value
should be set to something more reasonable.
I am not familiar with the cpu controller but it seems that
alloc_rt_sched_group needs some treatment. Care to look into it and send a
patch to the cpu controller and cgroup maintainers, please?

-- 
Michal Hocko
SUSE Labs


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Qiang Gao
On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh  wrote:
> On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko  wrote:
>> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko  wrote:
>>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>>> >> This process was moved to RT-priority queue when global oom-killer
>>> >> happened to boost the recovery of the system..
>>> >
>>> > Who did that? oom killer doesn't boost the priority (scheduling class)
>>> > AFAIK.
>>> >
>>> >> but it wasn't get properily dealt with. I still have no idea why where
>>> >> the problem is ..
>>> >
>>> > Well your configuration says that there is no runtime reserved for the
>>> > group.
>>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
>>> > information.
>>> >
>> [...]
>>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
>>> would boost the process to RT prio when the process was selected
>>> by oom-killer.
>>
>> This still looks like your cpu controller is misconfigured. Even if the
>> task is promoted to be realtime.
>
>
> Precisely! You need to have rt bandwidth enabled for RT tasks to run,
> as a workaround please give the groups some RT bandwidth and then work
> out the migration to RT and what should be the defaults on the distro.
>
> Balbir


see https://patchwork.kernel.org/patch/719411/


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Balbir Singh
On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko  wrote:
> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko  wrote:
>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>> >> This process was moved to RT-priority queue when global oom-killer
>> >> happened to boost the recovery of the system..
>> >
>> > Who did that? oom killer doesn't boost the priority (scheduling class)
>> > AFAIK.
>> >
>> >> but it wasn't get properily dealt with. I still have no idea why where
>> >> the problem is ..
>> >
>> > Well your configuration says that there is no runtime reserved for the
>> > group.
>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
>> > information.
>> >
> [...]
>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
>> would boost the process to RT prio when the process was selected
>> by oom-killer.
>
> This still looks like your cpu controller is misconfigured. Even if the
> task is promoted to be realtime.


Precisely! You need to have rt bandwidth enabled for RT tasks to run,
as a workaround please give the groups some RT bandwidth and then work
out the migration to RT and what should be the defaults on the distro.

Balbir


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Michal Hocko
On Tue 23-10-12 18:10:33, Qiang Gao wrote:
> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko  wrote:
> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> >> This process was moved to RT-priority queue when global oom-killer
> >> happened to boost the recovery of the system..
> >
> > Who did that? oom killer doesn't boost the priority (scheduling class)
> > AFAIK.
> >
> >> but it wasn't get properily dealt with. I still have no idea why where
> >> the problem is ..
> >
> > Well your configuration says that there is no runtime reserved for the
> > group.
> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
> > information.
> >
[...]
> maybe this is not a upstream-kernel bug. the centos/redhat kernel
> would boost the process to RT prio when the process was selected
> by oom-killer.

This still looks like your cpu controller is misconfigured. Even if the
task is promoted to be realtime.
-- 
Michal Hocko
SUSE Labs


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Qiang Gao
On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko  wrote:
> On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>> This process was moved to RT-priority queue when global oom-killer
>> happened to boost the recovery of the system..
>
> Who did that? oom killer doesn't boost the priority (scheduling class)
> AFAIK.
>
>> but it wasn't get properily dealt with. I still have no idea why where
>> the problem is ..
>
> Well your configuration says that there is no runtime reserved for the
> group.
> Please refer to Documentation/scheduler/sched-rt-group.txt for more
> information.
>
>> On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh  wrote:
>> > On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao  wrote:
>> >> information about the system is in the attach file "information.txt"
>> >>
>> >> I can not reproduce it in the upstream 3.6.0 kernel..
>> >>
>> >> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko  wrote:
>> >>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>  I looked up nothing useful with google,so I'm here for help..
>> 
>>  when this happens:  I use memcg to limit the memory use of a
>>  process,and when the memcg cgroup was out of memory,
>>  the process was oom-killed   however,it cannot really complete the
>>  exiting. here is the some information
>> >>>
>> >>> How many tasks are in the group and what kind of memory do they use?
>> >>> Is it possible that you were hit by the same issue as described in
>> >>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>> >>>
>>  OS version:  centos6.22.6.32.220.7.1
>> >>>
>> >>> Your kernel is quite old and you should be probably asking your
>> >>> distribution to help you out. There were many fixes since 2.6.32.
>> >>> Are you able to reproduce the same issue with the current vanila kernel?
>> >>>
>>  /proc/pid/stack
>>  ---
>> 
>>  [] __cond_resched+0x2a/0x40
>>  [] unmap_vmas+0xb49/0xb70
>>  [] exit_mmap+0x7e/0x140
>>  [] mmput+0x58/0x110
>>  [] exit_mm+0x11d/0x160
>>  [] do_exit+0x1ad/0x860
>>  [] do_group_exit+0x41/0xb0
>>  [] get_signal_to_deliver+0x1e8/0x430
>>  [] do_notify_resume+0xf4/0x8b0
>>  [] int_signal+0x12/0x17
>>  [] 0x
>> >>>
>> >>> This looks strange because this is just an exit part which shouldn't
>> >>> deadlock or anything. Is this stack stable? Have you tried to take check
>> >>> it more times?
>> >
>> > Looking at information.txt, I found something interesting
>> >
>> > rt_rq[0]:/1314
>> >   .rt_nr_running : 1
>> >   .rt_throttled  : 1
>> >   .rt_time   : 0.856656
>> >   .rt_runtime: 0.00
>> >
>> >
>> > cfs_rq[0]:/1314
>> >   .exec_clock: 8738.133429
>> >   .MIN_vruntime  : 0.01
>> >   .min_vruntime  : 8739.371271
>> >   .max_vruntime  : 0.01
>> >   .spread: 0.00
>> >   .spread0   : -9792.24
>> >   .nr_spread_over: 1
>> >   .nr_running: 0
>> >   .load  : 0
>> >   .load_avg  : 7376.722880
>> >   .load_period   : 7.203830
>> >   .load_contrib  : 1023
>> >   .load_tg   : 1023
>> >   .se->exec_start: 282004.715064
>> >   .se->vruntime  : 18435.664560
>> >   .se->sum_exec_runtime  : 8738.133429
>> >   .se->wait_start: 0.00
>> >   .se->sleep_start   : 0.00
>> >   .se->block_start   : 0.00
>> >   .se->sleep_max : 0.00
>> >   .se->block_max : 0.00
>> >   .se->exec_max  : 77.977054
>> >   .se->slice_max : 0.00
>> >   .se->wait_max  : 2.664779
>> >   .se->wait_sum  : 29.970575
>> >   .se->wait_count: 102
>> >   .se->load.weight   : 2
>> >
>> > So 1314 is a real time process and
>> >
>> > cpu.rt_period_us:
>> > 100
>> > --
>> > cpu.rt_runtime_us:
>> > 0
>> >
>> > When did tt move to being a Real Time process (hint: see nr_running
>> > and nr_throttled)?
>> >
>> > Balbir
>
> --
> Michal Hocko
> SUSE Labs


Maybe this is not an upstream-kernel bug. The centos/redhat kernel
would boost the process to RT prio when the process was selected
by the oom-killer.

I think I should report

Re: process hangs on do_exit when oom happens

2012-10-23 Thread Michal Hocko
On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> This process was moved to RT-priority queue when global oom-killer
> happened to boost the recovery of the system..

Who did that? oom killer doesn't boost the priority (scheduling class)
AFAIK.

> but it wasn't get properily dealt with. I still have no idea why where
> the problem is ..

Well your configuration says that there is no runtime reserved for the
group.
Please refer to Documentation/scheduler/sched-rt-group.txt for more
information.

> On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh  wrote:
> > On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao  wrote:
> >> information about the system is in the attach file "information.txt"
> >>
> >> I can not reproduce it in the upstream 3.6.0 kernel..
> >>
> >> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko  wrote:
> >>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>  I looked up nothing useful with google,so I'm here for help..
> 
>  when this happens:  I use memcg to limit the memory use of a
>  process,and when the memcg cgroup was out of memory,
>  the process was oom-killed   however,it cannot really complete the
>  exiting. here is the some information
> >>>
> >>> How many tasks are in the group and what kind of memory do they use?
> >>> Is it possible that you were hit by the same issue as described in
> >>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
> >>>
>  OS version:  centos6.22.6.32.220.7.1
> >>>
> >>> Your kernel is quite old and you should be probably asking your
> >>> distribution to help you out. There were many fixes since 2.6.32.
> >>> Are you able to reproduce the same issue with the current vanila kernel?
> >>>
>  /proc/pid/stack
>  ---
> 
>  [] __cond_resched+0x2a/0x40
>  [] unmap_vmas+0xb49/0xb70
>  [] exit_mmap+0x7e/0x140
>  [] mmput+0x58/0x110
>  [] exit_mm+0x11d/0x160
>  [] do_exit+0x1ad/0x860
>  [] do_group_exit+0x41/0xb0
>  [] get_signal_to_deliver+0x1e8/0x430
>  [] do_notify_resume+0xf4/0x8b0
>  [] int_signal+0x12/0x17
>  [] 0x
> >>>
> >>> This looks strange because this is just an exit part which shouldn't
> >>> deadlock or anything. Is this stack stable? Have you tried to take check
> >>> it more times?
> >
> > Looking at information.txt, I found something interesting
> >
> > rt_rq[0]:/1314
> >   .rt_nr_running : 1
> >   .rt_throttled  : 1
> >   .rt_time   : 0.856656
> >   .rt_runtime: 0.00
> >
> >
> > cfs_rq[0]:/1314
> >   .exec_clock: 8738.133429
> >   .MIN_vruntime  : 0.01
> >   .min_vruntime  : 8739.371271
> >   .max_vruntime  : 0.01
> >   .spread: 0.00
> >   .spread0   : -9792.24
> >   .nr_spread_over: 1
> >   .nr_running: 0
> >   .load  : 0
> >   .load_avg  : 7376.722880
> >   .load_period   : 7.203830
> >   .load_contrib  : 1023
> >   .load_tg   : 1023
> >   .se->exec_start: 282004.715064
> >   .se->vruntime  : 18435.664560
> >   .se->sum_exec_runtime  : 8738.133429
> >   .se->wait_start: 0.00
> >   .se->sleep_start   : 0.00
> >   .se->block_start   : 0.00
> >   .se->sleep_max : 0.00
> >   .se->block_max : 0.00
> >   .se->exec_max  : 77.977054
> >   .se->slice_max : 0.00
> >   .se->wait_max  : 2.664779
> >   .se->wait_sum  : 29.970575
> >   .se->wait_count: 102
> >   .se->load.weight   : 2
> >
> > So 1314 is a real time process and
> >
> > cpu.rt_period_us:
> > 100
> > --
> > cpu.rt_runtime_us:
> > 0
> >
> > When did tt move to being a Real Time process (hint: see nr_running
> > and nr_throttled)?
> >
> > Balbir

-- 
Michal Hocko
SUSE Labs


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Michal Hocko
On Tue 23-10-12 17:08:40, Qiang Gao wrote:
> this is just an example to show how to reproduce. actually,the first time I 
> saw
> this situation was on a machine with 288G RAM with many tasks running and
> we limit 30G for each.  but finanlly, no one exceeds this limit the the system
> oom.

Yes, but mentioning the memory controller then might be misleading... It
seems that the only relevant factor in your load is the cpu controller.

And please stop top-posting. It makes the discussion messy.
-- 
Michal Hocko
SUSE Labs


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Qiang Gao
A global oom is the right thing to do, but an oom-killed process hanging
in do_exit is not normal behavior.

On Tue, Oct 23, 2012 at 5:01 PM, Sha Zhengju  wrote:
> On 10/23/2012 11:35 AM, Qiang Gao wrote:
>>
>> information about the system is in the attach file "information.txt"
>>
>> I can not reproduce it in the upstream 3.6.0 kernel..
>>
>> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko  wrote:
>>>
>>> On Wed 17-10-12 18:23:34, gaoqiang wrote:

 I looked up nothing useful with google,so I'm here for help..

 when this happens:  I use memcg to limit the memory use of a
 process,and when the memcg cgroup was out of memory,
 the process was oom-killed   however,it cannot really complete the
 exiting. here is the some information
>>>
>>> How many tasks are in the group and what kind of memory do they use?
>>> Is it possible that you were hit by the same issue as described in
>>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>>
 OS version:  centos6.22.6.32.220.7.1
>>>
>>> Your kernel is quite old and you should be probably asking your
>>> distribution to help you out. There were many fixes since 2.6.32.
>>> Are you able to reproduce the same issue with the current vanila kernel?
>>>
 /proc/pid/stack
 ---

 [] __cond_resched+0x2a/0x40
 [] unmap_vmas+0xb49/0xb70
 [] exit_mmap+0x7e/0x140
 [] mmput+0x58/0x110
 [] exit_mm+0x11d/0x160
 [] do_exit+0x1ad/0x860
 [] do_group_exit+0x41/0xb0
 [] get_signal_to_deliver+0x1e8/0x430
 [] do_notify_resume+0xf4/0x8b0
 [] int_signal+0x12/0x17
 [] 0x
>>>
>>> This looks strange because this is just an exit part which shouldn't
>>> deadlock or anything. Is this stack stable? Have you tried to take check
>>> it more times?
>>>
>
> Does the machine only have about 700M memory? I also find something
> in the log file:
>
> Node 0 DMA free:2772kB min:72kB low:88kB high:108kB present:15312kB..
> lowmem_reserve[]: 0 674 674 674
> Node 0 DMA32 free:*3172kB* min:3284kB low:4104kB high:4924kB
> present:690712kB ..
> lowmem_reserve[]: 0 0 0 0
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap  = 0kB
> Total swap = 0kB
> 179184 pages RAM  ==>  179184 * 4 / 1024 = *700M*
> 6773 pages reserved
>
>
> Note that the free memory of DMA32(3172KB) is lower than min watermark,
> which means the global is under pressure now. What's more the swap is off,
> so the global oom is normal behavior.
>
>
> Thanks,
> Sha


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Qiang Gao
This is just an example to show how to reproduce it. Actually, the first time
I saw this situation was on a machine with 288G RAM with many tasks running,
and we limited each to 30G. But finally, no task exceeded its limit and the
system hit a global oom.


On Tue, Oct 23, 2012 at 4:35 PM, Michal Hocko  wrote:
> On Tue 23-10-12 11:35:52, Qiang Gao wrote:
>> I'm sure this is a global-oom,not cgroup-oom. [the dmesg output in the end]
>
> Yes this is the global oom killer because:
>> cglimit -M 700M ./tt
>> then after global-oom,the process hangs..
>
>> 179184 pages RAM
>
> So you have ~700M of RAM so the memcg limit is basically pointless as it
> cannot be reached...
> --
> Michal Hocko
> SUSE Labs


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Sha Zhengju

On 10/23/2012 11:35 AM, Qiang Gao wrote:

information about the system is in the attach file "information.txt"

I can not reproduce it in the upstream 3.6.0 kernel..

On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko  wrote:

On Wed 17-10-12 18:23:34, gaoqiang wrote:

I looked up nothing useful with google,so I'm here for help..

when this happens:  I use memcg to limit the memory use of a
process,and when the memcg cgroup was out of memory,
the process was oom-killed   however,it cannot really complete the
exiting. here is the some information

How many tasks are in the group and what kind of memory do they use?
Is it possible that you were hit by the same issue as described in
79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.


OS version:  centos6.22.6.32.220.7.1

Your kernel is quite old and you should be probably asking your
distribution to help you out. There were many fixes since 2.6.32.
Are you able to reproduce the same issue with the current vanila kernel?


/proc/pid/stack
---

[] __cond_resched+0x2a/0x40
[] unmap_vmas+0xb49/0xb70
[] exit_mmap+0x7e/0x140
[] mmput+0x58/0x110
[] exit_mm+0x11d/0x160
[] do_exit+0x1ad/0x860
[] do_group_exit+0x41/0xb0
[] get_signal_to_deliver+0x1e8/0x430
[] do_notify_resume+0xf4/0x8b0
[] int_signal+0x12/0x17
[] 0x

This looks strange because this is just an exit part which shouldn't
deadlock or anything. Is this stack stable? Have you tried to take check
it more times?



Does the machine only have about 700M of memory? I also found something
in the log file:

Node 0 DMA free:2772kB min:72kB low:88kB high:108kB present:15312kB..
lowmem_reserve[]: 0 674 674 674
Node 0 DMA32 free:*3172kB* min:3284kB low:4104kB high:4924kB present:690712kB ..
lowmem_reserve[]: 0 0 0 0
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
179184 pages RAM  ==>  179184 * 4 / 1024 = *700M*
6773 pages reserved


Note that the free memory of DMA32 (3172kB) is lower than the min watermark,
which means global memory is under pressure now. What's more, swap is off,
so the global oom is normal behavior.


Thanks,
Sha


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Michal Hocko
On Tue 23-10-12 11:35:52, Qiang Gao wrote:
> I'm sure this is a global-oom,not cgroup-oom. [the dmesg output in the end]

Yes this is the global oom killer because:
> cglimit -M 700M ./tt 
> then after global-oom,the process hangs..

> 179184 pages RAM

So you have ~700M of RAM so the memcg limit is basically pointless as it
cannot be reached...
-- 
Michal Hocko
SUSE Labs


Re: process hangs on do_exit when oom happens

2012-10-23 Thread Qiang Gao
This process was moved to the RT-priority queue when the global oom-killer
happened, to boost the recovery of the system.. but it wasn't dealt with
properly. I still have no idea where the problem is..
On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh  wrote:
> On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao  wrote:
>> information about the system is in the attach file "information.txt"
>>
>> I can not reproduce it in the upstream 3.6.0 kernel..
>>
>> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko  wrote:
>>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
 I looked up nothing useful with google,so I'm here for help..

 when this happens:  I use memcg to limit the memory use of a
 process,and when the memcg cgroup was out of memory,
 the process was oom-killed   however,it cannot really complete the
 exiting. here is the some information
>>>
>>> How many tasks are in the group and what kind of memory do they use?
>>> Is it possible that you were hit by the same issue as described in
>>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>>
 OS version:  centos6.22.6.32.220.7.1
>>>
>>> Your kernel is quite old and you should be probably asking your
>>> distribution to help you out. There were many fixes since 2.6.32.
>>> Are you able to reproduce the same issue with the current vanila kernel?
>>>
 /proc/pid/stack
 ---

 [] __cond_resched+0x2a/0x40
 [] unmap_vmas+0xb49/0xb70
 [] exit_mmap+0x7e/0x140
 [] mmput+0x58/0x110
 [] exit_mm+0x11d/0x160
 [] do_exit+0x1ad/0x860
 [] do_group_exit+0x41/0xb0
 [] get_signal_to_deliver+0x1e8/0x430
 [] do_notify_resume+0xf4/0x8b0
 [] int_signal+0x12/0x17
 [] 0x
>>>
>>> This looks strange because this is just an exit part which shouldn't
>>> deadlock or anything. Is this stack stable? Have you tried to take check
>>> it more times?
>
> Looking at information.txt, I found something interesting
>
> rt_rq[0]:/1314
>   .rt_nr_running : 1
>   .rt_throttled  : 1
>   .rt_time   : 0.856656
>   .rt_runtime: 0.00
>
>
> cfs_rq[0]:/1314
>   .exec_clock: 8738.133429
>   .MIN_vruntime  : 0.01
>   .min_vruntime  : 8739.371271
>   .max_vruntime  : 0.01
>   .spread: 0.00
>   .spread0   : -9792.24
>   .nr_spread_over: 1
>   .nr_running: 0
>   .load  : 0
>   .load_avg  : 7376.722880
>   .load_period   : 7.203830
>   .load_contrib  : 1023
>   .load_tg   : 1023
>   .se->exec_start: 282004.715064
>   .se->vruntime  : 18435.664560
>   .se->sum_exec_runtime  : 8738.133429
>   .se->wait_start: 0.00
>   .se->sleep_start   : 0.00
>   .se->block_start   : 0.00
>   .se->sleep_max : 0.00
>   .se->block_max : 0.00
>   .se->exec_max  : 77.977054
>   .se->slice_max : 0.00
>   .se->wait_max  : 2.664779
>   .se->wait_sum  : 29.970575
>   .se->wait_count: 102
>   .se->load.weight   : 2
>
> So 1314 is a real time process and
>
> cpu.rt_period_us:
> 100
> --
> cpu.rt_runtime_us:
> 0
>
> When did tt move to being a Real Time process (hint: see nr_running
> and nr_throttled)?
>
> Balbir


Re: process hangs on do_exit when oom happens

2012-10-22 Thread Balbir Singh
On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao  wrote:
> information about the system is in the attach file "information.txt"
>
> I can not reproduce it in the upstream 3.6.0 kernel..
>
> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko  wrote:
>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>> I looked up nothing useful with google,so I'm here for help..
>>>
>>> when this happens:  I use memcg to limit the memory use of a
>>> process,and when the memcg cgroup was out of memory,
>>> the process was oom-killed   however,it cannot really complete the
>>> exiting. here is the some information
>>
>> How many tasks are in the group and what kind of memory do they use?
>> Is it possible that you were hit by the same issue as described in
>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>
>>> OS version:  centos6.22.6.32.220.7.1
>>
>> Your kernel is quite old and you should be probably asking your
>> distribution to help you out. There were many fixes since 2.6.32.
>> Are you able to reproduce the same issue with the current vanila kernel?
>>
>>> /proc/pid/stack
>>> ---
>>>
>>> [] __cond_resched+0x2a/0x40
>>> [] unmap_vmas+0xb49/0xb70
>>> [] exit_mmap+0x7e/0x140
>>> [] mmput+0x58/0x110
>>> [] exit_mm+0x11d/0x160
>>> [] do_exit+0x1ad/0x860
>>> [] do_group_exit+0x41/0xb0
>>> [] get_signal_to_deliver+0x1e8/0x430
>>> [] do_notify_resume+0xf4/0x8b0
>>> [] int_signal+0x12/0x17
>>> [] 0x
>>
>> This looks strange because this is just an exit part which shouldn't
>> deadlock or anything. Is this stack stable? Have you tried to take check
>> it more times?

Looking at information.txt, I found something interesting

rt_rq[0]:/1314
  .rt_nr_running : 1
  .rt_throttled  : 1
  .rt_time   : 0.856656
  .rt_runtime: 0.00


cfs_rq[0]:/1314
  .exec_clock: 8738.133429
  .MIN_vruntime  : 0.01
  .min_vruntime  : 8739.371271
  .max_vruntime  : 0.01
  .spread: 0.00
  .spread0   : -9792.24
  .nr_spread_over: 1
  .nr_running: 0
  .load  : 0
  .load_avg  : 7376.722880
  .load_period   : 7.203830
  .load_contrib  : 1023
  .load_tg   : 1023
  .se->exec_start: 282004.715064
  .se->vruntime  : 18435.664560
  .se->sum_exec_runtime  : 8738.133429
  .se->wait_start: 0.00
  .se->sleep_start   : 0.00
  .se->block_start   : 0.00
  .se->sleep_max : 0.00
  .se->block_max : 0.00
  .se->exec_max  : 77.977054
  .se->slice_max : 0.00
  .se->wait_max  : 2.664779
  .se->wait_sum  : 29.970575
  .se->wait_count: 102
  .se->load.weight   : 2

So 1314 is a real time process and

cpu.rt_period_us:
100
--
cpu.rt_runtime_us:
0

When did tt move to being a Real Time process (hint: see nr_running
and nr_throttled)?

Balbir
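The rt_rq excerpt above can be checked mechanically; a small sketch that parses those four fields and confirms the marooned state (illustrative only):

```python
# Sketch: pick the throttling state out of the /proc/sched_debug excerpt
# above. rt_runtime == 0 together with rt_throttled == 1 means the RT
# task in group /1314 is throttled and, with no budget to refill, stays
# that way forever.
debug = """\
rt_rq[0]:/1314
  .rt_nr_running : 1
  .rt_throttled  : 1
  .rt_time   : 0.856656
  .rt_runtime: 0.00
"""

fields = {}
for line in debug.splitlines()[1:]:
    key, _, value = line.partition(":")
    fields[key.strip(" .")] = float(value)

marooned = fields["rt_throttled"] == 1 and fields["rt_runtime"] == 0
assert marooned  # matches the observation: nr_running=1 yet it never runs
```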


Re: process hangs on do_exit when oom happens

2012-10-22 Thread Michal Hocko
On Mon 22-10-12 10:16:43, Qiang Gao wrote:
> I don't know whether  the process will exit finally, bug this stack lasts
> for hours, which is obviously unnormal.
> The situation:  we use a command calld "cglimit" to fork-and-exec the
> worker process,and the "cglimit" will
> set some limitation on the worker with cgroup. for now,we limit the
> memory,and we also use cpu cgroup,but with
> no limiation,so when the worker is running, the cgroup directory looks like
> following:
> 
> /cgroup/memory/worker : this directory limit the memory
> /cgroup/cpu/worker :with no limit,but worker process is in.
> 
> for some reason(some other process we didn't consider),  the worker process
> invoke global oom-killer,

Are you sure that this is really global oom? What was the limit for the
group?

> not cgroup-oom-killer.  then the worker process hangs there.
> 
> Actually, if we didn't set the worker process into the cpu cgroup, this
> will never happens.

Strange, and it smells like a misconfiguration. Could you provide the
complete settings for both controllers?
grep . -r /cgroup/

> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko  wrote:
> 
> > On Wed 17-10-12 18:23:34, gaoqiang wrote:
> > > I looked up nothing useful with google,so I'm here for help..
> > >
> > > when this happens:  I use memcg to limit the memory use of a
> > > process,and when the memcg cgroup was out of memory,
> > > the process was oom-killed   however,it cannot really complete the
> > > exiting. here is the some information
> >
> > How many tasks are in the group and what kind of memory do they use?
> > Is it possible that you were hit by the same issue as described in
> > 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
> >
> > > OS version:  centos6.22.6.32.220.7.1
> >
> > Your kernel is quite old and you should be probably asking your
> > distribution to help you out. There were many fixes since 2.6.32.
> > Are you able to reproduce the same issue with the current vanila kernel?
> >
> > > /proc/pid/stack
> > > ---
> > >
> > > [] __cond_resched+0x2a/0x40
> > > [] unmap_vmas+0xb49/0xb70
> > > [] exit_mmap+0x7e/0x140
> > > [] mmput+0x58/0x110
> > > [] exit_mm+0x11d/0x160
> > > [] do_exit+0x1ad/0x860
> > > [] do_group_exit+0x41/0xb0
> > > [] get_signal_to_deliver+0x1e8/0x430
> > > [] do_notify_resume+0xf4/0x8b0
> > > [] int_signal+0x12/0x17
> > > [] 0x
> >
> > This looks strange because this is just an exit part which shouldn't
> > deadlock or anything. Is this stack stable? Have you tried to take check
> > it more times?
> >
> > --
> > Michal Hocko
> > SUSE Labs
> >

-- 
Michal Hocko
SUSE Labs


Re: process hangs on do_exit when oom happens

2012-10-21 Thread Balbir Singh
On Mon, Oct 22, 2012 at 7:46 AM, Qiang Gao  wrote:
> I don't know whether  the process will exit finally, bug this stack lasts
> for hours, which is obviously unnormal.
> The situation:  we use a command calld "cglimit" to fork-and-exec the worker
> process,and the "cglimit" will
> set some limitation on the worker with cgroup. for now,we limit the
> memory,and we also use cpu cgroup,but with
> no limiation,so when the worker is running, the cgroup directory looks like
> following:
>
> /cgroup/memory/worker : this directory limit the memory
> /cgroup/cpu/worker :with no limit,but worker process is in.
>
> for some reason(some other process we didn't consider),  the worker process
> invoke global oom-killer,
> not cgroup-oom-killer.  then the worker process hangs there.
>
> Actually, if we didn't set the worker process into the cpu cgroup, this will
> never happens.
>

You said you don't use CPU limits, right? Can you also send in the
output of /proc/sched_debug, and your
/etc/cgconfig.conf? If the OOM is not caused by the cgroup memory limit
and the global system is under memory pressure in 2.6.32, it can trigger an
OOM.

Also

1. Have you turned off swapping (it seems like it), right?
2. Do you have a NUMA policy setup for this task?

Can you also share the .config (not sure if any special patches are
being used) in the version you've mentioned.

Balbir


Re: process hangs on do_exit when oom happens

2012-10-21 Thread Qiang Gao
I don't know whether the process will exit eventually, but this stack
lasts for hours, which is obviously abnormal.
The situation: we use a command called "cglimit" to fork-and-exec the
worker process, and "cglimit" will
set some limitations on the worker with cgroups. For now, we limit the
memory, and we also use the cpu cgroup, but with
no limitation, so when the worker is running, the cgroup directories look
like the following:

/cgroup/memory/worker : this directory limits the memory
/cgroup/cpu/worker : with no limit, but the worker process is in it.

For some reason (some other process we didn't consider), the worker
process invoked the global oom-killer,
not the cgroup oom-killer. Then the worker process hangs there.

Actually, if we don't put the worker process into the cpu cgroup,
this never happens.
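A quick way to catch this configuration up front is to check the group's RT budget before placing tasks in it. A sketch (the cgroup path matches the layout described above; the helper name is ours, not part of any tool):

```python
# Sketch: refuse to use a cpu cgroup for RT work when its rt_runtime_us
# is 0 (the controller's default for child groups). The path and helper
# name here are illustrative.
from pathlib import Path

def group_has_rt_budget(cgroup_dir):
    runtime_us = int((Path(cgroup_dir) / "cpu.rt_runtime_us").read_text())
    return runtime_us > 0  # 0: any RT task placed here is throttled forever

# e.g. group_has_rt_budget("/cgroup/cpu/worker") -> False with defaults
```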

On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko  wrote:
>
> On Wed 17-10-12 18:23:34, gaoqiang wrote:
> > I looked up nothing useful with google,so I'm here for help..
> >
> > when this happens:  I use memcg to limit the memory use of a
> > process,and when the memcg cgroup was out of memory,
> > the process was oom-killed   however,it cannot really complete the
> > exiting. here is the some information
>
> How many tasks are in the group and what kind of memory do they use?
> Is it possible that you were hit by the same issue as described in
> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>
> > OS version:  centos6.22.6.32.220.7.1
>
> Your kernel is quite old and you should be probably asking your
> distribution to help you out. There were many fixes since 2.6.32.
> Are you able to reproduce the same issue with the current vanila kernel?
>
> > /proc/pid/stack
> > ---
> >
> > [] __cond_resched+0x2a/0x40
> > [] unmap_vmas+0xb49/0xb70
> > [] exit_mmap+0x7e/0x140
> > [] mmput+0x58/0x110
> > [] exit_mm+0x11d/0x160
> > [] do_exit+0x1ad/0x860
> > [] do_group_exit+0x41/0xb0
> > [] get_signal_to_deliver+0x1e8/0x430
> > [] do_notify_resume+0xf4/0x8b0
> > [] int_signal+0x12/0x17
> > [] 0x
>
> This looks strange because this is just an exit part which shouldn't
> deadlock or anything. Is this stack stable? Have you tried to take check
> it more times?
>
> --
> Michal Hocko
> SUSE Labs


Re: process hangs on do_exit when oom happens

2012-10-19 Thread Michal Hocko
On Wed 17-10-12 18:23:34, gaoqiang wrote:
> I looked up nothing useful with google,so I'm here for help..
> 
> when this happens:  I use memcg to limit the memory use of a
> process,and when the memcg cgroup was out of memory,
> the process was oom-killed   however,it cannot really complete the
> exiting. here is the some information

How many tasks are in the group and what kind of memory do they use?
Is it possible that you were hit by the same issue as described in 
79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.

> OS version:  centos6.22.6.32.220.7.1

Your kernel is quite old and you should probably be asking your
distribution to help you out. There have been many fixes since 2.6.32.
Are you able to reproduce the same issue with the current vanilla kernel?

> /proc/pid/stack
> ---
> 
> [] __cond_resched+0x2a/0x40
> [] unmap_vmas+0xb49/0xb70
> [] exit_mmap+0x7e/0x140
> [] mmput+0x58/0x110
> [] exit_mm+0x11d/0x160
> [] do_exit+0x1ad/0x860
> [] do_group_exit+0x41/0xb0
> [] get_signal_to_deliver+0x1e8/0x430
> [] do_notify_resume+0xf4/0x8b0
> [] int_signal+0x12/0x17
> [] 0x

This looks strange because this is just the exit path, which shouldn't
deadlock or anything. Is this stack stable? Have you tried to check
it more times?

-- 
Michal Hocko
SUSE Labs


process hangs on do_exit when oom happens

2012-10-17 Thread gaoqiang

I found nothing useful with Google, so I'm here for help..

What happens: I use memcg to limit the memory use of a process, and
when the memcg cgroup ran out of memory,
the process was oom-killed. However, it cannot really complete the
exiting. Here is some information.



OS version:  centos6.22.6.32.220.7.1

/proc/pid/stack
---

[] __cond_resched+0x2a/0x40
[] unmap_vmas+0xb49/0xb70
[] exit_mmap+0x7e/0x140
[] mmput+0x58/0x110
[] exit_mm+0x11d/0x160
[] do_exit+0x1ad/0x860
[] do_group_exit+0x41/0xb0
[] get_signal_to_deliver+0x1e8/0x430
[] do_notify_resume+0xf4/0x8b0
[] int_signal+0x12/0x17
[] 0x

/proc/pid/stat
---

11337 (CF_user_based) R 1 11314 11314 0 -1 4203524 7753602 0 0 0 622 1806  
0 0 -2 0 1 0 324381340 0 0 18446744073709551615 0 0 0 0 0 0 0 0 66784 0 0  
0 17 3 1 1 0 0 0


/proc/pid/status

Name:   CF_user_based
State:  R (running)
Tgid:   11337
Pid:11337
PPid:   1
TracerPid:  0
Uid:32114   32114   32114   32114
Gid:32114   32114   32114   32114
Utrace: 0
FDSize: 128
Groups: 32114
Threads:1
SigQ:   2/2325005
SigPnd: 
ShdPnd: 4100
SigBlk: 
SigIgn: 
SigCgt: 0001800104e0
CapInh: 
CapPrm: 
CapEff: 
CapBnd: 
Cpus_allowed:   
Cpus_allowed_list:  0-31
Mems_allowed:   ,0003
Mems_allowed_list:  0-1
voluntary_ctxt_switches:4300
nonvoluntary_ctxt_switches: 77
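The ShdPnd mask in the status dump above can be decoded directly (bit n of the mask corresponds to signal number n+1); a small sketch:

```python
# Sketch: decode the ShdPnd pending-signal mask from the /proc/pid/status
# dump above. Bit n of the mask corresponds to signal number n+1.
import signal

def pending_signals(mask_hex):
    mask = int(mask_hex, 16)
    return [signal.Signals(n + 1).name for n in range(64) if mask >> n & 1]

# ShdPnd: 4100 -> SIGKILL and SIGTERM are pending but never delivered,
# consistent with a task stuck on the exit path.
assert pending_signals("4100") == ["SIGKILL", "SIGTERM"]
```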

/var/log/messages
---

Oct 17 15:22:19 hpc16 kernel: CF_user_based invoked oom-killer:  
gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0

Oct 17 15:22:19 hpc16 kernel: CF_user_based cpuset=/ mems_allowed=0-1
Oct 17 15:22:19 hpc16 kernel: Pid: 3909, comm: CF_user_based Not tainted  
2.6.32-2.0.0.1 #4

Oct 17 15:22:19 hpc16 kernel: Call Trace:
Oct 17 15:22:19 hpc16 kernel: [] ? dump_header+0x85/0x1a0
Oct 17 15:22:19 hpc16 kernel: [] ?  
oom_kill_process+0x25e/0x2a0
Oct 17 15:22:19 hpc16 kernel: [] ?  
select_bad_process+0xce/0x110
Oct 17 15:22:19 hpc16 kernel: [] ?  
out_of_memory+0x1a8/0x390
Oct 17 15:22:19 hpc16 kernel: [] ?  
__alloc_pages_nodemask+0x73a/0x750
Oct 17 15:22:19 hpc16 kernel: [] ?  
__mem_cgroup_commit_charge+0x45/0x90
Oct 17 15:22:19 hpc16 kernel: [] ?  
alloc_pages_vma+0x9a/0x190
Oct 17 15:22:19 hpc16 kernel: [] ?  
handle_pte_fault+0x4cc/0xa90
Oct 17 15:22:19 hpc16 kernel: [] ?  
alloc_pages_current+0xab/0x110
Oct 17 15:22:19 hpc16 kernel: [] ?  
invalidate_interrupt5+0xe/0x20
Oct 17 15:22:19 hpc16 kernel: [] ?  
handle_mm_fault+0x12a/0x1b0
Oct 17 15:22:19 hpc16 kernel: [] ?  
do_page_fault+0x199/0x550
Oct 17 15:22:19 hpc16 kernel: [] ?  
call_rwsem_wake+0x18/0x30
Oct 17 15:22:19 hpc16 kernel: [] ?  
invalidate_interrupt5+0xe/0x20

Oct 17 15:22:19 hpc16 kernel: [] ? page_fault+0x25/0x30
Oct 17 15:22:19 hpc16 kernel: Mem-Info:
Oct 17 15:22:19 hpc16 kernel: Node 0 Normal per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU0: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU1: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU2: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU3: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU4: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU5: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU6: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU7: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU8: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU9: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   16: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   17: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   18: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   19: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   20: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   21: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   22: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   23: hi:  186, btch:  31 usd:  18
Oct 17 15:22:19 hpc16 kernel: Node 1 DMA per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU0: hi:0, btch:   1 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU1: hi:0, btch:   1 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU2: hi