[SOLVED] Re: oom with 2.6.11

2005-03-20 Thread Christian Kujau
for the record: the "oom bug" turned out to be user generated. a *lot* of
small scripts were started, triggering oom again and again, user error.

the source of the problem is still pppd and the discussion continues as a
debian bugreport:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=299875

thank you for all your help,
Christian.
-- 
BOFH excuse #139:

UBNC (user brain not connected)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: oom with 2.6.11

2005-03-17 Thread Christian Kujau
Coywolf Qi Hunt wrote:
> I do "grep check-route.sh oom_2.6.11.3.txt | wc" and it shows 4365

duh, good catch! really!

> lines, which means there are 4365 instances of that script running, from
> pid 4260 to 12747, mostly with pretty low points, 123.
> Based on these points, suppose each script consumes 100k, that'll be
> 100k*4k=400M roughly. And your box has merely 256M MemTotal.

yes, i just checked, the script is looping and crond is starting a new
one, and another, and the oom-killer does not catch it, because it's too
small and of course doesn't know where it is coming from (crond).
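
One common way to keep a cron job from piling up when an earlier run is still looping is a non-blocking lock file. The sketch below is only an illustration of that idea, not the actual check-route.sh (whose contents are not shown in this thread); the function name `try_lock` and the lock path are hypothetical.

```python
import fcntl


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if another
    holder already has it. The lock is released when the file is closed
    or the process exits."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# Usage at the top of a cron-started script (path is a made-up example):
#
#     lock = try_lock("/tmp/check-route.lock")
#     if lock is None:
#         raise SystemExit(0)   # previous run still active, do nothing
#     ... route and ping checks would go here ...
```

With a guard like this, a run that hangs would block later runs from starting rather than letting crond stack up a new instance every minute.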

> Check this script and disable it; see what will happen.

yes, will do that. on a (not so unimportant) side-note: i was told the
whole thing should be fixed with 2.6.11.4:

  [PATCH] CAN-2005-0384: Remote Linux DoS on ppp servers


after all it seems to be PEBKAC and bad luck...what a week.

thank you for your help,
Christian.
-- 
BOFH excuse #416:

We're out of slots on the server


Re: oom with 2.6.11

2005-03-17 Thread Coywolf Qi Hunt
On Thu, 17 Mar 2005 02:27:29 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> hello again,
> 
> unfortunately i've hit OOM again, this time with "#define DEBUG" enabled
> in mm/oom_kill.c:
> 
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.3.txt
> 
> by "Mar 16 18:32" pppd died again and OOM kicked in 30min later.
> (there are a *lot* of messages from a shell script named "check-route.sh". it's
> a little script which runs every minute or so to check if my default route
> is still ok and if pings to the outside world are possible. definitely not
> a memory hog, but noisy)


I do "grep check-route.sh oom_2.6.11.3.txt | wc" and it shows 4365
lines, which means there are 4365 instances of that script running, from
pid 4260 to 12747, mostly with pretty low points, 123.  Based on these
points, suppose each script consumes 100k, that'll be 100k*4k=400M
roughly. And your box has merely 256M MemTotal.

The current oom algorithm fails to find this kind of memory hog.
And the kernel kills other innocent processes because the script's points
are much lower than most others'.

Check this script and disable it; see what will happen.
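
The back-of-the-envelope estimate above can be written out as a short sketch. The count (4365), the assumed 100k per instance and the 256372 kB MemTotal are taken from this thread; the function name and the synthetic log lines are made up for illustration, since the real log is not reproduced here.

```python
def aggregate_estimate_kb(log_lines, name, per_process_kb):
    """Count log lines mentioning `name` (like `grep name | wc -l`) and
    multiply by an assumed per-process footprint in kB."""
    count = sum(1 for line in log_lines if name in line)
    return count, count * per_process_kb


# Synthetic stand-in for the debug log: one line per script instance.
lines = ["pid: N check-route.sh points: 123"] * 4365

MEM_TOTAL_KB = 256372   # MemTotal reported for this box
PER_PROCESS_KB = 100    # the rough 100k guess from the message above

count, total_kb = aggregate_estimate_kb(lines, "check-route.sh", PER_PROCESS_KB)
exceeds_ram = total_kb > MEM_TOTAL_KB
print(count, total_kb, exceeds_ram)
```

Each instance is tiny on its own, which is why a per-process ranking does not flag it; only the sum over all instances exceeds physical RAM.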

-- 
Coywolf Qi Hunt
http://sosdg.org/~coywolf/


Re: oom with 2.6.11

2005-03-16 Thread Christian Kujau
Andrew Morton wrote:
> 
> Some application went berserk, used up all the swap and then oomed the box.
> 
> You could perhaps run `top -d1' then hit M so the output is sorted by
> bloatiness, then try to catch the culprit.

i've already done that. as OOM happens when i am not around, i did that
with "ps":

http://lkml.org/lkml/2005/3/12/88
http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11-rc5-bk2.log.gz

> But it would be better to have some app which prints the N most
> memory-hungry processes every second and simply scrolls that up the screen. 
> I'm not aware of such a thing, but it could be cooked up via
> /proc/N/cmdline and /proc/N/statm.

i hope the link above does reveal this information.

i just wrote a bug report for the (debian) ppp package, but knowing
*where* the memory goes would really help, i think.

thank you,
Christian.
-- 
BOFH excuse #72:

Satan did it


Re: oom with 2.6.11

2005-03-16 Thread Andrew Morton
Christian Kujau <[EMAIL PROTECTED]> wrote:
>
> unfortunately i've hit OOM again, this time with "#define DEBUG" enabled
>  in mm/oom_kill.c:
> 
>  http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.3.txt
> 
>  by "Mar 16 18:32" pppd died again and OOM kicked in 30min later.
>  (there are a *lot* of messages from a shell script named "check-route.sh". it's
>  a little script which runs every minute or so to check if my default route
>  is still ok and if pings to the outside world are possible. definitely not
>  a memory hog, but noisy)
> 
>  since tracking the "most memory consuming applications" did not reveal any
>  hints [1], i have monitored /proc/slabinfo and /proc/meminfo this time:
> 
>  http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11.3.gz
> 
>  as stated before, i was suspecting pppd to be the bad guy here, and yes: i
>  downgraded pppd to an earlier version and pppd (and the system) survived 2
>  terminations of my dial-up ISP. yesterday i've upgraded back again to
>  current pppd (debian/unstable) and the OOM problem returned. yes, i'll bug
>  the debian people now (hello!), but grepping for "ppp" in
>  daily_stats-2.6.11.3.gz gives no hits. so "pppd" does not get *any* points
>  from mm/oom_kill.c and thus no attempts are made to kill it (it is always
>  only kill'able with "-9").

The oom-killer tries to be nicer to processes which are running as root.

> furthermore, i thought /proc/slabinfo could give
>  me some hints about *where* all the memory went. scrolling down this
>  file to the bottom, where "SwapFree" shows "0 kB", i don't see any alarming
>  numbers in the "slabinfo" right above "meminfo".

MemTotal:       256372 kB
MemFree:          3280 kB
Buffers:           608 kB
Cached:           3256 kB
SwapCached:        664 kB
Active:         105020 kB
Inactive:        20364 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       256372 kB
LowFree:          3280 kB
SwapTotal:      784468 kB
SwapFree:            0 kB
Dirty:              12 kB
Writeback:           0 kB
Mapped:         130332 kB
Slab:            61424 kB
CommitLimit:    912652 kB
Committed_AS:  1323548 kB
PageTables:      51668 kB
VmallocTotal:   778184 kB
VmallocUsed:      3464 kB
VmallocChunk:   774492 kB

Some application went berserk, used up all the swap and then oomed the box.
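
The figures behind that reading can be pulled out of the snapshot mechanically. The following is a minimal sketch that parses /proc/meminfo-style text and derives the swap in use and the amount by which Committed_AS exceeds CommitLimit; the sample holds a subset of the values shown above, and the helper name `parse_meminfo` is an assumption made for illustration.

```python
MEMINFO_SAMPLE = """\
MemTotal: 256372 kB
MemFree: 3280 kB
SwapTotal: 784468 kB
SwapFree: 0 kB
CommitLimit: 912652 kB
Committed_AS: 1323548 kB
PageTables: 51668 kB
"""


def parse_meminfo(text):
    """Map each 'Key: value kB' line to an integer number of kB."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


info = parse_meminfo(MEMINFO_SAMPLE)
swap_used_kb = info["SwapTotal"] - info["SwapFree"]
over_commit_kb = info["Committed_AS"] - info["CommitLimit"]
print("swap used:", swap_used_kb, "kB")
print("committed beyond limit:", over_commit_kb, "kB")
```

On a live system the same function could be fed the contents of /proc/meminfo instead of the sample string.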

You could perhaps run `top -d1' then hit M so the output is sorted by
bloatiness, then try to catch the culprit.

But it would be better to have some app which prints the N most
memory-hungry processes every second and simply scrolls that up the screen. 
I'm not aware of such a thing, but it could be cooked up via
/proc/N/cmdline and /proc/N/statm.
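
A tool along those lines might look like the sketch below. It is an assumption-laden illustration rather than an existing utility: it reads the first two fields of /proc/N/statm (total size and resident pages) and the NUL-separated /proc/N/cmdline, and all function names are invented here.

```python
import os
import time

PAGE_KB = os.sysconf("SC_PAGE_SIZE") // 1024


def parse_statm(text):
    """Return (vsz_kb, rss_kb) from the first two statm fields (pages)."""
    size, resident = text.split()[:2]
    return int(size) * PAGE_KB, int(resident) * PAGE_KB


def parse_cmdline(raw):
    """Turn the NUL-separated cmdline bytes into a printable string."""
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


def snapshot(proc="/proc"):
    """Collect (rss_kb, vsz_kb, pid, cmdline) for every readable process."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "statm")) as f:
                vsz_kb, rss_kb = parse_statm(f.read())
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                cmd = parse_cmdline(f.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or is unreadable
        rows.append((rss_kb, vsz_kb, int(entry), cmd or "[no cmdline]"))
    return rows


def top_n(rows, n):
    """Largest resident set first."""
    return sorted(rows, reverse=True)[:n]


def watch(n=10, interval=1.0):
    """Print the n biggest processes every `interval` seconds, scrolling."""
    while True:
        print(time.strftime("%H:%M:%S"))
        for rss_kb, vsz_kb, pid, cmd in top_n(snapshot(), n):
            print(f"{pid:>6} rss={rss_kb:>8}kB vsz={vsz_kb:>8}kB {cmd}")
        time.sleep(interval)
```

Calling `watch()` starts the loop. Note that a per-process ranking like this would still not flag thousands of small instances of one script, as discussed elsewhere in this thread; summing the rows by command name would be one way to surface that.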



Re: oom with 2.6.11

2005-03-16 Thread Christian Kujau
hello again,

unfortunately i've hit OOM again, this time with "#define DEBUG" enabled
in mm/oom_kill.c:

http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.3.txt

by "Mar 16 18:32" pppd died again and OOM kicked in 30min later.
(there are a *lot* of messages from a shell script named "check-route.sh". it's
a little script which runs every minute or so to check if my default route
is still ok and if pings to the outside world are possible. definitely not
a memory hog, but noisy)

since tracking the "most memory consuming applications" did not reveal any
hints [1], i have monitored /proc/slabinfo and /proc/meminfo this time:

http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11.3.gz

as stated before, i was suspecting pppd to be the bad guy here, and yes: i
downgraded pppd to an earlier version and pppd (and the system) survived 2
terminations of my dial-up ISP. yesterday i've upgraded back again to
current pppd (debian/unstable) and the OOM problem returned. yes, i'll bug
the debian people now (hello!), but grepping for "ppp" in
daily_stats-2.6.11.3.gz gives no hits. so "pppd" does not get *any* points
from mm/oom_kill.c and thus no attempts are made to kill it (it is always
only kill'able with "-9"). furthermore, i thought /proc/slabinfo could give
me some hints about *where* all the memory went. scrolling down this
file to the bottom, where "SwapFree" shows "0 kB", i don't see any alarming
numbers in the "slabinfo" right above "meminfo".

could someone give me a hint, please?

thanks,
Christian.

[1] http://lkml.org/lkml/2005/3/12/88

more info for this recent OOM issue:
http://nerdbynature.de/bits/sheep/2.6.11/oom/dmesg.2.6.11.3
http://nerdbynature.de/bits/sheep/2.6.11/oom/lsmod_2.6.11.3
http://nerdbynature.de/bits/sheep/2.6.11/oom/config-2.6.11.3.gz
-- 
BOFH excuse #62:

need to wrap system in aluminum foil to fix problem


Re: oom with 2.6.11

2005-03-15 Thread Christian Kujau
Mauricio Lin wrote:
>>>
>>>Did this problem start from 2.6.11-rc2-bk10?
>>
>>i noticed it first at 2.6.11, then again with 2.6.11-rc5-bk2. suspecting
>>pppd to be the culprit to chew up all RAM after being terminated by my ISP
>>once a day - i just have to wait (must be around 2a.m.).
> 
> Have you tried with 2.6.10 in order to check this problem?

i don't think i've run plain 2.6.10 at all, syslog does not say so. i've
run several versions of 2.6.9, then 2.6.10-rc1, 2.6.10-rc1-bk19,
2.6.10-rc2-bk9, 2.6.10-rc3-bk11, 2.6.11-rc2-bk10, 2.6.11-rc5-bk2.

as stated earlier, the problem kicked in when i switched to 2.6.11, i
switched back to 2.6.11-rc5-bk2 and it happened again. by looking at the
syslog i am (still) suspecting pppd to be the source of OOM, but it did
not show up as a memory pig in "ps". i am running 2.6.11.3 now and the
machine survived almost 2 passes of pppd's daily hangups. i'll keep an eye
on it.

thank you for your concern,
Christian.
-- 
BOFH excuse #357:

I'd love to help you -- it's just that the Boss won't let me near the
computer.


Re: oom with 2.6.11

2005-03-15 Thread Mauricio Lin
Hi Christian,

On Fri, 11 Mar 2005 16:09:24 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> Mauricio Lin wrote:
> > Hi Christian,
> >
> > I would like to know on which kernel versions this problem happened.
> >
> > Did this problem start from 2.6.11-rc2-bk10?
> 
> i noticed it first at 2.6.11, then again with 2.6.11-rc5-bk2. suspecting
> pppd to be the culprit to chew up all RAM after being terminated by my ISP
> once a day - i just have to wait (must be around 2a.m.).

Have you tried with 2.6.10 in order to check this problem?

BR,

Mauricio Lin.


Re: oom with 2.6.11

2005-03-12 Thread Christian Kujau
hi again,

i had to wait for my pppoe session to be terminated by the remote peer
[1], and now it happened again with 2.6.11-rc5-bk2:

http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2_2.txt
http://nerdbynature.de/bits/sheep/2.6.11/oom/lsmod_2.6.11-rc5-bk2
http://nerdbynature.de/bits/sheep/2.6.11/oom/config-2.6.11-rc5-bk2.gz

i wanted to see the application chewing up the most RAM so i was running
"ps aux --sort=-vsz,-rss | head -n1" every 5sec and wrote its output to a
file:

http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11-rc5-bk2.log.gz

but even when almost all swap is used up, mysqld is still no#1 with
131928 KB VSZ, as always. i'm still suspecting something wrong with pppd,
but the binary itself does *not* show up among the top-ten-memory-pigs.

real memory is almost always used up, but that's ok.
i stripped it down to show only the timestamp and the "Swap: .. used" numbers:

http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11-rc5-bk2_stripped.log.gz

as you can see, until "Sat Mar 12 01:05:51" 122MB swap was used, from then
on swap-usage goes up until "Sat Mar 12 01:49:45", when all swap is in
use. the system is unable to keep up with writing memory usage every 5
seconds, the next entry is from "Sat Mar 12 01:56:24" - 6 minutes after the
first OOM message (i'm sure someone could feed this to gnuplot and make nice
bars out of it - i can't)
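
As a rough illustration of how fast swap filled up, the two timestamps above can be turned into an average growth rate. This is a sketch under stated assumptions: both timestamps fall on the same day, and the swap size is taken from the SwapTotal value (784468 kB) in the meminfo snapshot quoted elsewhere in this thread, which was captured on a different day on the same box.

```python
from datetime import datetime


def seconds_between(start, end):
    """Seconds between two same-day timestamps ending in HH:MM:SS."""
    fmt = "%H:%M:%S"
    t0 = datetime.strptime(start.split()[-1], fmt)
    t1 = datetime.strptime(end.split()[-1], fmt)
    return (t1 - t0).total_seconds()


SWAP_TOTAL_MB = 784468 / 1024   # assumption: same swap size as in meminfo
SWAP_USED_AT_START_MB = 122     # from the stripped log

elapsed_s = seconds_between("Sat Mar 12 01:05:51", "Sat Mar 12 01:49:45")
rate_mb_per_min = (SWAP_TOTAL_MB - SWAP_USED_AT_START_MB) / elapsed_s * 60
print(f"{elapsed_s:.0f} s, about {rate_mb_per_min:.1f} MB/min")
```

A steady climb of this kind over three quarters of an hour fits a slow accumulation more than a single process allocating everything at once, though the log alone cannot tell the two apart.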

where else could i look to actually *see* where the memory goes to?

somehow lost in kernel versions,
Christian.

[1] "remote peer" aka "ISP". is there a way to "simulate" a "LCP
terminated by peer" locally? otherwise i really have to wait 24h to
trigger the bug.
-- 
BOFH excuse #373:

Suspicious pointer corrupted virtual machine


Re: oom with 2.6.11

2005-03-11 Thread Christian Kujau
Coywolf Qi Hunt wrote:
> In file mm/oom_kill.c, uncomment line 24: /* #define DEBUG */. 
> And next time when oom happens again, we'll see the badness.

oh, good hint. will do this before the next reboot (in a few hours i guess)


thanks,
Christian.

-- 
BOFH excuse #134:

because of network lag due to too many people playing deathmatch

-- 
BOFH excuse #37:

heavy gravity fluctuation, move computer to floor rapidly


Re: oom with 2.6.11

2005-03-11 Thread Christian Kujau
Mauricio Lin wrote:
> Hi Christian,
> 
> I would like to know on which kernel versions this problem happened.
> 
> Did this problem start from 2.6.11-rc2-bk10?

i noticed it first at 2.6.11, then again with 2.6.11-rc5-bk2. suspecting
pppd to be the culprit to chew up all RAM after being terminated by my ISP
once a day - i just have to wait (must be around 2a.m.).

thanks,
Christian.
-- 
BOFH excuse #134:

because of network lag due to too many people playing deathmatch


Re: oom with 2.6.11

2005-03-11 Thread Coywolf Qi Hunt
In file mm/oom_kill.c, uncomment line 24: /* #define DEBUG */. 
And next time when oom happens again, we'll see the badness.

--coywolf

On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> hallo list,
> 
> today my machine went out of memory and, noticing it several hours after
> the first OOM message in the log, i wonder
>    1) why this happened at all and
>    2) why almost every service was killed despite the clever algorithms
>       documented in mm/oom_kill.c.
> 
> the first oom message went to the syslog at 01:27, i was away and no heavy
> tasks were scheduled:
> 
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt
> 
> mysqld got killed by the oom killer, so i have to suspect mysql for being
> the reason for oom here, even though i know that mysqld is running all day
> long. several other tasks got killed, but "Free swap" stays at 0kB and the
> oom killer kills almost every other task, with no success in freeing ram.
> 
> the log stops at 03:21, perhaps syslog-ng got killed.
> at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to
> login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this
> time loadavg was at 249 ;)
> 
> i went to runlevel 2, then up again to 3 and all services are up and
> running again.
> 
> some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30
> days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not)
> reproduce the problem the next night, i wonder if it will happen again.
> 
> do you vm-gurus have any idea to the points asked above?
> 
> more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/
> 
> thank you for your comments,
> Christian.
> --
> BOFH excuse #281:
> 
> The co-locator cannot verify the frame-relay gateway to the ISDN server.


-- 
Coywolf Qi Hunt
Homepage http://sosdg.org/~coywolf/


Re: oom with 2.6.11

2005-03-11 Thread Mauricio Lin
Hi Christian,

I would like to know on which kernel versions this problem happened.

Did this problem start from 2.6.11-rc2-bk10?

BR,

Mauricio Lin.

On Thu, 10 Mar 2005 16:12:27 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> ok,
> 
> as "promised", the OOM happened again with the same plain 2.6.11,
> details here:
> 
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt
> 
> the following is quite long, but please read on
> (if anyone is reading at all :))
> 
> this time it happened at 08:01, and i could imagine some heavy cron jobs
> were going on. but as i said: "it did not happen before". there is also
> output of SYSRQ-T/M/P. i did SYSRQ-E to recover the machine, but then
> decided to reboot back to 2.6.11-rc5-bk2.
> 
> i had a look at the changelogs too and noticed that ChangeLog-2.6.11
> contains 7 occurrences of "OOM" in the patch description:
> 
> [PATCH] mm: overcommit updates, 2005-01-03
> [PATCH] vmscan: count writeback pages in nr_scanned, 2005-01-08
> [PATCH] possible rq starvation on oom, 2005-01-13
> [PATCH] mm: adjust dirty threshold for lowmem-only mappings, 2005-01-25
> [PATCH] mm: oom-killer tunable, 2005-02-02
> [PATCH] mm: fix several oom killer bugs, 2005-02-02
> [PATCH] Fix oops in alloc_zeroed_user_highpage() when [...], 2005-02-09
> 
> release dates:
> 2.6.11-rc5-bk1  26-Feb-2005
> 2.6.11-rc5-bk2  27-Feb-2005  <
> 2.6.11-rc5-bk3  28-Feb-2005
> 2.6.11-rc5-bk4  01-Mar-2005
> 2.6.11  02-Mar-2005
> 
> so i really don't see any patches that *could* have something to do with
> the issue here.
> 
> now comes the weird part:
> 
> i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel.
> compiling went fine. ok, finished some email, ok, suddenly my swap was
> used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2!
> 
> to summarize it:
> i've run 2.6.11-rc2-bk10 during the whole of february, then switched to
> 2.6.11-rc5-bk2 on 28.02.2005, then to 2.6.11 on 05.03.2005 - and
> noticed the problem first with 2.6.11, now with 2.6.11-rc5-bk2 too.
> 
> there is an interesting part in the logfiles:
> 
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2.txt
> 
> in each log, the last message before the "OOM" messages is from pppd:
> 
> Mar 10 13:45:55 sheep pppd[1567]: Starting link
> Mar 10 14:12:29 sheep kernel: oom-killer: gfp_mask=0x1d2
> 
> Mar  8 00:59:58 sheep pppd[418]: Starting link
> Mar  8 01:27:33 sheep kernel: oom-killer: gfp_mask=0xd0
> 
> Mar  9 07:33:49 sheep pppd[30937]: Starting link
> Mar  9 08:01:35 sheep kernel: oom-killer: gfp_mask=0x1d2
> 
> and just under 30min later OOM kicks in. normally, pppd (pppoe) gives messages like this:
> 
> Mar 10 14:23:38 sheep pppd[26365]: Starting link
> Mar 10 14:23:38 sheep pppd[26365]: Serial connection established.
> Mar 10 14:23:38 sheep pppd[26365]: Connect: ppp0 <--> /dev/pts/0
> Mar 10 14:23:38 sheep pppoe[26383]: PADS: Service-Name: ''
> Mar 10 14:23:38 sheep pppoe[26383]: PPP session is 6804
> Mar 10 14:23:39 sheep pppd[26365]: CHAP authentication succeeded
> Mar 10 14:23:40 sheep pppd[26365]: Local IP address changed to
> [...]
> 
> is this strange? or not?
> 
> i hope someone has a hint for me, because "going back to the stable
> kernel" would mean "being bound to 2.6.11-rc2-bk10" :(
> 
> thank you for any hints,
> Christian.
> 
> PS: Steven, i've cc'ed you because you have trouble with new 2.6.11
> kernels and pppd too. maybe unrelated, maybe not.
> --
> BOFH excuse #185:
> 
> system consumed all the paper for paging
>


Re: oom with 2.6.11

2005-03-11 Thread Christian Kujau
Mauricio Lin wrote:
> Hi Christian,
> 
> I would like to know which kernel versions this problem happened with.
> 
> Did this problem start from 2.6.11-rc2-bk10?

i noticed it first with 2.6.11, then again with 2.6.11-rc5-bk2. i suspect
pppd is the culprit, chewing up all RAM after my ISP terminates the
connection once a day - i just have to wait (must be around 2 a.m.).

thanks,
Christian.
-- 
BOFH excuse #134:

because of network lag due to too many people playing deathmatch


Re: oom with 2.6.11

2005-03-11 Thread Christian Kujau
Coywolf Qi Hunt wrote:
> In file mm/oom_kill.c, uncomment line 24: /* #define DEBUG */. 
> And next time when oom happens again, we'll see the badness.

oh, good hint. will do this before the next reboot (in a few hours i guess)


thanks,
Christian.

-- 
BOFH excuse #134:

because of network lag due to too many people playing deathmatch
-- 
BOFH excuse #37:

heavy gravity fluctuation, move computer to floor rapidly


Re: oom with 2.6.11

2005-03-10 Thread Pasi Kärkkäinen
On Fri, Mar 11, 2005 at 02:14:14AM +0100, Christian Kujau wrote:
> Andrew Morton wrote:
> > Christian Kujau <[EMAIL PROTECTED]> wrote:
> > 
> >>i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel.
> >>compiling went fine. ok, finished some email, ok, suddenly my swap was
> >>used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2!
> > 
> > 
> > Well if you ran out of swap then yes, the oom-killer will visit you.
> > 
> > Why did you run out of swapspace?
> 
> hm, if i only knew. i don't know how long it took the other night to go
> from "normal" to "OOM". but today, with 2.6.11-rc5-bk2 (well, yesterday
> actually) i was working normally, and all of a sudden swap goes from 170MB
> used swap (normal) to OOM. i think it took a minute or so, but i just
> can't tell which application went nuts. today the first process that got
> killed was "ssh-agent", the other day it was mysqld. but even after this,
> it should've released some memory, right? but the oom-killer goes on and
> on and kills the next task.
> 
> i'll monitor memory usage tonight and see what it gives. these "pppd"
> messages are suspicious though.
> 

I've also seen this 3 times now.. I'm running Xen 2.0 and the 2.6.10-xen0
kernel (with -as5 patch) goes OOM after a couple of days of operation.. and
then the box reboots.

The "virtual machines" are OK, it's only the dom0 kernel that goes OOM.. 
And I'm not running anything special on dom0, only xen control stuff
(which is written in python..), ntp, nfs server, ssh, lvm2 and software raid.

Now I'm running a script which logs the cpu/memory/swap usage every
minute.. trying to see if I can find the cause for the OOM.
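
A per-minute logger of this kind could look roughly like the sketch below. It is not the script actually used here, only a minimal Python example that assumes a Linux /proc/meminfo. The function names, the memlog.txt file name and the choice of fields (MemFree, SwapFree) are made up for illustration, and CPU usage is left out.

```python
import time


def parse_meminfo(text):
    """Turn /proc/meminfo-style text into a dict of {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def format_sample(info, stamp):
    """Build one log line from a parsed sample and a timestamp string."""
    return "%s MemFree=%dkB SwapFree=%dkB" % (
        stamp, info.get("MemFree", 0), info.get("SwapFree", 0))


def log_samples(count=None, interval=60,
                source="/proc/meminfo", logfile="memlog.txt"):
    """Append one line per interval; count=None means run until interrupted."""
    taken = 0
    while count is None or taken < count:
        with open(source) as f:
            info = parse_meminfo(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        # Reopen the log for every sample so each line reaches the disk
        # even if this process is killed shortly afterwards.
        with open(logfile, "a") as out:
            out.write(format_sample(info, stamp) + "\n")
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
```

Calling log_samples() with no arguments appends one line per minute to memlog.txt until it is interrupted. Reopening the file for each sample is a deliberate choice, since the logger itself may be among the tasks that get killed.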

-- Pasi Kärkkäinen
   
   ^
. .
 Linux
  /-\
 Choice.of.the
   .Next.Generation.


Re: oom with 2.6.11

2005-03-10 Thread Christian Kujau
Andrew Morton wrote:
> Christian Kujau <[EMAIL PROTECTED]> wrote:
> 
>>i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel.
>>compiling went fine. ok, finished some email, ok, suddenly my swap was
>>used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2!
> 
> 
> Well if you ran out of swap then yes, the oom-killer will visit you.
> 
> Why did you run out of swapspace?

hm, if i only knew. i don't know how long it took the other night to go
from "normal" to "OOM". but today, with 2.6.11-rc5-bk2 (well, yesterday
actually) i was working normally, and all of a sudden swap goes from 170MB
used swap (normal) to OOM. i think it took a minute or so, but i just
can't tell which application went nuts. today the first process that got
killed was "ssh-agent", the other day it was mysqld. but even after this,
it should've released some memory, right? but the oom-killer goes on and
on and kills the next task.

i'll monitor memory usage tonight and see what it gives. these "pppd"
messages are suspicious though.
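
One way to tell which program is growing is to sum resident memory per command name from /proc/<pid>/status, which also makes a large number of small processes of the same program stand out. The Python sketch below assumes the "Name:" and "VmRSS:" fields of a Linux /proc. The function names are hypothetical, and summed RSS counts shared pages more than once, so the totals are only a rough guide.

```python
import os
from collections import defaultdict


def parse_status(text):
    """Return (command name, resident set size in kB) from status-file text."""
    name, rss_kb = "?", 0
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss_kb = int(value.split()[0])
    # Kernel threads have no VmRSS line and are reported with 0 kB.
    return name, rss_kb


def summarize(status_texts):
    """Group by command name; return (total kB, process count, name), largest first."""
    totals = defaultdict(lambda: [0, 0])
    for text in status_texts:
        name, rss_kb = parse_status(text)
        totals[name][0] += rss_kb
        totals[name][1] += 1
    rows = [(kb, count, name) for name, (kb, count) in totals.items()]
    rows.sort(reverse=True)
    return rows


def read_status_texts(proc_root="/proc"):
    """Yield the contents of every readable <pid>/status file under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                yield f.read()
        except OSError:
            continue  # the process exited while we were scanning
```

Running summarize(read_status_texts()) from cron or a loop and keeping the first ten rows each time gives a record of which command names held the most memory just before the oom-killer started.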

thank you,
Christian.
-- 
BOFH excuse #135:

You put the disk in upside down.


Re: oom with 2.6.11

2005-03-10 Thread Andrew Morton
Christian Kujau <[EMAIL PROTECTED]> wrote:
>
> i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel.
> compiling went fine. ok, finished some email, ok, suddenly my swap was
> used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2!

Well if you ran out of swap then yes, the oom-killer will visit you.

Why did you run out of swapspace?


Re: oom with 2.6.11

2005-03-10 Thread Christian Kujau
ok,

as "promised", the OOM happened again with the same plain 2.6.11,
details here.

http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt

the following is quite long, but please read on
(if anyone is reading at all :))

this time it happened at 08:01, and i could imagine some heavy cron jobs
were going on. but as i said: "it did not happen before". there is also
output of SYSRQ-T/M/P. i did SYSRQ-E to recover the machine, but then
decided to reboot back to 2.6.11-rc5-bk2.

i had a look at the changelogs too and noticed that ChangeLog-2.6.11
contains 7 occurrences of "OOM" in the patch description:

[PATCH] mm: overcommit updates, 2005-01-03
[PATCH] vmscan: count writeback pages in nr_scanned, 2005-01-08
[PATCH] possible rq starvation on oom, 2005-01-13
[PATCH] mm: adjust dirty threshold for lowmem-only mappings, 2005-01-25
[PATCH] mm: oom-killer tunable, 2005-02-02
[PATCH] mm: fix several oom killer bugs, 2005-02-02
[PATCH] Fix oops in alloc_zeroed_user_highpage() when [...],2005-02-09

release dates:
2.6.11-rc5-bk1  26-Feb-2005
2.6.11-rc5-bk2  27-Feb-2005  <
2.6.11-rc5-bk3  28-Feb-2005
2.6.11-rc5-bk4  01-Mar-2005
2.6.11  02-Mar-2005

so i really don't see any patches that *could* have something to do with
the issue here.

now comes the weird part:

i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel.
compiling went fine. ok, finished some email, ok, suddenly my swap was
used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2!

to summarize it:
i've run 2.6.11-rc2-bk10 during the whole of february, then switched to
2.6.11-rc5-bk2 on 28.02.2005, then to 2.6.11 on 05.03.2005 - and
noticed the problem first with 2.6.11, now with 2.6.11-rc5-bk2 too.

there is an interesting part in the logfiles:

http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt
http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt
http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2.txt

in each log, the last message before the "OOM" messages is from pppd:

Mar 10 13:45:55 sheep pppd[1567]: Starting link
Mar 10 14:12:29 sheep kernel: oom-killer: gfp_mask=0x1d2

Mar  8 00:59:58 sheep pppd[418]: Starting link
Mar  8 01:27:33 sheep kernel: oom-killer: gfp_mask=0xd0

Mar  9 07:33:49 sheep pppd[30937]: Starting link
Mar  9 08:01:35 sheep kernel: oom-killer: gfp_mask=0x1d2

and just under 30min later OOM kicks in. normally, pppd (pppoe) gives messages like this:

Mar 10 14:23:38 sheep pppd[26365]: Starting link
Mar 10 14:23:38 sheep pppd[26365]: Serial connection established.
Mar 10 14:23:38 sheep pppd[26365]: Connect: ppp0 <--> /dev/pts/0
Mar 10 14:23:38 sheep pppoe[26383]: PADS: Service-Name: ''
Mar 10 14:23:38 sheep pppoe[26383]: PPP session is 6804
Mar 10 14:23:39 sheep pppd[26365]: CHAP authentication succeeded
Mar 10 14:23:40 sheep pppd[26365]: Local IP address changed to
[...]

is this strange? or not?
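
The interval between each "Starting link" line and the oom-killer line that follows it can be checked mechanically. The Python sketch below uses only the six log lines quoted above. The year 2005 is supplied by hand because syslog timestamps carry none, and the function names are made up for illustration.

```python
from datetime import datetime


def parse_stamp(line, year=2005):
    """Parse the leading 'Mar 10 13:45:55' part of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(
        "%d %s %s %s" % (year, month, day, clock), "%Y %b %d %H:%M:%S")


def gaps(lines):
    """Yield (link start time, time until the next oom-killer message)."""
    last_start = None
    for line in lines:
        if "pppd[" in line and "Starting link" in line:
            last_start = parse_stamp(line)
        elif "oom-killer" in line and last_start is not None:
            yield last_start, parse_stamp(line) - last_start
            last_start = None


LOG = """\
Mar 10 13:45:55 sheep pppd[1567]: Starting link
Mar 10 14:12:29 sheep kernel: oom-killer: gfp_mask=0x1d2
Mar  8 00:59:58 sheep pppd[418]: Starting link
Mar  8 01:27:33 sheep kernel: oom-killer: gfp_mask=0xd0
Mar  9 07:33:49 sheep pppd[30937]: Starting link
Mar  9 08:01:35 sheep kernel: oom-killer: gfp_mask=0x1d2
"""

for start, gap in gaps(LOG.splitlines()):
    print(start.date(), gap)
# The three gaps are 0:26:34, 0:27:35 and 0:27:46.
```

All three gaps fall between 26 and 28 minutes, which is what makes the pppd messages look related rather than coincidental. The sketch does not show what consumes the memory during that interval.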

i hope someone has a hint for me, because "going back to the stable
kernel" would mean "being bound to 2.6.11-rc2-bk10" :(

thank you for any hints,
Christian.

PS: Steven, i've cc'ed you because you have trouble with new 2.6.11
kernels and pppd too. maybe unrelated, maybe not.
-- 
BOFH excuse #185:

system consumed all the paper for paging


Re: oom with 2.6.11

2005-03-09 Thread Christian Kujau
Mauricio Lin wrote:
> Hi Christian,
> 
> I found the 2.6.11-rc3 patch. The oom killer modification from
> Arcangeli was included in 2.6.11-rc3, right? If that is correct, then
> the problem is not related to Arcangeli's modification.
> 
> Does anyone have an idea?

hi Mauricio,

thank you for your answers, but i'm unable to investigate right now, will
do so in the evening.

thanks,
Christian.
-- 
BOFH excuse #240:

Too many little pins on CPU confusing it, bend back and forth until 10-20%
are neatly removed. Do _not_ leave metal bits visible!


Re: oom with 2.6.11

2005-03-09 Thread Mauricio Lin
Hi Christian,

I found the 2.6.11-rc3 patch. The oom killer modification from
Arcangeli was included in 2.6.11-rc3, right? If that is correct, then
the problem is not related to Arcangeli's modification.

Does anyone have an idea?

BR,

Mauricio Lin.

On Wed, 9 Mar 2005 09:18:31 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Christian,
> 
> Could you check the mm/oom_kill.c for your kernel 2.6.11-rc3?
> 
> During the 2.6.11-rc development, the oom killer was changed by Andrea
> Arcangeli. I do not remember exactly which version first included this
> modification, perhaps kernel 2.6.11-rc4.
> 
> Now this oom killer modification is part of the 2.6.11 vanilla kernel.
> 
> Please send me the mm/oom_kill.c of 2.6.11-rc3, so I can confirm my suspicion.
> 
> BR,
> 
> Mauricio Lin.
> 
> On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> > hallo list,
> >
> > today my machine ran out of memory and, noticing it several hours after
> > the first OOM message in the log, i wonder
> >    1) why this happened at all and
> >    2) why almost every service was killed despite the clever algorithms
> >       documented in mm/oom_kill.c.
> >
> > the first oom message went to the syslog at 01:27, i was away and no heavy
> > tasks were scheduled:
> >
> > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt
> >
> > mysqld got killed by the oom killer, so i have to suspect mysql of being
> > the reason for oom here, even though i know that mysqld is running all day
> > long. several other tasks got killed, but "Free swap" stays at 0kB and the
> > oom killer kills almost every other task, with no success in freeing ram.
> >
> > the log stops at 03:21, perhaps syslog-ng got killed.
> > at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to
> > login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this
> > time loadavg was at 249 ;)
> >
> > i went to runlevel 2, then up again to 3 and all services are up and
> > running again.
> >
> > some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30
> > days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not)
> > reproduce the problem the next night, i wonder if it will happen again.
> >
> > do you vm-gurus have any idea to the points asked above?
> >
> > more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/
> >
> > thank you for your comments,
> > Christian.
> > --
> > BOFH excuse #281:
> >
> > The co-locator cannot verify the frame-relay gateway to the ISDN server.
> >
>


Re: OOM with 2.6.11

2005-03-09 Thread Christian Kujau

...replying to myself: it happened again!
switched back to 2.6.11-rc5-bk2, details will follow.

thanks,
Christian.
-- 
BOFH excuse #311:

transient bus protocol violation


Re: oom with 2.6.11

2005-03-09 Thread Mauricio Lin
Hi Christian,

Could you check the mm/oom_kill.c for your kernel 2.6.11-rc3?

During the 2.6.11-rc development, the oom killer was changed by Andrea
Arcangeli. I do not remember exactly which version first included this
modification, perhaps kernel 2.6.11-rc4.

Now this oom killer modification is part of the 2.6.11 vanilla kernel.

Please send me the mm/oom_kill.c of 2.6.11-rc3, so I can confirm my suspicion.
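
Comparing the two versions of the file directly is another way to check this. The sketch below uses Python's difflib to produce a unified diff. The helper names are made up for illustration, and the two source tree paths in the note that follows are placeholders rather than paths taken from this thread.

```python
import difflib


def diff_text(old, new, old_name="old", new_name="new"):
    """Return a unified diff of two strings; empty if they are identical."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name))


def diff_files(old_path, new_path):
    """Read two files and return their unified diff, labelled with the paths."""
    with open(old_path) as a, open(new_path) as b:
        return diff_text(a.read(), b.read(), old_path, new_path)
```

With both trees unpacked side by side, print(diff_files("linux-2.6.11-rc3/mm/oom_kill.c", "linux-2.6.11/mm/oom_kill.c")) would show whether the oom killer changed between the two versions. An empty result means the files are identical.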

BR,

Mauricio Lin. 

On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> hallo list,
> 
> today my machine ran out of memory and, noticing it several hours after
> the first OOM message in the log, i wonder
>    1) why this happened at all and
>    2) why almost every service was killed despite the clever algorithms
>       documented in mm/oom_kill.c.
> 
> the first oom message went to the syslog at 01:27, i was away and no heavy
> tasks were scheduled:
> 
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt
> 
> mysqld got killed by the oom killer, so i have to suspect mysql of being
> the reason for oom here, even though i know that mysqld is running all day
> long. several other tasks got killed, but "Free swap" stays at 0kB and the
> oom killer kills almost every other task, with no success in freeing ram.
> 
> the log stops at 03:21, perhaps syslog-ng got killed.
> at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to
> login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this
> time loadavg was at 249 ;)
> 
> i went to runlevel 2, then up again to 3 and all services are up and
> running again.
> 
> some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30
> days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not)
> reproduce the problem the next night, i wonder if it will happen again.
> 
> do you vm-gurus have any idea to the points asked above?
> 
> more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/
> 
> thank you for your comments,
> Christian.
> --
> BOFH excuse #281:
> 
> The co-locator cannot verify the frame-relay gateway to the ISDN server.


Re: OOM with 2.6.11

2005-03-09 Thread Christian Kujau

...replying to myself: it happened again!
switched back to 2.6.11-rc5-bk2, details will follow.

thanks,
Christian.
-- 
BOFH excuse #311:

transient bus protocol violation


Re: oom with 2.6.11

2005-03-09 Thread Mauricio Lin
Hi Christian,

I found the 2.6.11-rc3 patch. The oom killer modification from
Arcangeli was already included in 2.6.11-rc3, right? If so, the
problem is not related to Arcangeli's modification.

Does anyone have an idea?

BR,

Mauricio Lin.

On Wed, 9 Mar 2005 09:18:31 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Christian,
> 
> Could you check the mm/oom_kill.c for your kernel 2.6.11-rc3?
> 
> During the 2.6.11-rc development, the oom killer was changed by Andrea
> Arcangeli. I do not remember exactly in which version this
> modification was included, perhaps in kernel 2.6.11-rc4.
> 
> Now this oom killer modification is part of the 2.6.11 vanilla kernel.
> 
> Please send me the mm/oom_kill.c of 2.6.11-rc3 so I can confirm my suspicion.
> 
> BR,
> 
> Mauricio Lin.
>


Re: oom with 2.6.11

2005-03-09 Thread Christian Kujau
Mauricio Lin wrote:
> Hi Christian,
> 
> I found the 2.6.11-rc3 patch. The oom killer modification from
> Arcangeli was already included in 2.6.11-rc3, right? If so, the
> problem is not related to Arcangeli's modification.
> 
> Does anyone have an idea?

hi Mauricio,

thank you for your answers, but i'm unable to investigate right now, will
do so in the evening.

thanks,
Christian.
-- 
BOFH excuse #240:

Too many little pins on CPU confusing it, bend back and forth until 10-20%
are neatly removed. Do _not_ leave metal bits visible!