[SOLVED] Re: oom with 2.6.11
for the record: the "oom bug" turned out to be user generated. a *lot* of small scripts were started, triggering oom again and again, user error. the source of the problem is still pppd and the discussion continues as a debian bugreport: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=299875 thank you for all your help, Christian. -- BOFH excuse #139: UBNC (user brain not connected) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[SOLVED] Re: oom with 2.6.11
for the record: the oom bug turned out to be user generated. a *lot* of small scripts were started, triggering oom again and again, user error. the source of the problem is still pppd and the discussion continues as a debian bugreport: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=299875 thank you for all your help, Christian. -- BOFH excuse #139: UBNC (user brain not connected) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Coywolf Qi Hunt wrote: > I do "grep check-route.sh oom_2.6.11.3.txt | wc" and it shows 4365 duh, good catch! really! > lines, which means there're 4365 that script processes running, from > pid 4260 to12747, mostly with pretty low points, 123. > Based on this points, suppose each script consumes 100k, that'll be > 100k*4k=400M roughly. And your box's is merely 256M MemTotal. yes, i just checked, the script is looping and crond is starting a new one, and anotherand the oom-killer does not catch it, because it's too small and of course don't know where it is coming from (crond). > Check this script and disable it; see what will happen. yes, will do that. on a (not so unimportant) side-note: i was told the whole thing should be fixed with 2.6.11.4: [PATCH] CAN-2005-0384: Remote Linux DoS on ppp servers after all it seems to be PEBKAC and bad luck...what a week. thank you for your help, Christian. -- BOFH excuse #416: We're out of slots on the server - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
On Thu, 17 Mar 2005 02:27:29 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote: > hello again, > > unfortunately i've hit OOM again, this time with "#define DEBUG" enabled > in mm/oom_kill.c: > > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.3.txt > > by "Mar 16 18:32" pppd died again and OOM kicked in 30min later. > (there are a *lot* messages of a shell script named "check-route.sh". it's > a little script which runs every minute or so to check if my default route > is still ok and if ping to the outside world are possible. definitely not > a memory hog, but noisy) I do "grep check-route.sh oom_2.6.11.3.txt | wc" and it shows 4365 lines, which means there're 4365 that script processes running, from pid 4260 to12747, mostly with pretty low points, 123. Based on this points, suppose each script consumes 100k, that'll be 100k*4k=400M roughly. And your box's is merely 256M MemTotal. The currently oom algorithm fails to find out such kinds of memory hog. And the kernel kills other innocent processes because the its points is much lower than most others. Check this script and disable it; see what will happen. -- Coywolf Qi Hunt http://sosdg.org/~coywolf/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
On Thu, 17 Mar 2005 02:27:29 +0100, Christian Kujau [EMAIL PROTECTED] wrote: hello again, unfortunately i've hit OOM again, this time with #define DEBUG enabled in mm/oom_kill.c: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.3.txt by Mar 16 18:32 pppd died again and OOM kicked in 30min later. (there are a *lot* messages of a shell script named check-route.sh. it's a little script which runs every minute or so to check if my default route is still ok and if ping to the outside world are possible. definitely not a memory hog, but noisy) I do grep check-route.sh oom_2.6.11.3.txt | wc and it shows 4365 lines, which means there're 4365 that script processes running, from pid 4260 to12747, mostly with pretty low points, 123. Based on this points, suppose each script consumes 100k, that'll be 100k*4k=400M roughly. And your box's is merely 256M MemTotal. The currently oom algorithm fails to find out such kinds of memory hog. And the kernel kills other innocent processes because the its points is much lower than most others. Check this script and disable it; see what will happen. -- Coywolf Qi Hunt http://sosdg.org/~coywolf/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Coywolf Qi Hunt wrote: I do grep check-route.sh oom_2.6.11.3.txt | wc and it shows 4365 duh, good catch! really! lines, which means there're 4365 that script processes running, from pid 4260 to12747, mostly with pretty low points, 123. Based on this points, suppose each script consumes 100k, that'll be 100k*4k=400M roughly. And your box's is merely 256M MemTotal. yes, i just checked, the script is looping and crond is starting a new one, and anotherand the oom-killer does not catch it, because it's too small and of course don't know where it is coming from (crond). Check this script and disable it; see what will happen. yes, will do that. on a (not so unimportant) side-note: i was told the whole thing should be fixed with 2.6.11.4: [PATCH] CAN-2005-0384: Remote Linux DoS on ppp servers after all it seems to be PEBKAC and bad luck...what a week. thank you for your help, Christian. -- BOFH excuse #416: We're out of slots on the server - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Andrew Morton wrote: > > Some application went berzerk, used up all the swap and then oomed the box. > > You could perhaps run `top -d1' then hit M so the output is sorted by > bloatiness, then try to catch the culprit. i've already done that. as OOM happens when i am not around, i did that with "ps": http://lkml.org/lkml/2005/3/12/88 http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11-rc5-bk2.log.gz > But it would be better to have some app which prints the N most > memory-hungry processes every second and simply scrolls that up the screen. > I'm not aware of such a thing, but it could be cooked up via > /proc/N/cmdline and /proc/N/statm. i hope the link above does reveal this information. i just wrote a bug report for the (debian), ppp package, but to know *where* the memory goes to would really help, i think. thank you, Christian. -- BOFH excuse #72: Satan did it - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Christian Kujau <[EMAIL PROTECTED]> wrote: > > unfortunately i've hit OOM again, this time with "#define DEBUG" enabled > in mm/oom_kill.c: > > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.3.txt > > by "Mar 16 18:32" pppd died again and OOM kicked in 30min later. > (there are a *lot* messages of a shell script named "check-route.sh". it's > a little script which runs every minute or so to check if my default route > is still ok and if ping to the outside world are possible. definitely not > a memory hog, but noisy) > > since tracking the "most memory consuming applications" did not reveal any > hints [1], i have monitored /proc/slabinfo and /proc/meminfo this time: > > http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11.3.gz > > as stated before, i was suspecting pppd to be the bad guy here, and yes: i > downgraded pppd to an earlier version and pppd (and the system) survived 2 > terminations of my dial-up ISP. yesterday i've upgraded back again to > current pppd (debian/unstable) and the OOM problem returned. yes, i'll bug > the debian people now (hello!), but grepping for "ppp" in > daily_stats-2.6.11.3.gz gives no hits. so "pppd" does not get *any* points > from mm/oom_kill.c and thus no attempts are made to kill it (it is always > only kill'able with "-9"). The oom-killer tries to be nicer to processes which are running as root. > furthermore, i thought /proc/slabinfo coud give > me some hints about *where* all the memory went in. scrolling down this > file to the bottom, where "SwapFree" shows "0 kB" i don't see any alarming > numbers in the "slabinfo" right above "meminfo". MemTotal: 256372 kB MemFree: 3280 kB Buffers: 608 kB Cached: 3256 kB SwapCached:664 kB Active: 105020 kB Inactive:20364 kB HighTotal: 0 kB HighFree:0 kB LowTotal: 256372 kB LowFree: 3280 kB SwapTotal: 784468 kB SwapFree:0 kB Dirty: 12 kB Writeback: 0 kB Mapped: 130332 kB Slab:61424 kB CommitLimit:912652 kB Committed_AS: 1323548 kB PageTables: 51668 kB VmallocTotal: 778184 kB VmallocUsed: 3464 kB VmallocChunk: 774492 kB Some application went berzerk, used up all the swap and then oomed the box. You could perhaps run `top -d1' then hit M so the output is sorted by bloatiness, then try to catch the culprit. But it would be better to have some app which prints the N most memory-hungry processes every second and simply scrolls that up the screen. I'm not aware of such a thing, but it could be cooked up via /proc/N/cmdline and /proc/N/statm. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
hello again, unfortunately i've hit OOM again, this time with "#define DEBUG" enabled in mm/oom_kill.c: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.3.txt by "Mar 16 18:32" pppd died again and OOM kicked in 30min later. (there are a *lot* messages of a shell script named "check-route.sh". it's a little script which runs every minute or so to check if my default route is still ok and if ping to the outside world are possible. definitely not a memory hog, but noisy) since tracking the "most memory consuming applications" did not reveal any hints [1], i have monitored /proc/slabinfo and /proc/meminfo this time: http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11.3.gz as stated before, i was suspecting pppd to be the bad guy here, and yes: i downgraded pppd to an earlier version and pppd (and the system) survived 2 terminations of my dial-up ISP. yesterday i've upgraded back again to current pppd (debian/unstable) and the OOM problem returned. yes, i'll bug the debian people now (hello!), but grepping for "ppp" in daily_stats-2.6.11.3.gz gives no hits. so "pppd" does not get *any* points from mm/oom_kill.c and thus no attempts are made to kill it (it is always only kill'able with "-9"). furthermore, i thought /proc/slabinfo coud give me some hints about *where* all the memory went in. scrolling down this file to the bottom, where "SwapFree" shows "0 kB" i don't see any alarming numbers in the "slabinfo" right above "meminfo". could someone give me a hint, please? thanks, Christian. [1] http://lkml.org/lkml/2005/3/12/88 more info for this recent OOM issue: http://nerdbynature.de/bits/sheep/2.6.11/oom/dmesg.2.6.11.3 http://nerdbynature.de/bits/sheep/2.6.11/oom/lsmod_2.6.11.3 http://nerdbynature.de/bits/sheep/2.6.11/oom/config-2.6.11.3.gz -- BOFH excuse #62: need to wrap system in aluminum foil to fix problem - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
hello again, unfortunately i've hit OOM again, this time with #define DEBUG enabled in mm/oom_kill.c: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.3.txt by Mar 16 18:32 pppd died again and OOM kicked in 30min later. (there are a *lot* messages of a shell script named check-route.sh. it's a little script which runs every minute or so to check if my default route is still ok and if ping to the outside world are possible. definitely not a memory hog, but noisy) since tracking the most memory consuming applications did not reveal any hints [1], i have monitored /proc/slabinfo and /proc/meminfo this time: http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11.3.gz as stated before, i was suspecting pppd to be the bad guy here, and yes: i downgraded pppd to an earlier version and pppd (and the system) survived 2 terminations of my dial-up ISP. yesterday i've upgraded back again to current pppd (debian/unstable) and the OOM problem returned. yes, i'll bug the debian people now (hello!), but grepping for ppp in daily_stats-2.6.11.3.gz gives no hits. so pppd does not get *any* points from mm/oom_kill.c and thus no attempts are made to kill it (it is always only kill'able with -9). furthermore, i thought /proc/slabinfo coud give me some hints about *where* all the memory went in. scrolling down this file to the bottom, where SwapFree shows 0 kB i don't see any alarming numbers in the slabinfo right above meminfo. could someone give me a hint, please? thanks, Christian. [1] http://lkml.org/lkml/2005/3/12/88 more info for this recent OOM issue: http://nerdbynature.de/bits/sheep/2.6.11/oom/dmesg.2.6.11.3 http://nerdbynature.de/bits/sheep/2.6.11/oom/lsmod_2.6.11.3 http://nerdbynature.de/bits/sheep/2.6.11/oom/config-2.6.11.3.gz -- BOFH excuse #62: need to wrap system in aluminum foil to fix problem - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Christian Kujau [EMAIL PROTECTED] wrote: unfortunately i've hit OOM again, this time with #define DEBUG enabled in mm/oom_kill.c: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.3.txt by Mar 16 18:32 pppd died again and OOM kicked in 30min later. (there are a *lot* messages of a shell script named check-route.sh. it's a little script which runs every minute or so to check if my default route is still ok and if ping to the outside world are possible. definitely not a memory hog, but noisy) since tracking the most memory consuming applications did not reveal any hints [1], i have monitored /proc/slabinfo and /proc/meminfo this time: http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11.3.gz as stated before, i was suspecting pppd to be the bad guy here, and yes: i downgraded pppd to an earlier version and pppd (and the system) survived 2 terminations of my dial-up ISP. yesterday i've upgraded back again to current pppd (debian/unstable) and the OOM problem returned. yes, i'll bug the debian people now (hello!), but grepping for ppp in daily_stats-2.6.11.3.gz gives no hits. so pppd does not get *any* points from mm/oom_kill.c and thus no attempts are made to kill it (it is always only kill'able with -9). The oom-killer tries to be nicer to processes which are running as root. furthermore, i thought /proc/slabinfo coud give me some hints about *where* all the memory went in. scrolling down this file to the bottom, where SwapFree shows 0 kB i don't see any alarming numbers in the slabinfo right above meminfo. MemTotal: 256372 kB MemFree: 3280 kB Buffers: 608 kB Cached: 3256 kB SwapCached:664 kB Active: 105020 kB Inactive:20364 kB HighTotal: 0 kB HighFree:0 kB LowTotal: 256372 kB LowFree: 3280 kB SwapTotal: 784468 kB SwapFree:0 kB Dirty: 12 kB Writeback: 0 kB Mapped: 130332 kB Slab:61424 kB CommitLimit:912652 kB Committed_AS: 1323548 kB PageTables: 51668 kB VmallocTotal: 778184 kB VmallocUsed: 3464 kB VmallocChunk: 774492 kB Some application went berzerk, used up all the swap and then oomed the box. You could perhaps run `top -d1' then hit M so the output is sorted by bloatiness, then try to catch the culprit. But it would be better to have some app which prints the N most memory-hungry processes every second and simply scrolls that up the screen. I'm not aware of such a thing, but it could be cooked up via /proc/N/cmdline and /proc/N/statm. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Andrew Morton wrote: Some application went berzerk, used up all the swap and then oomed the box. You could perhaps run `top -d1' then hit M so the output is sorted by bloatiness, then try to catch the culprit. i've already done that. as OOM happens when i am not around, i did that with ps: http://lkml.org/lkml/2005/3/12/88 http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11-rc5-bk2.log.gz But it would be better to have some app which prints the N most memory-hungry processes every second and simply scrolls that up the screen. I'm not aware of such a thing, but it could be cooked up via /proc/N/cmdline and /proc/N/statm. i hope the link above does reveal this information. i just wrote a bug report for the (debian), ppp package, but to know *where* the memory goes to would really help, i think. thank you, Christian. -- BOFH excuse #72: Satan did it - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Mauricio Lin wrote: >>> >>>Did this problem start from 2.6.11-rc2-bk10? >> >>i noticed it first at 2.6.11, then again with 2.6.11-rc5-bk2. suspecting >>pppd to be the culprit to chew up all RAM after being terminated by my ISP >>once a day - i just have to wait (must be around 2a.m.). > > Have you tried with 2.6.10 in order to check this problem? i don't think i've run plain 2.6.10 at all, syslog does not say so. i've run several versions of 2.6.9, then 2.6.10-rc1, 2.6.10-rc1-bk19, 2.6.10-rc2-bk9, 2.6.10-rc3-bk11, 2.6.11-rc2-bk10, 2.6.11-rc5-bk2. as stated earlier, the problem kicked in when i switched to 2.6.11, i switched back to 2.6.11-rc5-bk2 and it happened again. by looking at the syslog i am (still) suspecting pppd to be the source of OOM, but it did not show up as a memory pig in "ps". i am running 2.6.11.3 now and the machine survived almost 2 passes of pppd's daily hangups. i'll keep an eye on it. thank you for your concern, Christian. - -- BOFH excuse #357: I'd love to help you -- it's just that the Boss won't let me near the computer. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.0 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCNu1e+A7rjkF8z0wRAmSPAJ49erb21bO6inHPn/zoUH2gNHJk8QCguYDA HyzXS9whrJnb9eXVz9IP8zc= =zhEx -END PGP SIGNATURE- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Hi Christian, On Fri, 11 Mar 2005 16:09:24 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote: > Mauricio Lin wrote: > > Hi Christian, > > > > I would like to know what are the kernel versions this problem happened. > > > > Did this problem start from 2.6.11-rc2-bk10? > > i noticed it first at 2.6.11, then again with 2.6.11-rc5-bk2. suspecting > pppd to be the culprit to chew up all RAM after being terminated by my ISP > once a day - i just have to wait (must be around 2a.m.). Have you tried with 2.6.10 in order to check this problem? BR, Mauricio Lin. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Hi Christian, On Fri, 11 Mar 2005 16:09:24 +0100, Christian Kujau [EMAIL PROTECTED] wrote: Mauricio Lin wrote: Hi Christian, I would like to know what are the kernel versions this problem happened. Did this problem start from 2.6.11-rc2-bk10? i noticed it first at 2.6.11, then again with 2.6.11-rc5-bk2. suspecting pppd to be the culprit to chew up all RAM after being terminated by my ISP once a day - i just have to wait (must be around 2a.m.). Have you tried with 2.6.10 in order to check this problem? BR, Mauricio Lin. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
hi again, i had to wait for my pppoe session to be terminated by the remote peer [1], and now it happened again with 2.6.11-rc5-bk2: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2_2.txt http://nerdbynature.de/bits/sheep/2.6.11/oom/lsmod_2.6.11-rc5-bk2 http://nerdbynature.de/bits/sheep/2.6.11/oom/config-2.6.11-rc5-bk2.gz i wanted to see the application chewing up the most RAM so i was running "ps aux --sort=-vsz,-rss | head -n1" every 5sec and wrote it's output to a file: http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11-rc5-bk2.log.gz but even when almost all swap is used up, mysqld ist still no#1 with 131928 KB VSZ, as always. still suspecting something wrong with pppd, the binary istself does *not* show up on the top-ten-memory-pigs. real memory is almost always used up, but that's ok. i stripped it down to show only the timestamp and the "Swap: .. used" numbers: http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11-rc5-bk2_stripped.log.gz as you can see, until "Sat Mar 12 01:05:51" 122MB swap was used, from then on swap-usage goes up until "Sat Mar 12 01:49:45", when all swap is in use. the system is unable to keep up with writing memory usage every 5 seconds, the next entry is from 12 "01:56:24" - 6 minutes after the first OOM message (i'm sure someone could feed this to gnuplot and make nice bars out of it - i can't) where else could i look to actually *see* where the memory goes to? somehow lost in kernel versions, Christian. [1] "remote peer" aka "ISP". is there a way to "simulate" a "LCP terminated by peer" locally? otherwise i really have to wait 24h to trigger the bug. -- BOFH excuse #373: Suspicious pointer corrupted virtual machine - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
hi again, i had to wait for my pppoe session to be terminated by the remote peer [1], and now it happened again with 2.6.11-rc5-bk2: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2_2.txt http://nerdbynature.de/bits/sheep/2.6.11/oom/lsmod_2.6.11-rc5-bk2 http://nerdbynature.de/bits/sheep/2.6.11/oom/config-2.6.11-rc5-bk2.gz i wanted to see the application chewing up the most RAM so i was running ps aux --sort=-vsz,-rss | head -n1 every 5sec and wrote it's output to a file: http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11-rc5-bk2.log.gz but even when almost all swap is used up, mysqld ist still no#1 with 131928 KB VSZ, as always. still suspecting something wrong with pppd, the binary istself does *not* show up on the top-ten-memory-pigs. real memory is almost always used up, but that's ok. i stripped it down to show only the timestamp and the Swap: .. used numbers: http://nerdbynature.de/bits/sheep/2.6.11/oom/daily_stats-2.6.11-rc5-bk2_stripped.log.gz as you can see, until Sat Mar 12 01:05:51 122MB swap was used, from then on swap-usage goes up until Sat Mar 12 01:49:45, when all swap is in use. the system is unable to keep up with writing memory usage every 5 seconds, the next entry is from 12 01:56:24 - 6 minutes after the first OOM message (i'm sure someone could feed this to gnuplot and make nice bars out of it - i can't) where else could i look to actually *see* where the memory goes to? somehow lost in kernel versions, Christian. [1] remote peer aka ISP. is there a way to simulate a LCP terminated by peer locally? otherwise i really have to wait 24h to trigger the bug. -- BOFH excuse #373: Suspicious pointer corrupted virtual machine - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Coywolf Qi Hunt wrote: > In file mm/oom_kill.c, uncomment line 24: /* #define DEBUG */. > And next time when oom happens again, we'll see the badness. oh, good hint. will do this before the next reboot (in a few hours i guess) thanks, Christian. - -- BOFH excuse #134: because of network lag due to too many people playing deathmatch -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.0 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCMbTj+A7rjkF8z0wRAmdAAJ4sCIaUqZaKn7gGtkpN7Wb47acFiACgrOSe DpT3tE1/4zfPIDueDwBMhDU= =o6Lr -END PGP SIGNATURE- -- BOFH excuse #37: heavy gravity fluctuation, move computer to floor rapidly - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Mauricio Lin wrote: > Hi Christian, > > I would like to know what are the kernel versions this problem happened. > > Did this problem start from 2.6.11-rc2-bk10? i noticed it first at 2.6.11, then again with 2.6.11-rc5-bk2. suspecting pppd to be the culprit to chew up all RAM after being terminated by my ISP once a day - i just have to wait (must be around 2a.m.). thanks, Christian. -- BOFH excuse #134: because of network lag due to too many people playing deathmatch - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
In file mm/oom_kill.c, uncomment line 24: /* #define DEBUG */. And next time when oom happens again, we'll see the badness. --coywolf On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote: > hallo list, > > today my machine went out out memory and noticing it several hours after > the first OOM message in the log, i wonder >1) why this happened at all and >2) why almost every service was killed despite the clever algorithms > documented in mm/oom_kill.c. > > the first oom message went to the syslog at 01:27, i was away and no heavy > tasks were scheduled: > > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt > > mysqld got killed by the oom killer, so i have to suspect mysql for being > the reason for oom here, even that i know that mysqld is running all day > long. several other tasks got killed, but "Free swap" stays at 0kB and the > oom killer kills almost every other tasks, with no success in freeing ram. > > the log stops at 03:21, perhaps syslog-ng got killed. > at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to > login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this > time loadavg was at 249 ;) > > i went to runlevel 2, then up again to 3 and all services are up and > running again. > > some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30 > days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not) > reproduce the problem the next night, i wonder if it will happen again. > > do you vm-gurus have any idea to the points asked above? > > more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/ > > thank you for your comments, > Christian. > -- > BOFH excuse #281: > > The co-locator cannot verify the frame-relay gateway to the ISDN server. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- Coywolf Qi Hunt Homepage http://sosdg.org/~coywolf/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Hi Christian, I would like to know what are the kernel versions this problem happened. Did this problem start from 2.6.11-rc2-bk10? BR, Mauricio Lin. On Thu, 10 Mar 2005 16:12:27 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote: > ok, > > as "promised", it the OOM happened again with the same plain 2.6.11, > details here. > > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt > > the following is a quite long, but please read on > (if anyone is reading at all :)) > > this time it happened at 08:01, and i could image some heavy cron jobs > were going on. but as i said: "it did not happen before". there are also > output of SYSRQ-T/M/P. i did SYSRQ-E to recover the machine, but then > decided to reboot back to 2.6.11-rc5-bk2. > > i had a look at the changelogs too and noticed that ChangeLog-2.6.11 > contains 7 occurrences of "OOM" in the patch desctiption: > > [PATCH] mm: overcommit updates, 2005-01-03 > [PATCH] vmscan: count writeback pages in nr_scanned, 2005-01-08 > [PATCH] possible rq starvation on oom, 2005-01-13 > [PATCH] mm: adjust dirty threshold for lowmem-only mappings, 2005-01-25 > [PATCH] mm: oom-killer tunable, 2005-02-02 > [PATCH] mm: fix several oom killer bugs, 2005-02-02 > [PATCH] Fix oops in alloc_zeroed_user_highpage() when [...],2005-02-09 > > release dates: > 2.6.11-rc5-bk1 26-Feb-2005 > 2.6.11-rc5-bk2 27-Feb-2005 < > 2.6.11-rc5-bk3 28-Feb-2005 > 2.6.11-rc5-bk4 01-Mar-2005 > 2.6.11 02-Mar-2005 > > so i really don't see any patches that *could* have something to do with > the issue here. > > now comes the weird part: > > i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel. > compiling went fine. ok, finished some email, ok, suddenly my swap was > used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! > > to summarize it: > i've run 2.6.11-rc2-bk10 during whole february, then switched to > 2.6.11-rc5-bk2 on 28.02.2005, then to 2.6.11 on 05.03.2005 - and only > noticed with 2.6.11 first, now with 2.6.11-rc5-bk2 too. > > there is an interesting part in the logfiles: > > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2.txt > > every last message before the "OOM" messages is something with pppd: > > Mar 10 13:45:55 sheep pppd[1567]: Starting link > Mar 10 14:12:29 sheep kernel: oom-killer: gfp_mask=0x1d2 > > Mar 8 00:59:58 sheep pppd[418]: Starting link > Mar 8 01:27:33 sheep kernel: oom-killer: gfp_mask=0xd0 > > Mar 9 07:33:49 sheep pppd[30937]: Starting link > Mar 9 08:01:35 sheep kernel: oom-killer: gfp_mask=0x1d2 > > and 30min later OOM kicks in. normally, pppd (pppoe) gives messages like this: > > Mar 10 14:23:38 sheep pppd[26365]: Starting link > Mar 10 14:23:38 sheep pppd[26365]: Serial connection established. > Mar 10 14:23:38 sheep pppd[26365]: Connect: ppp0 <--> /dev/pts/0 > Mar 10 14:23:38 sheep pppoe[26383]: PADS: Service-Name: '' > Mar 10 14:23:38 sheep pppoe[26383]: PPP session is 6804 > Mar 10 14:23:39 sheep pppd[26365]: CHAP authentication succeeded > Mar 10 14:23:40 sheep pppd[26365]: Local IP address changed to > [...] > > is this strange? or not? > > i hope someone has a hint for me, because "going back to the stable > kernel" would mean "being bound to 2.6.11-rc2-bk10" :( > > thank you for any hints, > Christian. > > PS: Steven, i've cc'ed you because you have trouble with new 2.6.11 > kernels and pppd too. maybe unrelated, maybe not. > -- > BOFH excuse #185: > > system consumed all the paper for paging > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Hi Christian, I would like to know what are the kernel versions this problem happened. Did this problem start from 2.6.11-rc2-bk10? BR, Mauricio Lin. On Thu, 10 Mar 2005 16:12:27 +0100, Christian Kujau [EMAIL PROTECTED] wrote: ok, as promised, it the OOM happened again with the same plain 2.6.11, details here. http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt the following is a quite long, but please read on (if anyone is reading at all :)) this time it happened at 08:01, and i could image some heavy cron jobs were going on. but as i said: it did not happen before. there are also output of SYSRQ-T/M/P. i did SYSRQ-E to recover the machine, but then decided to reboot back to 2.6.11-rc5-bk2. i had a look at the changelogs too and noticed that ChangeLog-2.6.11 contains 7 occurrences of OOM in the patch desctiption: [PATCH] mm: overcommit updates, 2005-01-03 [PATCH] vmscan: count writeback pages in nr_scanned, 2005-01-08 [PATCH] possible rq starvation on oom, 2005-01-13 [PATCH] mm: adjust dirty threshold for lowmem-only mappings, 2005-01-25 [PATCH] mm: oom-killer tunable, 2005-02-02 [PATCH] mm: fix several oom killer bugs, 2005-02-02 [PATCH] Fix oops in alloc_zeroed_user_highpage() when [...],2005-02-09 release dates: 2.6.11-rc5-bk1 26-Feb-2005 2.6.11-rc5-bk2 27-Feb-2005 2.6.11-rc5-bk3 28-Feb-2005 2.6.11-rc5-bk4 01-Mar-2005 2.6.11 02-Mar-2005 so i really don't see any patches that *could* have something to do with the issue here. now comes the weird part: i was going to compile 2.6.11-rc5-bk4, to sort out the bad kernel. compiling went fine. ok, finished some email, ok, suddenly my swap was used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! to summarize it: i've run 2.6.11-rc2-bk10 during whole february, then switched to 2.6.11-rc5-bk2 on 28.02.2005, then to 2.6.11 on 05.03.2005 - and only noticed with 2.6.11 first, now with 2.6.11-rc5-bk2 too. there is an interesting part in the logfiles: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2.txt every last message before the OOM messages is something with pppd: Mar 10 13:45:55 sheep pppd[1567]: Starting link Mar 10 14:12:29 sheep kernel: oom-killer: gfp_mask=0x1d2 Mar 8 00:59:58 sheep pppd[418]: Starting link Mar 8 01:27:33 sheep kernel: oom-killer: gfp_mask=0xd0 Mar 9 07:33:49 sheep pppd[30937]: Starting link Mar 9 08:01:35 sheep kernel: oom-killer: gfp_mask=0x1d2 and 30min later OOM kicks in. normally, pppd (pppoe) gives messages like this: Mar 10 14:23:38 sheep pppd[26365]: Starting link Mar 10 14:23:38 sheep pppd[26365]: Serial connection established. Mar 10 14:23:38 sheep pppd[26365]: Connect: ppp0 -- /dev/pts/0 Mar 10 14:23:38 sheep pppoe[26383]: PADS: Service-Name: '' Mar 10 14:23:38 sheep pppoe[26383]: PPP session is 6804 Mar 10 14:23:39 sheep pppd[26365]: CHAP authentication succeeded Mar 10 14:23:40 sheep pppd[26365]: Local IP address changed to [...] is this strange? or not? i hope someone has a hint for me, because going back to the stable kernel would mean being bound to 2.6.11-rc2-bk10 :( thank you for any hints, Christian. PS: Steven, i've cc'ed you because you have trouble with new 2.6.11 kernels and pppd too. maybe unrelated, maybe not. -- BOFH excuse #185: system consumed all the paper for paging - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
In file mm/oom_kill.c, uncomment line 24: /* #define DEBUG */. And next time when oom happens again, we'll see the badness. --coywolf On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau [EMAIL PROTECTED] wrote: hallo list, today my machine went out out memory and noticing it several hours after the first OOM message in the log, i wonder 1) why this happened at all and 2) why almost every service was killed despite the clever algorithms documented in mm/oom_kill.c. the first oom message went to the syslog at 01:27, i was away and no heavy tasks were scheduled: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt mysqld got killed by the oom killer, so i have to suspect mysql for being the reason for oom here, even that i know that mysqld is running all day long. several other tasks got killed, but Free swap stays at 0kB and the oom killer kills almost every other tasks, with no success in freeing ram. the log stops at 03:21, perhaps syslog-ng got killed. at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this time loadavg was at 249 ;) i went to runlevel 2, then up again to 3 and all services are up and running again. some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30 days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not) reproduce the problem the next night, i wonder if it will happen again. do you vm-gurus have any idea to the points asked above? more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/ thank you for your comments, Christian. -- BOFH excuse #281: The co-locator cannot verify the frame-relay gateway to the ISDN server. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Coywolf Qi Hunt Homepage http://sosdg.org/~coywolf/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Mauricio Lin wrote: Hi Christian, I would like to know what are the kernel versions this problem happened. Did this problem start from 2.6.11-rc2-bk10? i noticed it first at 2.6.11, then again with 2.6.11-rc5-bk2. suspecting pppd to be the culprit to chew up all RAM after being terminated by my ISP once a day - i just have to wait (must be around 2a.m.). thanks, Christian. -- BOFH excuse #134: because of network lag due to too many people playing deathmatch - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Coywolf Qi Hunt wrote: In file mm/oom_kill.c, uncomment line 24: /* #define DEBUG */. And next time when oom happens again, we'll see the badness. oh, good hint. will do this before the next reboot (in a few hours i guess) thanks, Christian. - -- BOFH excuse #134: because of network lag due to too many people playing deathmatch -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.0 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCMbTj+A7rjkF8z0wRAmdAAJ4sCIaUqZaKn7gGtkpN7Wb47acFiACgrOSe DpT3tE1/4zfPIDueDwBMhDU= =o6Lr -END PGP SIGNATURE- -- BOFH excuse #37: heavy gravity fluctuation, move computer to floor rapidly - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
On Fri, Mar 11, 2005 at 02:14:14AM +0100, Christian Kujau wrote: > Andrew Morton wrote: > > Christian Kujau <[EMAIL PROTECTED]> wrote: > > > >>i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel. > >>compiling went fine. ok, finished some email, ok, suddenly my swap was > >>used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! > > > > > > Well if you ran out of swap then yes, the oom-killer will visit you. > > > > Why did you run out of swapspace? > > hm, if i only knew. i don't know how long it took the other night to go > from "normal" to "OOM". but today, with 2.6.11-rc5-bk2 (well, yesterday > actually) i was working normally, and all of a sudden swap goes from 170MB > used swap (normal) to OOM. i think it took a minute or so, but i just > can't tell which application went nuts. today the first process that got > killed was "ssh-agent", the other day it was mysqld. but even after this, > it should've released some memory, right? but the oom-killer goes on and > on and kills the next task. > > i'll monitor memory usage tonight and see what it gives. these "pppd" > messages are suspicious though. > I've also seen this 3 times now.. I'm running Xen 2.0 and the 2.6.10-xen0 kernel (with -as5 patch) goes OOM after a couple of days operation.. and then the box reboots. The "virtual machines" are OK, it's only the dom0 kernel that goes OOM.. And I'm not running anything special on dom0, only xen control stuff (which is written in python..), ntp, nfs server, ssh, lvm2 and software raid. Now I'm running a script which logs the cpu/memory/swap usage every 1 minutes.. trying to see if I can find the cause for the OOM. -- Pasi Kärkkäinen ^ . . Linux /-\ Choice.of.the .Next.Generation. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Andrew Morton wrote: > Christian Kujau <[EMAIL PROTECTED]> wrote: > >>i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel. >>compiling went fine. ok, finished some email, ok, suddenly my swap was >>used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! > > > Well if you ran out of swap then yes, the oom-killer will visit you. > > Why did you run out of swapspace? hm, if i only knew. i don't know how long it took the other night to go from "normal" to "OOM". but today, with 2.6.11-rc5-bk2 (well, yesterday actually) i was working normally, and all of a sudden swap goes from 170MB used swap (normal) to OOM. i think it took a minute or so, but i just can't tell which application went nuts. today the first process that got killed was "ssh-agent", the other day it was mysqld. but even after this, it should've released some memory, right? but the oom-killer goes on and on and kills the next task. i'll monitor memory usage tonight and see what it gives. these "pppd" messages are suspicious though. thank you, Christian. -- BOFH excuse #135: You put the disk in upside down. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Christian Kujau <[EMAIL PROTECTED]> wrote: > > i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel. > compiling went fine. ok, finished some email, ok, suddenly my swap was > used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! Well if you ran out of swap then yes, the oom-killer will visit you. Why did you run out of swapspace? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
ok, as "promised", it the OOM happened again with the same plain 2.6.11, details here. http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt the following is a quite long, but please read on (if anyone is reading at all :)) this time it happened at 08:01, and i could image some heavy cron jobs were going on. but as i said: "it did not happen before". there are also output of SYSRQ-T/M/P. i did SYSRQ-E to recover the machine, but then decided to reboot back to 2.6.11-rc5-bk2. i had a look at the changelogs too and noticed that ChangeLog-2.6.11 contains 7 occurrences of "OOM" in the patch desctiption: [PATCH] mm: overcommit updates, 2005-01-03 [PATCH] vmscan: count writeback pages in nr_scanned, 2005-01-08 [PATCH] possible rq starvation on oom, 2005-01-13 [PATCH] mm: adjust dirty threshold for lowmem-only mappings, 2005-01-25 [PATCH] mm: oom-killer tunable, 2005-02-02 [PATCH] mm: fix several oom killer bugs, 2005-02-02 [PATCH] Fix oops in alloc_zeroed_user_highpage() when [...],2005-02-09 release dates: 2.6.11-rc5-bk1 26-Feb-2005 2.6.11-rc5-bk2 27-Feb-2005 < 2.6.11-rc5-bk3 28-Feb-2005 2.6.11-rc5-bk4 01-Mar-2005 2.6.11 02-Mar-2005 so i really don't see any patches that *could* have something to do with the issue here. now comes the weird part: i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel. compiling went fine. ok, finished some email, ok, suddenly my swap was used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! to summarize it: i've run 2.6.11-rc2-bk10 during whole february, then switched to 2.6.11-rc5-bk2 on 28.02.2005, then to 2.6.11 on 05.03.2005 - and only noticed with 2.6.11 first, now with 2.6.11-rc5-bk2 too. there is an interesting part in the logfiles: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2.txt every last message before the "OOM" messages is something with pppd: Mar 10 13:45:55 sheep pppd[1567]: Starting link Mar 10 14:12:29 sheep kernel: oom-killer: gfp_mask=0x1d2 Mar 8 00:59:58 sheep pppd[418]: Starting link Mar 8 01:27:33 sheep kernel: oom-killer: gfp_mask=0xd0 Mar 9 07:33:49 sheep pppd[30937]: Starting link Mar 9 08:01:35 sheep kernel: oom-killer: gfp_mask=0x1d2 and 30min later OOM kicks in. normally, pppd (pppoe) gives messages like this: Mar 10 14:23:38 sheep pppd[26365]: Starting link Mar 10 14:23:38 sheep pppd[26365]: Serial connection established. Mar 10 14:23:38 sheep pppd[26365]: Connect: ppp0 <--> /dev/pts/0 Mar 10 14:23:38 sheep pppoe[26383]: PADS: Service-Name: '' Mar 10 14:23:38 sheep pppoe[26383]: PPP session is 6804 Mar 10 14:23:39 sheep pppd[26365]: CHAP authentication succeeded Mar 10 14:23:40 sheep pppd[26365]: Local IP address changed to [...] is this strange? or not? i hope someone has a hint for me, because "going back to the stable kernel" would mean "being bound to 2.6.11-rc2-bk10" :( thank you for any hints, Christian. PS: Steven, i've cc'ed you because you have trouble with new 2.6.11 kernels and pppd too. maybe unrelated, maybe not. -- BOFH excuse #185: system consumed all the paper for paging - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
ok, as promised, it the OOM happened again with the same plain 2.6.11, details here. http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt the following is a quite long, but please read on (if anyone is reading at all :)) this time it happened at 08:01, and i could image some heavy cron jobs were going on. but as i said: it did not happen before. there are also output of SYSRQ-T/M/P. i did SYSRQ-E to recover the machine, but then decided to reboot back to 2.6.11-rc5-bk2. i had a look at the changelogs too and noticed that ChangeLog-2.6.11 contains 7 occurrences of OOM in the patch desctiption: [PATCH] mm: overcommit updates, 2005-01-03 [PATCH] vmscan: count writeback pages in nr_scanned, 2005-01-08 [PATCH] possible rq starvation on oom, 2005-01-13 [PATCH] mm: adjust dirty threshold for lowmem-only mappings, 2005-01-25 [PATCH] mm: oom-killer tunable, 2005-02-02 [PATCH] mm: fix several oom killer bugs, 2005-02-02 [PATCH] Fix oops in alloc_zeroed_user_highpage() when [...],2005-02-09 release dates: 2.6.11-rc5-bk1 26-Feb-2005 2.6.11-rc5-bk2 27-Feb-2005 2.6.11-rc5-bk3 28-Feb-2005 2.6.11-rc5-bk4 01-Mar-2005 2.6.11 02-Mar-2005 so i really don't see any patches that *could* have something to do with the issue here. now comes the weird part: i was going to compile 2.6.11-rc5-bk4, to sort out the bad kernel. compiling went fine. ok, finished some email, ok, suddenly my swap was used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! to summarize it: i've run 2.6.11-rc2-bk10 during whole february, then switched to 2.6.11-rc5-bk2 on 28.02.2005, then to 2.6.11 on 05.03.2005 - and only noticed with 2.6.11 first, now with 2.6.11-rc5-bk2 too. there is an interesting part in the logfiles: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2.txt every last message before the OOM messages is something with pppd: Mar 10 13:45:55 sheep pppd[1567]: Starting link Mar 10 14:12:29 sheep kernel: oom-killer: gfp_mask=0x1d2 Mar 8 00:59:58 sheep pppd[418]: Starting link Mar 8 01:27:33 sheep kernel: oom-killer: gfp_mask=0xd0 Mar 9 07:33:49 sheep pppd[30937]: Starting link Mar 9 08:01:35 sheep kernel: oom-killer: gfp_mask=0x1d2 and 30min later OOM kicks in. normally, pppd (pppoe) gives messages like this: Mar 10 14:23:38 sheep pppd[26365]: Starting link Mar 10 14:23:38 sheep pppd[26365]: Serial connection established. Mar 10 14:23:38 sheep pppd[26365]: Connect: ppp0 -- /dev/pts/0 Mar 10 14:23:38 sheep pppoe[26383]: PADS: Service-Name: '' Mar 10 14:23:38 sheep pppoe[26383]: PPP session is 6804 Mar 10 14:23:39 sheep pppd[26365]: CHAP authentication succeeded Mar 10 14:23:40 sheep pppd[26365]: Local IP address changed to [...] is this strange? or not? i hope someone has a hint for me, because going back to the stable kernel would mean being bound to 2.6.11-rc2-bk10 :( thank you for any hints, Christian. PS: Steven, i've cc'ed you because you have trouble with new 2.6.11 kernels and pppd too. maybe unrelated, maybe not. -- BOFH excuse #185: system consumed all the paper for paging - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Christian Kujau [EMAIL PROTECTED] wrote: i was going to compile 2.6.11-rc5-bk4, to sort out the bad kernel. compiling went fine. ok, finished some email, ok, suddenly my swap was used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! Well if you ran out of swap then yes, the oom-killer will visit you. Why did you run out of swapspace? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Andrew Morton wrote: Christian Kujau [EMAIL PROTECTED] wrote: i was going to compile 2.6.11-rc5-bk4, to sort out the bad kernel. compiling went fine. ok, finished some email, ok, suddenly my swap was used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! Well if you ran out of swap then yes, the oom-killer will visit you. Why did you run out of swapspace? hm, if i only knew. i don't know how long it took the other night to go from normal to OOM. but today, with 2.6.11-rc5-bk2 (well, yesterday actually) i was working normally, and all of a sudden swap goes from 170MB used swap (normal) to OOM. i think it took a minute or so, but i just can't tell which application went nuts. today the first process that got killed was ssh-agent, the other day it was mysqld. but even after this, it should've released some memory, right? but the oom-killer goes on and on and kills the next task. i'll monitor memory usage tonight and see what it gives. these pppd messages are suspicious though. thank you, Christian. -- BOFH excuse #135: You put the disk in upside down. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
On Fri, Mar 11, 2005 at 02:14:14AM +0100, Christian Kujau wrote: Andrew Morton wrote: Christian Kujau [EMAIL PROTECTED] wrote: i was going to compile 2.6.11-rc5-bk4, to sort out the bad kernel. compiling went fine. ok, finished some email, ok, suddenly my swap was used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2! Well if you ran out of swap then yes, the oom-killer will visit you. Why did you run out of swapspace? hm, if i only knew. i don't know how long it took the other night to go from normal to OOM. but today, with 2.6.11-rc5-bk2 (well, yesterday actually) i was working normally, and all of a sudden swap goes from 170MB used swap (normal) to OOM. i think it took a minute or so, but i just can't tell which application went nuts. today the first process that got killed was ssh-agent, the other day it was mysqld. but even after this, it should've released some memory, right? but the oom-killer goes on and on and kills the next task. i'll monitor memory usage tonight and see what it gives. these pppd messages are suspicious though. I've also seen this 3 times now.. I'm running Xen 2.0 and the 2.6.10-xen0 kernel (with -as5 patch) goes OOM after a couple of days operation.. and then the box reboots. The virtual machines are OK, it's only the dom0 kernel that goes OOM.. And I'm not running anything special on dom0, only xen control stuff (which is written in python..), ntp, nfs server, ssh, lvm2 and software raid. Now I'm running a script which logs the cpu/memory/swap usage every 1 minutes.. trying to see if I can find the cause for the OOM. -- Pasi Kärkkäinen ^ . . Linux /-\ Choice.of.the .Next.Generation. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Mauricio Lin wrote: > Hi Christian, > > I found the 2.6.11-rc3 patch. The oom killer modification from > Arcangeli was included in 2.6.11-rc3. Right? So this is correct, so > the problem is not related to Arcangeli modification. > > Does anyone have idea? hi Mauricio, thank you for your answers, but i'm unable to investigate right now, will do so in the evening. thanks, Christian. -- BOFH excuse #240: Too many little pins on CPU confusing it, bend back and forth until 10-20% are neatly removed. Do _not_ leave metal bits visible! - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Hi Christian, I found the 2.6.11-rc3 patch. The oom killer modification from Arcangeli was included in 2.6.11-rc3. Right? So this is correct, so the problem is not related to Arcangeli modification. Does anyone have idea? BR, Mauricio Lin. On Wed, 9 Mar 2005 09:18:31 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote: > Hi Christian, > > Could you check the mm/oom_kill.c for your kernel 2.6.11-rc3? > > During the 2.6.11-rc development, the oom killer was changed by Andrea > Arcangeli. I do not remember exactly which was the version that this > modification was included, perhaps in kernel 2.6.11-rc4. > > Now this oom killer modification is part of 2.6.11 vanilla kernel. > > Send the mm/oom_kill.c of 2.6.11-rc3 to me, please. Let me confirm my doubt. > > BR, > > Mauricio Lin. > > On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote: > > hallo list, > > > > today my machine went out out memory and noticing it several hours after > > the first OOM message in the log, i wonder > >1) why this happened at all and > >2) why almost every service was killed despite the clever algorithms > > documented in mm/oom_kill.c. > > > > the first oom message went to the syslog at 01:27, i was away and no heavy > > tasks were scheduled: > > > > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt > > > > mysqld got killed by the oom killer, so i have to suspect mysql for being > > the reason for oom here, even that i know that mysqld is running all day > > long. several other tasks got killed, but "Free swap" stays at 0kB and the > > oom killer kills almost every other tasks, with no success in freeing ram. > > > > the log stops at 03:21, perhaps syslog-ng got killed. > > at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to > > login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this > > time loadavg was at 249 ;) > > > > i went to runlevel 2, then up again to 3 and all services are up and > > running again. > > > > some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30 > > days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not) > > reproduce the problem the next night, i wonder if it will happen again. > > > > do you vm-gurus have any idea to the points asked above? > > > > more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/ > > > > thank you for your comments, > > Christian. > > -- > > BOFH excuse #281: > > > > The co-locator cannot verify the frame-relay gateway to the ISDN server. > > - > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to [EMAIL PROTECTED] > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: OOM with 2.6.11
...replying to myself: it happened again! switched back to 2.6.11-rc5-bk2, details will follow. thanks, Christian. -- BOFH excuse #311: transient bus protocol violation - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Hi Christian, Could you check the mm/oom_kill.c for your kernel 2.6.11-rc3? During the 2.6.11-rc development, the oom killer was changed by Andrea Arcangeli. I do not remember exactly which was the version that this modification was included, perhaps in kernel 2.6.11-rc4. Now this oom killer modification is part of 2.6.11 vanilla kernel. Send the mm/oom_kill.c of 2.6.11-rc3 to me, please. Let me confirm my doubt. BR, Mauricio Lin. On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote: > hallo list, > > today my machine went out out memory and noticing it several hours after > the first OOM message in the log, i wonder >1) why this happened at all and >2) why almost every service was killed despite the clever algorithms > documented in mm/oom_kill.c. > > the first oom message went to the syslog at 01:27, i was away and no heavy > tasks were scheduled: > > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt > > mysqld got killed by the oom killer, so i have to suspect mysql for being > the reason for oom here, even that i know that mysqld is running all day > long. several other tasks got killed, but "Free swap" stays at 0kB and the > oom killer kills almost every other tasks, with no success in freeing ram. > > the log stops at 03:21, perhaps syslog-ng got killed. > at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to > login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this > time loadavg was at 249 ;) > > i went to runlevel 2, then up again to 3 and all services are up and > running again. > > some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30 > days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not) > reproduce the problem the next night, i wonder if it will happen again. > > do you vm-gurus have any idea to the points asked above? > > more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/ > > thank you for your comments, > Christian. > -- > BOFH excuse #281: > > The co-locator cannot verify the frame-relay gateway to the ISDN server. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Hi Christian, Could you check the mm/oom_kill.c for your kernel 2.6.11-rc3? During the 2.6.11-rc development, the oom killer was changed by Andrea Arcangeli. I do not remember exactly which was the version that this modification was included, perhaps in kernel 2.6.11-rc4. Now this oom killer modification is part of 2.6.11 vanilla kernel. Send the mm/oom_kill.c of 2.6.11-rc3 to me, please. Let me confirm my doubt. BR, Mauricio Lin. On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau [EMAIL PROTECTED] wrote: hallo list, today my machine went out out memory and noticing it several hours after the first OOM message in the log, i wonder 1) why this happened at all and 2) why almost every service was killed despite the clever algorithms documented in mm/oom_kill.c. the first oom message went to the syslog at 01:27, i was away and no heavy tasks were scheduled: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt mysqld got killed by the oom killer, so i have to suspect mysql for being the reason for oom here, even that i know that mysqld is running all day long. several other tasks got killed, but Free swap stays at 0kB and the oom killer kills almost every other tasks, with no success in freeing ram. the log stops at 03:21, perhaps syslog-ng got killed. at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this time loadavg was at 249 ;) i went to runlevel 2, then up again to 3 and all services are up and running again. some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30 days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not) reproduce the problem the next night, i wonder if it will happen again. do you vm-gurus have any idea to the points asked above? more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/ thank you for your comments, Christian. -- BOFH excuse #281: The co-locator cannot verify the frame-relay gateway to the ISDN server. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: OOM with 2.6.11
...replying to myself: it happened again! switched back to 2.6.11-rc5-bk2, details will follow. thanks, Christian. -- BOFH excuse #311: transient bus protocol violation - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Hi Christian, I found the 2.6.11-rc3 patch. The oom killer modification from Arcangeli was included in 2.6.11-rc3. Right? So this is correct, so the problem is not related to Arcangeli modification. Does anyone have idea? BR, Mauricio Lin. On Wed, 9 Mar 2005 09:18:31 -0400, Mauricio Lin [EMAIL PROTECTED] wrote: Hi Christian, Could you check the mm/oom_kill.c for your kernel 2.6.11-rc3? During the 2.6.11-rc development, the oom killer was changed by Andrea Arcangeli. I do not remember exactly which was the version that this modification was included, perhaps in kernel 2.6.11-rc4. Now this oom killer modification is part of 2.6.11 vanilla kernel. Send the mm/oom_kill.c of 2.6.11-rc3 to me, please. Let me confirm my doubt. BR, Mauricio Lin. On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau [EMAIL PROTECTED] wrote: hallo list, today my machine went out out memory and noticing it several hours after the first OOM message in the log, i wonder 1) why this happened at all and 2) why almost every service was killed despite the clever algorithms documented in mm/oom_kill.c. the first oom message went to the syslog at 01:27, i was away and no heavy tasks were scheduled: http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt mysqld got killed by the oom killer, so i have to suspect mysql for being the reason for oom here, even that i know that mysqld is running all day long. several other tasks got killed, but Free swap stays at 0kB and the oom killer kills almost every other tasks, with no success in freeing ram. the log stops at 03:21, perhaps syslog-ng got killed. at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this time loadavg was at 249 ;) i went to runlevel 2, then up again to 3 and all services are up and running again. some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30 days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not) reproduce the problem the next night, i wonder if it will happen again. do you vm-gurus have any idea to the points asked above? more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/ thank you for your comments, Christian. -- BOFH excuse #281: The co-locator cannot verify the frame-relay gateway to the ISDN server. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: oom with 2.6.11
Mauricio Lin wrote: Hi Christian, I found the 2.6.11-rc3 patch. The oom killer modification from Arcangeli was included in 2.6.11-rc3. Right? So this is correct, so the problem is not related to Arcangeli modification. Does anyone have idea? hi Mauricio, thank you for your answers, but i'm unable to investigate right now, will do so in the evening. thanks, Christian. -- BOFH excuse #240: Too many little pins on CPU confusing it, bend back and forth until 10-20% are neatly removed. Do _not_ leave metal bits visible! - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/