Re: PROBLEM: System Freeze on Particular workload with kernel 2.6.22.6
Hi all,

Thanks, all. After lots of testing, I isolated the problem to one of the memory modules. I thought it might have been a kernel problem, since I assumed memtest would be exhaustive enough given how long I ran it, but apparently not... Even now, the bad module still does not show any errors in memtest.

Thanks,
Yucheng

Ray Lee wrote:
> On 9/19/07, Low Yucheng <[EMAIL PROTECTED]> wrote:
>
>> [1.] Summary
>> System Freeze on Particular workload with kernel 2.6.22.6
>>
>> [2.] Description
>> System freezes on repeated application of the following command
>> for f in *png ; do convert -quality 100 $f `basename $f png`jpg; done
>>
>> Problem is consistent and repeatable.
>> Problem persists when running on a different drive, and also in pure console
>> (no X).
>>
>> One time, the following error logged in syslog:
>> Sep 19 04:22:11 mossnew kernel: [ 301.883919] VM: killing process convert
>> Sep 19 04:22:11 mossnew kernel: [ 301.884382] swap_free: Unused swap offset entry ff00
>> Sep 19 04:22:11 mossnew kernel: [ 301.884421] swap_free: Unused swap offset entry 0300
>> Sep 19 04:22:11 mossnew kernel: [ 301.884456] swap_free: Unused swap offset entry 0200
>> Sep 19 04:22:11 mossnew kernel: [ 301.884491] swap_free: Unused swap offset entry ff00
>> Sep 19 04:22:11 mossnew kernel: [ 301.884527] swap_free: Unused swap offset entry ff00
>> Sep 19 04:22:11 mossnew kernel: [ 301.884562] swap_free: Unused swap offset entry 0100
>>
>> Should not be a RAM problem. RAM has survived 12 hrs of Memtest with no
>> errors.
>> Should not be a CPU problem either. I have been running CPU intensive tasks
>> for days.
>>
>
> The "Unused swap offset entry" is almost always a sign of bad memory,
> if google can be trusted. Your workload is *extremely* CPU and memory
> intensive (and even hits the disk!), so this looks like bad RAM, bad
> cooling, or a marginal power supply that is failing under load.
>
> memtest86+ doesn't stress the CPU nearly as much, so it often doesn't
> show all the problems.
>
> Take your RAM down to one stick and try again (looks like you have 2G
> installed?). If that still fails, try different RAM. If that still
> fails, then swap out the power supply for another if you can, and try
> again.
>
> Ray
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/