Re: PROBLEM: System Freeze on Particular workload with kernel 2.6.22.6

2007-09-20 Thread Yucheng Low
Hi all,

Thanks all. After lots of testing, I isolated the problem to one of the
memory modules.

Thought it might have been a kernel problem as I thought memtest should
be exhaustive enough considering I ran it for so long, but apparently not...
Even now, the bad module still does not show any errors in memtest...

Thanks,
Yucheng

Ray Lee wrote:
> On 9/19/07, Low Yucheng <[EMAIL PROTECTED]> wrote:
>   
>> [1.] Summary
>> System Freeze on Particular workload with kernel 2.6.22.6
>>
>> [2.] Description
>> System freezes on repeated application of the following command
>> for f in *png ; do convert -quality 100 $f `basename $f png`jpg; done
>>
>> Problem is consistent and repeatable.
>> Problem persists when running on a different drive, and also in pure console 
>> (no X).
>>
>> One time, the following error logged in syslog:
>> Sep 19 04:22:11 mossnew kernel: [  301.883919] VM: killing process convert
>> Sep 19 04:22:11 mossnew kernel: [  301.884382] swap_free: Unused swap offset 
>> entry ff00
>> Sep 19 04:22:11 mossnew kernel: [  301.884421] swap_free: Unused swap offset 
>> entry 0300
>> Sep 19 04:22:11 mossnew kernel: [  301.884456] swap_free: Unused swap offset 
>> entry 0200
>> Sep 19 04:22:11 mossnew kernel: [  301.884491] swap_free: Unused swap offset 
>> entry ff00
>> Sep 19 04:22:11 mossnew kernel: [  301.884527] swap_free: Unused swap offset 
>> entry ff00
>> Sep 19 04:22:11 mossnew kernel: [  301.884562] swap_free: Unused swap offset 
>> entry 0100
>>
>> Should not be a RAM problem. RAM has survived 12 hrs of Memtest with no 
>> errors.
>> Should not be a CPU problem either. I have been running CPU intensive tasks 
>> for days.
>> 
>
> The "Unused swap offset entry" is almost always a sign of bad memory,
> if google can be trusted. Your workload is *extremely* CPU and memory
> intensive (and even hits the disk!), so this looks like bad RAM, bad
> cooling, or a marginal power supply that is failing under load.
>
> memtest86+ doesn't stress the CPU nearly as much, so it often doesn't
> show all the problems.
>
> Take your RAM down to one stick and try again (looks like you have 2G
> installed?). If that still fails, try different RAM. If that still
> fails, then swap out the power supply for another if you can, and try
> again.
>
> Ray
>
>   

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: System Freeze on Particular workload with kernel 2.6.22.6

2007-09-20 Thread Yucheng Low
Hi all,

Thanks all. After lots of testing, I isolated the problem to one of the
memory modules.

Thought it might have been a kernel problem as I thought memtest should
be exhaustive enough considering I ran it for so long, but apparently not...
Even now, the bad module still does not show any errors in memtest...

Thanks,
Yucheng

Ray Lee wrote:
 On 9/19/07, Low Yucheng [EMAIL PROTECTED] wrote:
   
 [1.] Summary
 System Freeze on Particular workload with kernel 2.6.22.6

 [2.] Description
 System freezes on repeated application of the following command
 for f in *png ; do convert -quality 100 $f `basename $f png`jpg; done

 Problem is consistent and repeatable.
 Problem persists when running on a different drive, and also in pure console 
 (no X).

 One time, the following error logged in syslog:
 Sep 19 04:22:11 mossnew kernel: [  301.883919] VM: killing process convert
 Sep 19 04:22:11 mossnew kernel: [  301.884382] swap_free: Unused swap offset 
 entry ff00
 Sep 19 04:22:11 mossnew kernel: [  301.884421] swap_free: Unused swap offset 
 entry 0300
 Sep 19 04:22:11 mossnew kernel: [  301.884456] swap_free: Unused swap offset 
 entry 0200
 Sep 19 04:22:11 mossnew kernel: [  301.884491] swap_free: Unused swap offset 
 entry ff00
 Sep 19 04:22:11 mossnew kernel: [  301.884527] swap_free: Unused swap offset 
 entry ff00
 Sep 19 04:22:11 mossnew kernel: [  301.884562] swap_free: Unused swap offset 
 entry 0100

 Should not be a RAM problem. RAM has survived 12 hrs of Memtest with no 
 errors.
 Should not be a CPU problem either. I have been running CPU intensive tasks 
 for days.
 

 The Unused swap offset entry is almost always a sign of bad memory,
 if google can be trusted. Your workload is *extremely* CPU and memory
 intensive (and even hits the disk!), so this looks like bad RAM, bad
 cooling, or a marginal power supply that is failing under load.

 memtest86+ doesn't stress the CPU nearly as much, so it often doesn't
 show all the problems.

 Take your RAM down to one stick and try again (looks like you have 2G
 installed?). If that still fails, try different RAM. If that still
 fails, then swap out the power supply for another if you can, and try
 again.

 Ray

   

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/