[CentOS] Centos 5.6 Kernel Panics

2012-04-04 Thread Jonathan Alstead
Hello,

Recently our Dell SC1425 server has been locking up with kernel freezes,
and each time it has required a hard reboot. I've looked on the CentOS
forums with limited success - each problem seems slightly different
(some fail under high load, some not). Our kernel is 2.6.18-274.17.1.el5,
and /var/log/messages shows the following errors:

Apr  3 12:41:25 sp2 kernel: INFO: task mysqld:15345 blocked for more than 120 seconds.
Apr  3 12:41:25 sp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  3 12:41:25 sp2 kernel: mysqld D 0CEB  2524 15345  32083  15346 15167 (NOTLB)
Apr  3 12:41:25 sp2 kernel:c50c7f54 0082 bf379c08 0ceb ca9b1648 f43c6c5c  0001
Apr  3 12:41:25 sp2 kernel:d9d18000 bf384f01 0ceb b2f9 0001 d9d1810c c2013ac4 edc5de40
Apr  3 12:41:25 sp2 kernel:08515c98 c6cb37b8 c2014464 c200cc80 0020
Apr  3 12:41:25 sp2 kernel: Call Trace:
Apr  3 12:41:25 sp2 kernel:  [c0622f16] rwsem_down_write_failed+0x126/0x141
Apr  3 12:41:25 sp2 kernel:  [c0439989] .text.lock.rwsem+0x2b/0x3a
Apr  3 12:41:25 sp2 kernel:  [c046aa6a] sys_mprotect+0xbd/0x1eb
Apr  3 12:41:25 sp2 kernel:  [c0404f4b] syscall_call+0x7/0xb
Apr  3 12:41:25 sp2 kernel:  ===
Apr  3 12:41:25 sp2 kernel: INFO: task clamd:15721 blocked for more than 120 seconds.
Apr  3 12:41:26 sp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  3 12:41:26 sp2 kernel: clamd D 0D49  2528 15721  1  16416 15449 (NOTLB)
Apr  3 12:41:26 sp2 kernel:e848cf74 0086 8f107b57 0d49 30ea2005 e848cf44 c08259d0 0007
Apr  3 12:41:26 sp2 kernel:e8c6aaa0 8f117848 0d49 fcf1  e8c6abac c200cc80 f4f5f3c0
Apr  3 12:41:26 sp2 kernel:c041f863 0184 c200d620 c2013ac4 0020  d887f0a8 f766f0c0
Apr  3 12:41:26 sp2 kernel: Call Trace:
Apr  3 12:41:26 sp2 kernel:  [c041f863] default_wake_function+0x0/0xc
Apr  3 12:41:26 sp2 kernel:  [c048e994] destroy_inode+0x38/0x47
Apr  3 12:41:26 sp2 kernel:  [c0622f16] rwsem_down_write_failed+0x126/0x141
Apr  3 12:41:26 sp2 kernel:  [c0439989] .text.lock.rwsem+0x2b/0x3a
Apr  3 12:41:26 sp2 kernel:  [c046a32b] sys_munmap+0x24/0x41
Apr  3 12:41:26 sp2 kernel:  [c0404f4b] syscall_call+0x7/0xb
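The `D` after each task name in these dumps is the task state: uninterruptible sleep. The hung-task detector prints a report like this when a task has stayed in that state longer than `hung_task_timeout_secs` (120 seconds here). To see which tasks are stuck in `D` on the live system, rather than after the fact, a minimal sketch like the following could help. It assumes the standard Linux `/proc/<pid>/task/<tid>/stat` layout, where the state letter follows the parenthesised command name; `parse_stat` and `tasks_in_state` are hypothetical helper names, not an existing tool.

```python
import os


def parse_stat(line):
    """Return (tid, comm, state) from one line of a /proc/.../stat file.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the first " (" and the LAST ")".
    """
    tid_part, rest = line.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    return int(tid_part), comm, after.split()[0]


def tasks_in_state(state="D", proc="/proc"):
    """List (tid, comm) for every thread currently in the given state letter."""
    found = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/self
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # the process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat"), errors="replace") as f:
                    found_tid, comm, st = parse_stat(f.read())
            except OSError:
                continue  # the thread exited between listdir() and open()
            if st == state:
                found.append((found_tid, comm))
    return found


if __name__ == "__main__":
    for tid, comm in tasks_in_state("D"):
        print(tid, comm)
```

Running it a few times while the server is misbehaving shows whether the same tasks stay in `D` across runs; a task briefly in `D` during normal disk I/O is not by itself a sign of trouble.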

Any advice would be appreciated.

regards,

Jon
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Centos 5.6 Kernel Panics

2012-04-04 Thread Nataraj
On 04/04/2012 09:16 AM, Jonathan Alstead wrote:
snip


It sounds like some kind of I/O or memory problem.  I would probably
start by running memtest and the basic diagnostic tests provided by
Dell; if you don't have them installed on your disk, they can be
downloaded as a CentOS-based OpenManage live CD from somewhere on the
Dell site.  It could also be a disk problem, but from the output you
provided I would look for memory or I/O bus problems first, and then
look for disk problems if you don't find anything with the first two.
It almost looks like a memory controller problem.

Nataraj



Re: [CentOS] Centos 5.6 Kernel Panics

2012-04-04 Thread Nataraj
On 04/04/2012 09:31 AM, Nataraj wrote:
snip


If for any reason you think the problem started after a kernel upgrade,
try booting the previous kernel version.
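On CentOS 5 the boot loader is GRUB legacy, configured in `/boot/grub/grub.conf`; its `default` line selects the Nth `title` stanza, counting from 0. A small sketch for listing the stanzas so you can see which index the previous kernel has. The function names are hypothetical, and it assumes a numeric `default=N` (or `default N`) line rather than `default saved`. You can also pick the older kernel by hand from the GRUB menu at boot.

```python
import os
import re


def list_grub_titles(conf_text):
    """Return [(index, title)] for each 'title' stanza, counting from 0 as GRUB legacy does."""
    titles = []
    for line in conf_text.splitlines():
        m = re.match(r"\s*title\s+(.*\S)", line)
        if m:
            titles.append((len(titles), m.group(1)))
    return titles


def current_default(conf_text):
    """Return the numeric 'default' entry; GRUB legacy boots entry 0 when none is set."""
    for line in conf_text.splitlines():
        m = re.match(r"\s*default[\s=]+(\d+)\s*$", line)
        if m:
            return int(m.group(1))
    return 0  # also returned for 'default saved', which this sketch does not handle


if __name__ == "__main__":
    path = "/boot/grub/grub.conf"  # GRUB legacy location on CentOS 5
    if os.access(path, os.R_OK):  # usually needs root
        with open(path) as f:
            text = f.read()
        default = current_default(text)
        for index, title in list_grub_titles(text):
            marker = "*" if index == default else " "
            print(marker, index, title)
```

The newest kernel is normally listed first after an update, so the previous one is often index 1, but it is worth checking the printed list before changing `default`.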

Nataraj



Re: [CentOS] Centos 5.6 Kernel Panics

2012-04-04 Thread Paul (Crunch)
On 04/04/2012 12:31 PM, Nataraj wrote:
snip
I'm inclined to agree with Nataraj: run a memory test first and foremost.
Also check the file system for corruption. The chances are slim, but the
on-disk kernel could have been damaged by data corruption. Unfortunately,
I don't know of a practical way to test the buses, and possibly even the
CPU, other than swapping out hardware.
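For the on-disk kernel specifically, `rpm -V` compares a package's installed files against the sizes, digests and other attributes recorded in the RPM database. A rough sketch that runs it and keeps only files whose size (`S`) or digest (`5`) changed, or which are missing. It assumes rpm's usual verify output (a flags column such as `S.5....T`, an optional attribute marker such as `c`, then the path) and paths without spaces; `changed_files` and `verify_package` are hypothetical helper names.

```python
import re
import subprocess

# rpm prints 8 (older) or 9 (newer, with P) verify flags; '.' = test passed, '?' = test not run.
FLAGS_RE = re.compile(r"^[.?SM5DLUGTP]{8,9}$")


def changed_files(verify_output):
    """Return paths that rpm -V reports as missing or with a changed size/digest."""
    bad = []
    for line in verify_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        flags, path = parts[0], parts[-1]  # assumes paths contain no spaces
        if flags == "missing":
            bad.append(path)
        elif FLAGS_RE.match(flags) and ("S" in flags or "5" in flags):
            bad.append(path)
    return bad


def verify_package(package):
    """Run 'rpm -V' on one package; rpm exits non-zero when it finds differences, so no check=True."""
    result = subprocess.run(["rpm", "-V", package],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return changed_files(result.stdout)
```

For example, `verify_package("kernel-2.6.18-274.17.1.el5")` on the affected server; an empty list means rpm found no size or digest mismatch for that package, though it cannot rule out damage elsewhere on the disk.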


Re: [CentOS] Centos 5.6 Kernel Panics

2012-04-04 Thread m . roth
Paul (Crunch) wrote:
snip
Looking at the stack traces, and given that two completely separate processes
are blocked at the same time, I have to suggest another possibility: the
drive that /var is on may be having problems... and if it can't be written
to, then it can't log errors, either.
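In the log above, the mysqld and clamd reports are stamped with the same second. A quick way to see whether that pattern repeats across earlier freezes is to group the hung-task lines in /var/log/messages by minute and look for minutes with more than one blocked task. A minimal sketch, assuming unwrapped syslog lines in the format quoted in the original post; the regular expression and `hung_tasks_by_minute` are hypothetical. And, as noted above, if the disk holding /var is failing, some reports may never have reached the log.

```python
import os
import re
from collections import OrderedDict

# Syslog prefix + hung-task line, e.g.
# "Apr  3 12:41:25 sp2 kernel: INFO: task mysqld:15345 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+ \d\d:\d\d):\d\d \S+ kernel: "
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than"
)


def hung_tasks_by_minute(lines):
    """Group hung-task reports as {"Apr  3 12:41": [(comm, pid), ...]}, in log order.

    Bucketing by minute is a simplification: reports at 12:41:59 and
    12:42:00 land in different buckets.
    """
    groups = OrderedDict()
    for line in lines:
        m = HUNG_RE.match(line)
        if m:
            groups.setdefault(m.group("minute"), []).append(
                (m.group("comm"), int(m.group("pid"))))
    return groups


if __name__ == "__main__":
    path = "/var/log/messages"
    if os.access(path, os.R_OK):
        with open(path, errors="replace") as f:
            for minute, tasks in hung_tasks_by_minute(f).items():
                if len(tasks) > 1:  # several unrelated tasks blocked together
                    print(minute, tasks)
```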

mark
