[CentOS] Centos 5.6 Kernel Panics
Hello,

Recently our Dell SC1425 server has been locking up with kernel freezes, requiring a hard reboot on each occasion. I've looked on the CentOS forums with limited success; each problem seems slightly different (some fail under high load, some do not). Our kernel is 2.6.18-274.17.1.el5 and /var/log/messages shows the following errors:

Apr 3 12:41:25 sp2 kernel: INFO: task mysqld:15345 blocked for more than 120 seconds.
Apr 3 12:41:25 sp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 3 12:41:25 sp2 kernel: mysqld D 0CEB 2524 15345 32083 15346 15167 (NOTLB)
Apr 3 12:41:25 sp2 kernel: c50c7f54 0082 bf379c08 0ceb ca9b1648 f43c6c5c 0001
Apr 3 12:41:25 sp2 kernel: d9d18000 bf384f01 0ceb b2f9 0001 d9d1810c c2013ac4 edc5de40
Apr 3 12:41:25 sp2 kernel: 08515c98 c6cb37b8 c2014464 c200cc80 0020
Apr 3 12:41:25 sp2 kernel: Call Trace:
Apr 3 12:41:25 sp2 kernel: [c0622f16] rwsem_down_write_failed+0x126/0x141
Apr 3 12:41:25 sp2 kernel: [c0439989] .text.lock.rwsem+0x2b/0x3a
Apr 3 12:41:25 sp2 kernel: [c046aa6a] sys_mprotect+0xbd/0x1eb
Apr 3 12:41:25 sp2 kernel: [c0404f4b] syscall_call+0x7/0xb
Apr 3 12:41:25 sp2 kernel: ===
Apr 3 12:41:25 sp2 kernel: INFO: task clamd:15721 blocked for more than 120 seconds.
Apr 3 12:41:26 sp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 3 12:41:26 sp2 kernel: clamd D 0D49 2528 15721 1 16416 15449 (NOTLB)
Apr 3 12:41:26 sp2 kernel: e848cf74 0086 8f107b57 0d49 30ea2005 e848cf44 c08259d0 0007
Apr 3 12:41:26 sp2 kernel: e8c6aaa0 8f117848 0d49 fcf1 e8c6abac c200cc80 f4f5f3c0
Apr 3 12:41:26 sp2 kernel: c041f863 0184 c200d620 c2013ac4 0020 d887f0a8 f766f0c0
Apr 3 12:41:26 sp2 kernel: Call Trace:
Apr 3 12:41:26 sp2 kernel: [c041f863] default_wake_function+0x0/0xc
Apr 3 12:41:26 sp2 kernel: [c048e994] destroy_inode+0x38/0x47
Apr 3 12:41:26 sp2 kernel: [c0622f16] rwsem_down_write_failed+0x126/0x141
Apr 3 12:41:26 sp2 kernel: [c0439989] .text.lock.rwsem+0x2b/0x3a
Apr 3 12:41:26 sp2 kernel: [c046a32b] sys_munmap+0x24/0x41
Apr 3 12:41:26 sp2 kernel: [c0404f4b] syscall_call+0x7/0xb

Any advice would be appreciated.

regards,
Jon

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
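The hung-task reports in the log excerpt above follow a fixed line format, so they can be pulled out of /var/log/messages mechanically to see which tasks were blocked and when. What follows is only a rough sketch, assuming Python 3 and the exact "INFO: task NAME:PID blocked for more than N seconds" wording shown in the log; the function names find_hung_tasks and summarize are made up for illustration and are not part of any CentOS tool.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   Apr 3 12:41:25 sp2 kernel: INFO: task mysqld:15345 blocked for more than 120 seconds.
# The pattern is an assumption based on the excerpt in this thread.
HUNG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"INFO: task (?P<task>.+):(?P<pid>\d+) blocked for more than "
    r"(?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (timestamp, task, pid, seconds) for every hung-task report."""
    hits = []
    for line in lines:
        m = HUNG_RE.match(line)
        if m:
            hits.append(
                (m.group("stamp"), m.group("task"),
                 int(m.group("pid")), int(m.group("secs")))
            )
    return hits


def summarize(hits):
    """Count how many reports each task name produced."""
    return Counter(task for _stamp, task, _pid, _secs in hits)


if __name__ == "__main__":
    # Hypothetical usage: python3 hung_tasks.py /var/log/messages
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for stamp, task, pid, secs in find_hung_tasks(fh):
                print(stamp, task, pid, secs)
```

Reports from unrelated tasks that share a timestamp, as mysqld and clamd do here, suggest a system-wide stall and not a fault in one application.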
Re: [CentOS] Centos 5.6 Kernel Panics
On 04/04/2012 09:16 AM, Jonathan Alstead wrote:
Recently our dell sc1425 server has been locking up with kernel freezes and required a hard reboot on each occasion.
snip

It sounds like some kind of I/O or memory problem. I would start by running memtest and the basic diagnostic tests provided by Dell. If you don't have those installed on your disk, they can be downloaded as a CentOS-based OpenManage live CD from somewhere on the Dell site. It could also be a disk problem, but from the output you provided I would look for memory or I/O bus problems first, and then look for disk problems if you don't find anything with the first two. It almost looks like a memory controller problem.

Nataraj
Re: [CentOS] Centos 5.6 Kernel Panics
On 04/04/2012 09:31 AM, Nataraj wrote:
It sounds like some kind of IO or memory problem.
snip

If for any reason you think the problem started after a kernel upgrade, then try booting the previous version of the kernel.

Nataraj
Re: [CentOS] Centos 5.6 Kernel Panics
On 04/04/2012 12:31 PM, Nataraj wrote:
It sounds like some kind of IO or memory problem.
snip

I'm inclined to agree with Nataraj: run a memory test first and foremost. Also check for any corruption on the file system. The chances are small, but the on-disk kernel could be damaged by data corruption. Unfortunately, I don't know of a practical way of testing the buses, or possibly even the CPU, aside from swapping hardware out.
Re: [CentOS] Centos 5.6 Kernel Panics
Paul (Crunch) wrote:
On 04/04/2012 12:31 PM, Nataraj wrote:
On 04/04/2012 09:16 AM, Jonathan Alstead wrote:
Recently our dell sc1425 server has been locking up with kernel freezes and required a hard reboot on each occasion.
snip
Apr 3 12:41:25 sp2 kernel: INFO: task mysqld:15345 blocked for more than 120 seconds.
snip
Apr 3 12:41:25 sp2 kernel: Call Trace:
Apr 3 12:41:25 sp2 kernel: [c0622f16] rwsem_down_write_failed+0x126/0x141
Apr 3 12:41:25 sp2 kernel: [c0439989] .text.lock.rwsem+0x2b/0x3a
Apr 3 12:41:25 sp2 kernel: [c046aa6a] sys_mprotect+0xbd/0x1eb
Apr 3 12:41:25 sp2 kernel: [c0404f4b] syscall_call+0x7/0xb
Apr 3 12:41:25 sp2 kernel: ===
Apr 3 12:41:25 sp2 kernel: INFO: task clamd:15721 blocked for more than 120 seconds.
snip
Apr 3 12:41:26 sp2 kernel: Call Trace:
Apr 3 12:41:26 sp2 kernel: [c041f863] default_wake_function+0x0/0xc
Apr 3 12:41:26 sp2 kernel: [c048e994] destroy_inode+0x38/0x47
Apr 3 12:41:26 sp2 kernel: [c0622f16] rwsem_down_write_failed+0x126/0x141
Apr 3 12:41:26 sp2 kernel: [c0439989] .text.lock.rwsem+0x2b/0x3a
Apr 3 12:41:26 sp2 kernel: [c046a32b] sys_munmap+0x24/0x41
Apr 3 12:41:26 sp2 kernel: [c0404f4b] syscall_call+0x7/0xb
snip

Looking at the stack traces, and given that two completely separate processes are being blocked at the same time, I have to suggest another possibility: the drive that /var is on may be having problems... and if it can't be written to, then it can't log errors, either.

mark
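One rough way to check the suggestion that the drive holding /var cannot be written to is to time a small write followed by an fsync in that directory. This is only a sketch under stated assumptions: it needs Python 3 (CentOS 5 ships an older Python 2, so a separately installed interpreter is assumed), the helper name probe_write is hypothetical, and if the disk is truly stalled the probe may itself hang in the same uninterruptible state as mysqld and clamd.

```python
import os
import sys
import tempfile
import time


def probe_write(directory, size=4096):
    """Write and fsync a small temporary file in `directory`.

    Returns the elapsed time in seconds. Raises OSError if the file
    cannot be created, written, or flushed to disk.
    """
    start = time.monotonic()
    fd, path = tempfile.mkstemp(prefix="write-probe-", dir=directory)
    try:
        os.write(fd, b"\0" * size)
        os.fsync(fd)  # force the data to the device, not just the page cache
    finally:
        os.close(fd)
        os.unlink(path)  # always remove the probe file
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical usage: python3 write_probe.py /var/log
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/tmp"
    try:
        print("write+fsync in %s took %.3f s" % (target, probe_write(target)))
    except OSError as exc:
        print("write probe failed in %s: %s" % (target, exc))
```

A probe that fails, or that takes seconds instead of milliseconds, would point toward the disk; a fast, clean result does not rule out the memory or bus problems raised earlier in the thread.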