Re: [CentOS] EDAC Kernel Panic 2.6.9-78 and above

2009-10-22 Thread William L. Maltby

On Thu, 2009-10-22 at 04:20 -0400, ken wrote:
> 

> cat /boot/grub/menu.lst
> ...
> title CentOS (2.6.18-164.2.1.el5.plus)
> root (hd0,2)
> kernel /vmlinuz-2.6.18-164.2.1.el5.plus ro root=/dev/mapper/luks-3d723b4f-0184-438d-9cb9-9ebff16e683a rhgb quiet
> initrd /initrd-2.6.18-164.2.1.el5.plus.img
> title CentOS (2.6.18-164.el5)
> root (hd0,2)
> kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/mapper/luks-3d723b4f-0184-438d-9cb9-9ebff16e683a rhgb quiet
> initrd /initrd-2.6.18-164.el5.img
> title CentOS (2.6.18-128.7.1.el5)
> root (hd0,2)
> kernel /vmlinuz-2.6.18-128.7.1.el5 ro root=/dev/mapper/luks-3d723b4f-0184-438d-9cb9-9ebff16e683a rhgb quiet
> initrd /initrd-2.6.18-128.7.1.el5.img
> ...
> 
> If your /boot/grub/menu.lst is similar, then you need only select a
> previously installed kernel at the boot menu.  You can access this via
> your remote KVM setup, yes?
> 
> In the past I've edited menu.lst to change what's booted, i.e., I
> rearranged the order of the stanzas to make the first one, which is the
> default (the one booted if no action is taken at the boot menu), the
> working/desired kernel.

Don't forget that you can use the "default" command in grub.conf, e.g.
"default 1", rather than rearranging the stanzas every time.

Use "info grub", select the "Index" entry, and then look for "default".
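
A minimal sketch of the "default" approach, assuming GRUB legacy as
shipped with CentOS 4/5, where stanzas are numbered from 0 in file order
(so "default 1" boots the second one). The set_default helper and the
".bak" suffix are illustrative names, not part of GRUB. The file is a
parameter so the helper can be tried on a copy first.

```shell
#!/bin/sh
# set_default FILE INDEX
# Point GRUB legacy's "default" directive at stanza INDEX (counting from 0).
# Hypothetical helper for illustration; not part of GRUB.
set_default() {
    file=$1
    index=$2
    # Keep a backup next to the original (".bak" is an arbitrary choice).
    cp -p "$file" "$file.bak" || return 1
    if grep -q '^default[ =]' "$file.bak"; then
        # Rewrite the existing directive, whether written "default 1" or "default=1".
        sed "s/^default[ =].*/default=$index/" "$file.bak" > "$file"
    else
        # No directive yet: add one at the top of the file.
        { echo "default=$index"; cat "$file.bak"; } > "$file"
    fi
}

# Example (as root): boot the second stanza by default.
# set_default /boot/grub/grub.conf 1
```

Writing through a redirect rather than "sed -i" keeps a menu.lst symlink
to grub.conf intact if the helper is pointed at the symlink.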

> 
> hth,
> ken
> 

-- 
Bill

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] EDAC Kernel Panic 2.6.9-78 and above

2009-10-22 Thread ken

On 10/21/2009 10:21 PM Philip Gwyn wrote:
> On 20-Oct-2009 Michael Schumacher wrote:
>>> I've got a production system running CentOS 4 that was rock solid
>>> until I upgraded from 2.6.9-55 to 2.6.9-78.0.13 (now running
>>> 2.6.9-89.0.11). The system now crashes intermittently after a few
>>> weeks. I finally caught the panic message :
>>> EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
>>> Kernel panic - not syncing: MC0: Uncorrected Error
> 
> I have also seen this message or something very close.  The server is 200 km
> away and the person who read it to me over the phone wasn't very fluent in
> English.
> 
> That server has an ASUS DSBF-D12 motherboard.  Kernel was
> 2.6.9-89.0.11.EL.  The crash could happen within hours or even minutes.
> 
> I downgraded to 2.6.9-55.0.9.EL, which doesn't have the i5000_edac module.  Now
> that I have a PDU and remote KVM set up, I'm going to try other kernels
> tomorrow.
> 
> -Philip

When I've upgraded a kernel on CentOS, the previous kernels were not
removed; they stay in the boot menu, just no longer as the default
entry.  E.g.,

cat /boot/grub/menu.lst
...
title CentOS (2.6.18-164.2.1.el5.plus)
root (hd0,2)
kernel /vmlinuz-2.6.18-164.2.1.el5.plus ro root=/dev/mapper/luks-3d723b4f-0184-438d-9cb9-9ebff16e683a rhgb quiet
initrd /initrd-2.6.18-164.2.1.el5.plus.img
title CentOS (2.6.18-164.el5)
root (hd0,2)
kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/mapper/luks-3d723b4f-0184-438d-9cb9-9ebff16e683a rhgb quiet
initrd /initrd-2.6.18-164.el5.img
title CentOS (2.6.18-128.7.1.el5)
root (hd0,2)
kernel /vmlinuz-2.6.18-128.7.1.el5 ro root=/dev/mapper/luks-3d723b4f-0184-438d-9cb9-9ebff16e683a rhgb quiet
initrd /initrd-2.6.18-128.7.1.el5.img
...

If your /boot/grub/menu.lst is similar, then you need only select a
previously installed kernel at the boot menu.  You can access this via
your remote KVM setup, yes?

In the past I've edited menu.lst to change what's booted, i.e., I
rearranged the order of the stanzas to make the first one, which is the
default (the one booted if no action is taken at the boot menu), the
working/desired kernel.
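
Before picking an older stanza, it can help to see each stanza's index
and whether its kernel image is still on disk. A rough sketch, assuming
/boot is its own partition so that kernel paths in menu.lst are relative
to /boot (as the listing above suggests); list_entries is a hypothetical
helper name, not an existing tool.

```shell
#!/bin/sh
# list_entries MENU BOOTDIR
# Print "index: title [present|MISSING]" for every stanza that has a kernel line.
# Hypothetical helper for illustration only.
list_entries() {
    menu=$1
    bootdir=$2
    index=-1
    title=
    while read -r keyword rest; do
        case $keyword in
            title)
                # GRUB legacy numbers stanzas from 0 in file order.
                index=$((index + 1))
                title=$rest
                ;;
            kernel)
                # First word after "kernel" is the image path.
                image=${rest%% *}
                if [ -f "$bootdir$image" ]; then
                    state=present
                else
                    state=MISSING
                fi
                echo "$index: $title [$state]"
                ;;
        esac
    done < "$menu"
}

# Example:
# list_entries /boot/grub/menu.lst /boot
```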

hth,
ken


Re: [CentOS] EDAC Kernel Panic 2.6.9-78 and above

2009-10-21 Thread Philip Gwyn

On 20-Oct-2009 Michael Schumacher wrote:
>> I've got a production system running CentOS 4 that was rock solid
>> until I upgraded from 2.6.9-55 to 2.6.9-78.0.13 (now running
>> 2.6.9-89.0.11). The system now crashes intermittently after a few
>> weeks. I finally caught the panic message :
> 
>> EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
>> Kernel panic - not syncing: MC0: Uncorrected Error

I have also seen this message or something very close.  The server is 200 km
away and the person who read it to me over the phone wasn't very fluent in
English.

That server has an ASUS DSBF-D12 motherboard.  Kernel was
2.6.9-89.0.11.EL.  The crash could happen within hours or even minutes.

I downgraded to 2.6.9-55.0.9.EL, which doesn't have the i5000_edac module.  Now
that I have a PDU and remote KVM set up, I'm going to try other kernels
tomorrow.
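
When trying other kernels, one quick way to confirm whether an EDAC
driver actually got loaded is to look through /proc/modules. This is
only a sketch: edac_modules is a made-up helper name, and the optional
file argument exists purely so it can be run against a saved copy.

```shell
#!/bin/sh
# edac_modules [FILE]
# Print the names of loaded modules whose name contains "edac".
# FILE defaults to /proc/modules. Hypothetical helper for illustration.
edac_modules() {
    awk '$1 ~ /edac/ { print $1 }' "${1:-/proc/modules}"
}

# Example: prints nothing on a kernel without any EDAC driver loaded.
# edac_modules
```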

-Philip



Re: [CentOS] EDAC Kernel Panic 2.6.9-78 and above

2009-10-20 Thread Chris Miller

nate wrote:

> Check your bios/system event log for any indication that it
> is logging memory errors? Most modern server class motherboards
> (past 5 years) do this, though not always reliably.


Nothing in the logs, it's a Supermicro X7DVL-E (fyi).


> I've also had trouble with memtest86 myself, I prefer to run
> ctcs:
> 
> http://sourceforge.net/projects/va-ctcs/


README.FIRST scares me. Server is 70 miles away, not feeling really
good about this. I ran memtest86+ last night for 6+ hours and it
came back clean.


> The software is really old and is picky about what you build it on;
> if I recall correctly, I could only get it to build on RHEL/CentOS 4,
> not 5 (though the binaries work fine on 5).


I just booted the binary from the memtest site under Grub, it worked
fine.

Regards,
Chris


Re: [CentOS] EDAC Kernel Panic 2.6.9-78 and above

2009-10-19 Thread Michael Schumacher

Chris,

> I've got a production system running CentOS 4 that was rock solid
> until I upgraded from 2.6.9-55 to 2.6.9-78.0.13 (now running
> 2.6.9-89.0.11). The system now crashes intermittently after a few
> weeks. I finally caught the panic message :

> EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
> Kernel panic - not syncing: MC0: Uncorrected Error

> Looking at the kernel changelog, I see that EDAC support was added
> for the Intel 5000 chipset in 2.6.9-68.20.EL which this server runs.

Same issue here on a machine running CentOS 5.3. The problem began
with a kernel update that introduced EDAC support for the 5000 chipset.
See the thread "RAM errors after kernel-update" for more details. I
haven't solved the problem yet, but because the machine crashes every
two days with this kernel, I had to boot an earlier kernel without that
chipset support.


> I'm trying to determine if this is a potential memory issue, or is
> this related to some other hardware item. Also considering disabling
> EDAC in the kernel (is "noedac" a valid option?) as a last resort. I
> will run memtest86+ on the server as soon as possible to check the
> memory, just formulating my game plan if it's something else.

Don't use the memtest86+ version that comes with the CentOS ISO. There is
a much newer version available from the author's website. Only the new
version identifies the chipset correctly.
-- 
Mit freundlichen Grüßen
Michael Schumacher
mailto:michael.schumac...@pamas.de




Re: [CentOS] EDAC Kernel Panic 2.6.9-78 and above

2009-10-19 Thread nate

Chris Miller wrote:

> Thoughts?

Check your bios/system event log for any indication that it
is logging memory errors? Most modern server class motherboards
(past 5 years) do this, though not always reliably.
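
Alongside the BIOS event log, the kernel log can be searched for earlier
EDAC reports, since the panic in this thread was printed with an
"EDAC MC0:" prefix. A small sketch, assuming syslog writes kernel
messages to /var/log/messages (the CentOS default); edac_report is a
hypothetical helper name.

```shell
#!/bin/sh
# edac_report LOGFILE
# Print log lines that mention an EDAC memory controller (MC0, MC1, ...).
# Hypothetical helper for illustration only.
edac_report() {
    grep 'EDAC MC[0-9]' "$1"
}

# Example:
# edac_report /var/log/messages
```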

I've also had trouble with memtest86 myself, I prefer to run
ctcs:

http://sourceforge.net/projects/va-ctcs/

The software is really old and is picky about what you build it on;
if I recall correctly, I could only get it to build on RHEL/CentOS 4,
not 5 (though the binaries work fine on 5).

It does a good torture test, which in my experience can find
problems faster than memtest86 (which can take days).

nate

