Hi everyone,

I'd like to start a discussion about what people with lots of HP servers and RHEL5 do to investigate crashes and take crashdumps for analysis. I guess I'm not the only one in this boat, and perhaps others out there have good practices that they'd be willing to share.

We have just about everything Intel/AMD from the G1s to the G7s (DL360 to DL580 systems), thousands of them, and we're having a hard time getting reliable crashdumps. (The Solaris guys mock us because they always get a crashdump when there's a panic of some kind.) We do have kdump properly configured, of course, so at least that is not an issue. The issue is getting RHEL to detect a hang and reliably take a dump.

Over the years, we've run into spurious NMIs, cciss bugs and a lot of other issues. As a result, we're running with most of the panic-related sysctls disabled:

# sysctl -a|grep panic
vm.panic_on_oom = 0
kernel.hung_task_panic = 0
kernel.softlockup_panic = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.unknown_nmi_panic = 0
kernel.panic_on_oops = 1
kernel.panic = 0
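
For anyone who wants to compare their own settings against the list above, here is a minimal sketch of a helper that prints only the panic-related sysctls currently set to 0. The function name disabled_panic_knobs is made up for illustration, and it assumes the usual "key = value" format that sysctl -a prints.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not an existing tool):
# print the panic-related sysctl keys whose value is 0.
# Expects "key = value" lines on stdin, as printed by `sysctl -a`.
disabled_panic_knobs() {
    awk -F' = ' '/panic/ && $2 == "0" { print $1 }'
}

# Typical use on a live system:
#   sysctl -a 2>/dev/null | disabled_panic_knobs
```

Reading from stdin keeps it usable on saved sysctl output as well, for example from a sosreport of a machine that has already hung.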

In RHEL6, Red Hat officially introduced kmod-hpwdt (HP Watchdog Timer), which interacts with an iLO2 or iLO3 remote controller to initiate a crash when the timer expires. Following the developers' recommendation, and because I thought it would be a 'nice-to-have' (tm), I've backported that module to RHEL5 and filed an RFE asking Red Hat to officially backport it to RHEL5. My rpms are here:

http://vscojot.free.fr/dist/kmod-hpwdt/hpwdt-1.2.0/RHEL5/SRPMS/hpwdt-1.2.0-2.el5_4.src.rpm
http://vscojot.free.fr/dist/kmod-hpwdt/hpwdt-1.2.0/RHEL5/i386
http://vscojot.free.fr/dist/kmod-hpwdt/hpwdt-1.2.0/RHEL5/i386/kmod-hpwdt-1.2.0-2.el5_4.i686.rpm
http://vscojot.free.fr/dist/kmod-hpwdt/hpwdt-1.2.0/RHEL5/i386/kmod-hpwdt-PAE-1.2.0-2.el5_4.i686.rpm
http://vscojot.free.fr/dist/kmod-hpwdt/hpwdt-1.2.0/RHEL5/i386/kmod-hpwdt-xen-1.2.0-2.el5_4.i686.rpm
http://vscojot.free.fr/dist/kmod-hpwdt/hpwdt-1.2.0/RHEL5/x86_64
http://vscojot.free.fr/dist/kmod-hpwdt/hpwdt-1.2.0/RHEL5/x86_64/kmod-hpwdt-1.2.0-2.el5_4.x86_64.rpm
http://vscojot.free.fr/dist/kmod-hpwdt/hpwdt-1.2.0/RHEL5/x86_64/kmod-hpwdt-xen-1.2.0-2.el5_4.x86_64.rpm

I'm just wondering what other people here are doing. Do you trust the NMI panic stuff, or do you use the HP hpwdt-1.1.3 rpms (they require you to have a compiler on your system)? Do you panic on OOM?
Any recommendations, good or bad?

Best regards,

Vincent

_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list