Hi,

I'm seeing a strange problem on a Heartbeat v1 cluster: Heartbeat 
restarts itself with the following messages:

heartbeat[2419]: 2010/01/22_06:30:35 WARN: Exiting HBREAD process 3272 killed by signal 24 [SIGXCPU - CPU limit exceeded].
heartbeat[2419]: 2010/01/22_06:30:35 ERROR: Exiting HBREAD process 3272 dumped core
heartbeat[2419]: 2010/01/22_06:30:35 ERROR: Core heartbeat process died! Restarting.
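
SIGXCPU is delivered when a process exceeds its soft CPU-time limit (RLIMIT_CPU), so one thing worth checking is which limit the HBREAD process actually runs with. Below is a minimal sketch that pulls the "Max cpu time" row out of the text of /proc/<pid>/limits. It assumes the kernel provides that file (older kernels may not), and the function name cpu_limit and the sample text are only illustrations, not Heartbeat output.

```python
def cpu_limit(limits_text):
    """Return (soft, hard) from the 'Max cpu time' row of /proc/<pid>/limits.

    Values are returned as strings because the kernel prints either a
    number of seconds or the word 'unlimited'.
    """
    prefix = "Max cpu time"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            soft, hard = line[len(prefix):].split()[:2]
            return soft, hard
    return None  # row not found


# Illustrative sample only; real input would come from
# open("/proc/%d/limits" % pid).read() for the HBREAD pid.
sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max cpu time              unlimited            unlimited            seconds\n"
    "Max file size             unlimited            unlimited            bytes\n"
)
print(cpu_limit(sample))
```

A finite soft limit on the read process would be consistent with the SIGXCPU in the log; "unlimited" would point elsewhere.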

I've read that this can be caused by debugging being turned on, but it 
continues even after I set debug 0. The Heartbeat version is 
2.1.2-2.fc8 (Fedora Core 8 x86_64), running on Dell PE 2950s. I know 
the version is old and buggy, but with a simple v1 config we had never 
run into any problems before.

The coredump doesn't provide any useful info:

Core was generated by `heartbeat: read: seri'.
Program terminated with signal 24, CPU time limit exceeded.
#0  0x0000003b0b2c6e00 in ?? ()

There are several "ttyS0: 1 input overrun(s)" messages on the active 
node (fortunately, the Heartbeat restarts happened on the passive node). 
I suspected the serial port communication, but changing the baud rate 
and the keepalive interval made no difference. The cable is fully 
crossed and has been tested.

cat /proc/tty/driver/serial
serinfo:1.0 driver revision:
0: uart:16550A port:000003F8 irq:4 tx:1693465738 rx:1692970683 fe:6586 brk:190 oe:114
1: uart:16550A port:000002F8 irq:3 tx:0 rx:0 CTS
2: uart:unknown port:000003E8 irq:4
3: uart:unknown port:000002E8 irq:3

setserial /dev/ttyS0
/dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
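
To put the fe/brk/oe counters above into proportion with the traffic, here is a rough sketch that splits one entry of /proc/tty/driver/serial into its key:value fields and relates the error counters to the received byte count. The function name parse_serial_line and the choice of which keys to treat as integers are my own assumptions, not anything from the serial driver documentation.

```python
# Keys treated as integer counters; everything else is kept as a string.
COUNTER_KEYS = {"irq", "tx", "rx", "fe", "pe", "brk", "oe"}


def parse_serial_line(line):
    """Parse one port entry of /proc/tty/driver/serial into a dict."""
    index, _, rest = line.partition(":")
    fields = {"index": int(index), "flags": []}
    for token in rest.split():
        if ":" in token:
            key, value = token.split(":", 1)
            fields[key] = int(value) if key in COUNTER_KEYS else value
        else:
            # Bare tokens such as CTS are modem-line flags.
            fields["flags"].append(token)
    return fields


sample = ("0: uart:16550A port:000003F8 irq:4 tx:1693465738 rx:1692970683 "
          "fe:6586 brk:190 oe:114")
entry = parse_serial_line(sample)

# Framing errors plus overruns, relative to the bytes received.
error_ratio = (entry["fe"] + entry["oe"]) / entry["rx"]
print(entry["uart"], entry["oe"], error_ratio)
```

Running this periodically and comparing successive oe values would show whether the overruns coincide with the restarts.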

I have now disabled the serial line in ha.cf to see whether that helps 
(interestingly, serial no longer appears in /proc/interrupts afterwards).

cat /proc/interrupts (active node)
            CPU0       CPU1
   0:        147          1   IO-APIC-edge      timer
   1:          1          1   IO-APIC-edge      i8042
   8:          1          0   IO-APIC-edge      rtc
   9:          0          0   IO-APIC-fasteoi   acpi
  12:          2          2   IO-APIC-edge      i8042
  14:   18637949         57   IO-APIC-edge      libata
  15:          0          0   IO-APIC-edge      libata
  20:     233135         33   IO-APIC-fasteoi   uhci_hcd:usb3
  21:      20703         13   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb4
  78:  279961508       5436   IO-APIC-fasteoi   megasas
2292:  711142718         80   PCI-MSI-edge      eth0
2293:          3          3   PCI-MSI-edge      eth2
2294:      25604  462664078   PCI-MSI-edge      eth1
NMI:          0          0
LOC: 3753255022   16088395
ERR:          0

cat /proc/interrupts (passive node)
            CPU0       CPU1
   0:        148          1   IO-APIC-edge      timer
   1:          1          1   IO-APIC-edge      i8042
   8:          1          0   IO-APIC-edge      rtc
   9:          0          0   IO-APIC-fasteoi   acpi
  12:          2          2   IO-APIC-edge      i8042
  14:  136694263         55   IO-APIC-edge      libata
  15:          0          0   IO-APIC-edge      libata
  20:         31    2148038   IO-APIC-fasteoi   uhci_hcd:usb3
  21:     196116         11   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb4
  78:  251713976       5915   IO-APIC-fasteoi   megasas
2292: 1859784733        103   PCI-MSI-edge      eth0
2293:          2          2   PCI-MSI-edge      eth2
2294:      12833  475741619   PCI-MSI-edge      eth1
NMI:          0          0
LOC: 1892627058  940684479
ERR:          0
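
For comparing the two nodes, or for checking whether IRQ 4 (the serial port) is registered at all, a small sketch like the following sums the per-CPU counts of each /proc/interrupts row. The function name irq_counts is made up for illustration, and the "4:" row in the sample is hypothetical; it does not come from either node.

```python
def irq_counts(text):
    """Map each IRQ label in /proc/interrupts text to its total count."""
    lines = [l for l in text.splitlines() if l.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        nums = [int(p) for p in rest.split()[:ncpu] if p.isdigit()]
        if not nums:
            continue  # e.g. a device list wrapped onto its own line
        counts[label.strip()] = sum(nums)
    return counts


# Illustrative input; the "4:" serial row is hypothetical.
sample = """\
            CPU0       CPU1
   4:       1200          3   IO-APIC-edge      serial
  14:   18637949         57   IO-APIC-edge      libata
ERR:          0
"""
counts = irq_counts(sample)
print("4" in counts, counts.get("14"))
```

With the serial line disabled in ha.cf, "4" would be expected to be missing from the result, which matches the listings above.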



/etc/ha.d/ha.cf:
keepalive 5
deadtime 20
warntime 10
initdead 40
udpport 694
bcast   bond0           # Linux
auto_failback off
node    mwcls1
node    mwcls2
debug 0
use_logd yes
conn_logd_time 10
compression     bz2


-- 
Peter LUCIAK ([email protected])
IBL Software Engineering, http://www.iblsoft.com/
Mierová 103, 82105 Bratislava, Slovakia
Phone: +421-2-32662111, Fax: +421-2-32662110
Direct: +421-2-32662175
