On 5/17/2011 8:25 AM, Sascha Hagedorn wrote:
> Hi everyone,
...
> - Pulled the HA network cable
> - Put it back after a couple of seconds
>
> Result:
>
> - Node 2 is being restarted
> - Load average on Node 1 increases until the system becomes
> unreachable
> - A massive amount of log messages are being produced by OCFS2 on
> the surviving node (see below)
> - The DRBD partition is not accessible
> - Node 1 cannot be rebooted only a hard reset brings it back to life

FWIW, I've seen it on heartbeat-2.1.4 on CentOS 5, too. Something sends
it into a tight loop that spews tons of log messages and uses every CPU
cycle it can get. RHEL-5 kernels tend to time out sockets under high CPU
load, so that effectively kills the entire cluster.

Dima
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
