On Tue, 18 Sep 2012, Dejan Muhamedagic wrote:

> Hi,
>
> On Tue, Sep 18, 2012 at 11:18:54AM +0200, Fernando Pereira wrote:
>> Hi there.
>> This is my first post for this list as I haven't had problems with
>> heartbeat, until now :)
>>
>> We have a dual server fail-back configuration in place, in which the two
>> servers have identical resources (nfs, drbd...).
>> Last week I upgraded a system and replaced one of the servers by a virtual
>> machine and installed the latest version of heartbeat available
>> via yum (3.0.4).
>>
>> Since then I'm having a lot of problems with "Late heartbeat" and false dead
>> nodes. Before we could have a "Dead time" of 10sec, while now 30 is not
>> enough.
>>
>> Looking into the log files I could find the following entry, among other
>> similar ones:
>> "Gmain_timeout_dispatch: Dispatch function for send local status was
>> delayed 30590 ms (> 1010 ms) before being called (GSource: 0x14209a0)"
>>
>> I guess it means that for some reason the function call took over 30
>> seconds??
>> In my understanding this number is, at least, three orders of magnitude
>> higher than any acceptable value, even under the worst machine load
>> scenarios.
>> Is there a known problem with this version of heartbeat? Or does anybody
>> experience this kind of problem when running over a virtual machine (ESXi
>> 5.0)?
>
> I'd suspect a scheduler issue. The VM is probably starved, hence
> the long delays. You should check the vmware docs or forums.
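As a side note on reading that log entry: the wording "before being called" suggests the number is how late the timer callback was dispatched, not how long the function itself ran. Below is a minimal Python sketch for pulling the figures out of such lines and comparing them with a dead time. It is not part of heartbeat; the names `parse_delay`, `exceeds_deadtime` and `deadtime_s` are made up for illustration, and the pattern is based only on the single log line quoted above, so other versions may word it differently.

```python
import re

# Pattern derived from the one log line quoted in this thread (assumption:
# other heartbeat/glue versions may format the message differently).
DELAY_RE = re.compile(
    r"Dispatch function for (?P<source>.+?)\s+was\s+delayed\s+"
    r"(?P<delay_ms>\d+) ms \(> (?P<limit_ms>\d+) ms\)"
)


def parse_delay(line):
    """Return (source, delay_ms, limit_ms) or None if the line does not match."""
    match = DELAY_RE.search(line)
    if match is None:
        return None
    return (
        match.group("source"),
        int(match.group("delay_ms")),
        int(match.group("limit_ms")),
    )


def exceeds_deadtime(line, deadtime_s):
    """True if the reported dispatch delay is at least deadtime_s seconds.

    deadtime_s is a hypothetical parameter standing in for the configured
    dead time (10 or 30 seconds in the report above).
    """
    parsed = parse_delay(line)
    if parsed is None:
        return False
    _, delay_ms, _ = parsed
    return delay_ms >= deadtime_s * 1000


# The log line from the original report, joined onto one line.
sample = (
    "Gmain_timeout_dispatch: Dispatch function for send local status was "
    "delayed 30590 ms (> 1010 ms) before being called (GSource: 0x14209a0)"
)

if __name__ == "__main__":
    print(parse_delay(sample))
    print(exceeds_deadtime(sample, 30))
```

Running something like this over the whole log would show whether the delays cluster at particular times, which may help when comparing against load on the ESXi host.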
I've seen similar logs with real hardware when the system is overheating
and the CPU gets paused by the thermal protection circuits.

I would agree that the host system is probably badly oversubscribed and so
the VM is getting starved.

David Lang
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems