On 27 Jun 2012, at 00:39, Andreas Kurz wrote:

> If the network is working as expected again, Heartbeat should reconnect
> automatically ... if not, restart Heartbeat if you are confident the
> network problem is solved.

I finally arranged some downtime so that I could try this. I restarted heartbeat on one node and it fell offline. I rebooted it and it came back, but heartbeat returned to the same split-brain state where neither node could see the other.

After some rummaging I found the problem: an IPaddr2 resource had been configured with one node's primary static IP. That resource had migrated to the other node, taking the address with it, so the first node fell offline while still looking as if it was up, because its address was now pointing at the wrong node! Not pretty.

I then found I couldn't delete the incorrect IP resource because it refused to stop. Is there some way to force a stop or delete?

Once I'd resolved that, I ran into problems getting pacemaker to start: the heartbeat processes were OK, but the pacemaker ones such as cib were not. Some reboots and networking restarts eventually solved that.

This setup is running heartbeat 3.0.5 and pacemaker 1.1.6 from the ubuntu-ha-maintainers PPA. Is corosync generally more robust than heartbeat? Would it be worth upgrading to it?

Marcus
-- 
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info@hand CRM solutions
mar...@synchromedia.co.uk | http://www.synchromedia.co.uk/
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems