On 27 Jun 2012, at 00:39, Andreas Kurz wrote:

> If the network is working as expected again, Heartbeat should reconnect
> automatically ... if not, restart Heartbeat if you are confident the
> network problem is solved.

I finally arranged a downtime window so that I could try this. I restarted 
Heartbeat on one node and the node fell offline. I rebooted it and it came 
back, but Heartbeat returned to the same split-brain state in which neither 
node could see the other.

After some rummaging I found the problem: an IPaddr2 resource had been 
configured with one node's primary static IP. That address had been migrated 
to the other node, which knocked the first node offline while making it look 
as though it was up, because its address was now answered by the wrong node! 
Not pretty.
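
A sanity check along these lines could catch this kind of misconfiguration 
before it bites. It is only a rough sketch in Python: the node names, resource 
names and addresses are made-up examples, and find_ip_conflicts is a 
hypothetical helper, not part of Heartbeat or Pacemaker. It assumes you have 
already collected each node's primary address and each IPaddr2 resource's 
address by hand.

```python
import ipaddress


def find_ip_conflicts(node_ips, resource_ips):
    """Return (resource, node, ip) tuples where a floating IP resource
    reuses a node's primary static address."""
    # Parse the addresses so that differently written forms of the
    # same address still compare equal.
    owners = {ipaddress.ip_address(ip): node for node, ip in node_ips.items()}
    conflicts = []
    for resource, ip in resource_ips.items():
        addr = ipaddress.ip_address(ip)
        if addr in owners:
            conflicts.append((resource, owners[addr], str(addr)))
    return conflicts


# Made-up example data (documentation address range, not the real cluster).
node_ips = {"node1": "192.0.2.10", "node2": "192.0.2.11"}
resource_ips = {"ip_service": "192.0.2.100", "ip_bad": "192.0.2.10"}

for resource, node, ip in find_ip_conflicts(node_ips, resource_ips):
    print(f"{resource} uses {ip}, which is the primary address of {node}")
```

Any resource it reports is a floating address that would take a node's own 
address away with it when the resource migrates.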

I then found I couldn't delete the incorrect IP resource because it refused to 
stop. Is there some way to force a stop or delete? Once I'd resolved that, I 
ran into problems getting Pacemaker to start: the Heartbeat processes were OK, 
but the Pacemaker ones such as cib were not. Some reboots and networking 
restarts eventually solved that.

This setup is running Heartbeat 3.0.5 and Pacemaker 1.1.6 from the 
ubuntu-ha-maintainers PPA. Is Corosync generally more robust than Heartbeat? 
Would it be worth upgrading to it?

Marcus
-- 
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info@hand CRM solutions
mar...@synchromedia.co.uk | http://www.synchromedia.co.uk/



