As some of you may know, I run a simple two-node cluster. After much trouble getting one node to start when both nodes reboot, I think I now have it under control with the "ugly hack" I described yesterday.
Both run Ubuntu Lucid. I am now trying to wrap up this task by 1) adding scheduled system maintenance and 2) adding health monitoring.

Here is my maintenance plan. Each server will run a maintenance.sh shell script, one on the 1st of every month and the other on the 15th. The script does the following:

1) Check the "health" of the other node:
   *) SSH must work on the other node.
   *) Heartbeat must be running on the other node (checked over SSH).
   *) /proc/drbd must exist and show cs:Connected.
   All of the above checks must complete within 30 seconds, or the other node is considered unhealthy. If the other node is healthy, proceed with system maintenance; otherwise, send me a nastygram.

2) System maintenance:
   *) Restart heartbeat (thus forcing the other node to take over).
   *) Upgrade Ubuntu with aptitude.
   *) Reboot.
   *) Upon reboot, send me a confirmation email.

Does this plan make sense?

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
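The health check and maintenance steps in the plan could be sketched as a shell script along these lines. This is a minimal sketch, not a tested tool. It assumes passwordless root SSH between the nodes, a working mail(1) command (e.g. bsd-mailx), and that `/etc/init.d/heartbeat status` exits 0 when heartbeat is running. ADMIN_EMAIL, HEALTH_TIMEOUT, HB_STATUS_CMD and the function names are hypothetical. All remote checks run in one SSH session wrapped in coreutils `timeout`, so the 30-second limit covers the whole health check rather than each step. It also deliberately stops heartbeat instead of restarting it, a change from the plan: a restart could hand the resources straight back to this node (for example with auto_failback on) before the upgrade runs, while the reboot starts heartbeat again anyway.

```shell
#!/bin/bash
# maintenance.sh: a sketch of the monthly maintenance plan.
# ASSUMPTIONS (hypothetical, adjust per site): ADMIN_EMAIL, the heartbeat
# status command, a mail(1) implementation, passwordless root SSH to the peer.

ADMIN_EMAIL=${ADMIN_EMAIL:-root@localhost}   # hypothetical recipient
HEALTH_TIMEOUT=${HEALTH_TIMEOUT:-30}         # seconds for ALL health checks
HB_STATUS_CMD=${HB_STATUS_CMD:-/etc/init.d/heartbeat status}  # assumed to exit 0 if running

# Read /proc/drbd text on stdin. Succeed only if at least one resource
# line ("cs:...") exists and every resource line reports cs:Connected.
drbd_all_connected() {
    awk '
        /cs:/ { seen = 1; if ($0 !~ /cs:Connected/) bad = 1 }
        END   { exit !(seen && !bad) }
    '
}

# Run every remote check in a single SSH session bounded by HEALTH_TIMEOUT.
# SSH failure, heartbeat not running, a missing /proc/drbd, or a timeout
# all leave empty (or incomplete) output, which drbd_all_connected rejects.
peer_healthy() {
    timeout "$HEALTH_TIMEOUT" \
        ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" \
        "$HB_STATUS_CMD >/dev/null 2>&1 && cat /proc/drbd" \
        | drbd_all_connected
}

# Send a short email: $1 = subject, $2 = body.
notify() {
    echo "$2" | mail -s "$1" "$ADMIN_EMAIL"
}

run_maintenance() {
    peer=$1
    if ! peer_healthy "$peer"; then
        notify "maintenance ABORTED on $(hostname)" \
            "Peer $peer failed the health check; no changes were made."
        return 1
    fi
    # Stop (not restart) heartbeat so resources move to the peer and stay
    # there; the reboot below brings heartbeat back up.
    /etc/init.d/heartbeat stop || return 1
    export DEBIAN_FRONTEND=noninteractive
    if ! { aptitude update && aptitude -y safe-upgrade; }; then
        notify "aptitude failed on $(hostname)" "Check the logs after reboot."
    fi
    reboot
}

# Only act when a peer hostname is given, e.g. "maintenance.sh node2".
if [ $# -ge 1 ]; then
    run_maintenance "$1"
else
    echo "usage: $0 PEER_HOST" >&2
fi
```

Scheduling could then be one root crontab entry per node, such as `0 3 1 * * /usr/local/sbin/maintenance.sh node2` on node1 and `0 3 15 * * /usr/local/sbin/maintenance.sh node1` on node2 (path and hostnames hypothetical), plus an `@reboot` cron entry on each node that mails the post-reboot confirmation.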