As some of you may know, I run a simple two-node cluster. After much
trouble getting one node to start when both reboot, I think I now
have it under control with the "ugly hack" I described yesterday.

Both nodes run Ubuntu Lucid.

I am now trying to wrap up this task by 1) adding scheduled system
maintenance and 2) adding health monitoring.

I have this plan to do maintenance:

Both servers will execute a maintenance.sh shell script, one on the
1st of every month and the other on the 15th of every month.
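
The staggered schedule could be expressed as one cron entry per node.
The snippet below is only a sketch: the 03:00 run time, the script path
/usr/local/sbin/maintenance.sh and the helper name cron_entry are
assumptions made for illustration. It prints a line in /etc/cron.d
format (which includes a user field).

```shell
# Hypothetical install location of the maintenance script.
SCRIPT=/usr/local/sbin/maintenance.sh

# Print an /etc/cron.d style entry that runs the script at 03:00
# on the given day of the month, as root.
cron_entry() {
    day="$1"
    printf '0 3 %s * * root %s\n' "$day" "$SCRIPT"
}
```

On the first node the output of "cron_entry 1" would go into a file
such as /etc/cron.d/maintenance, and on the second node the output of
"cron_entry 15".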

The script does the following:

1) Check "health" of other node as follows:
  *) SSH must work on the other node
  *) Heartbeat must run on the other node (checked with ssh)
  *) /proc/drbd should exist and have cs:Connected

All of the above checks must complete within 30 seconds, or else the
other node is considered unhealthy.

If the other node is found to be healthy, proceed with system
maintenance. Otherwise send a nastygram to me.
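
The health check in step 1 might look roughly like this. It is a
sketch under several assumptions: the peer hostname "node2" is made up,
the heartbeat process is assumed to be named "heartbeat" so that pgrep
can find it, /proc/drbd is read on the local node, and the "timeout"
command from GNU coreutils is assumed to be installed. A single ssh
call covers both "SSH works" and "heartbeat runs".

```shell
# All values below are assumptions and can be overridden from the environment.
PEER="${PEER:-node2}"                     # hypothetical peer hostname
SSH_CMD="${SSH_CMD:-ssh}"                 # ssh client to use
DRBD_STATUS="${DRBD_STATUS:-/proc/drbd}"  # DRBD status file
LIMIT="${LIMIT:-30}"                      # overall time limit in seconds

# Succeeds if the DRBD status file is readable and reports cs:Connected.
drbd_connected() {
    [ -r "$DRBD_STATUS" ] && grep -q 'cs:Connected' "$DRBD_STATUS"
}

# Succeeds only if ssh to the peer works, heartbeat runs there,
# and DRBD is connected. The ssh step is cut off after LIMIT seconds.
peer_healthy() {
    timeout "$LIMIT" $SSH_CMD -o BatchMode=yes -o ConnectTimeout=10 \
        "$PEER" 'pgrep -x heartbeat >/dev/null' || return 1
    drbd_connected
}
```

The script would then run the maintenance step only when peer_healthy
succeeds and mail the nastygram otherwise. Note that with several DRBD
resources the grep above passes as soon as any one of them is
connected, so a stricter check may be needed.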

2) System maintenance:
 *) Restart heartbeat (thus forcing the other node to take over)
 *) Upgrade Ubuntu with aptitude
 *) Reboot
 *) Upon reboot send a confirmation email to me.
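
The maintenance steps above could be sketched as follows. Assumptions:
heartbeat is controlled through /etc/init.d/heartbeat, "aptitude -y
safe-upgrade" is the desired upgrade mode, a "mail" command is
installed, and the flag file path and the ADMIN address are made up.
The flag file is how the script remembers, across the reboot, that a
confirmation mail is due. DRY_RUN defaults to 1 so that nothing is
actually restarted or rebooted until it is set to 0.

```shell
DRY_RUN="${DRY_RUN:-1}"                            # 1 = only print commands
ADMIN="${ADMIN:-root}"                             # hypothetical recipient
FLAG="${FLAG:-/var/local/maintenance-pending}"     # hypothetical flag file

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Restart heartbeat, upgrade, leave a flag for after the reboot, reboot.
maintain() {
    run /etc/init.d/heartbeat restart || return 1
    run aptitude update || return 1
    run aptitude -y safe-upgrade || return 1
    run touch "$FLAG" || return 1
    run reboot
}

# Meant to be called at boot: send the confirmation only if the flag exists.
confirm_after_reboot() {
    [ -e "$FLAG" ] || return 0
    echo "Maintenance reboot of $(hostname) completed." |
        run mail -s "maintenance done" "$ADMIN" && run rm -f "$FLAG"
}
```

confirm_after_reboot would have to be called from something that runs
at boot, for example /etc/rc.local or an @reboot cron entry.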

Does this plan make sense?
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
