----- Original Message -----
> Hi,
>
> I'm running a HA overcloud configuration and as far as I'm aware, there is
> currently no mechanism in place for restarting failed nodes in the cluster.
> Originally, I had been wondering if we would use a corosync/pacemaker
> cluster across the control plane with STONITH resources configured for each
> node (a STONITH plugin for Ironic could be written).

I know some people are starting to look at how to use pacemaker for fencing/recovery
with TripleO, but I'm not aware of any proposals yet. I'm sure that as soon as one is
published, it will hit this list.

> This might be fine if a
> corosync/pacemaker stack is already being used for HA of some components,
> but it seems overkill otherwise.

There is a pending patch to add support for using pacemaker to manage
active/passive (A/P) services, for example:
https://review.openstack.org/#/c/105397/

I'd expect additional patches like this in the future.

> The undercloud heat could be in a good
> position to restart the overcloud nodes -- is that the plan or are there
> other options being considered?
>
> Thanks,
> Tom
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev