On Tue, Aug 12, 2014 at 04:56:06PM +1000, Andrew Beekhof wrote:
>
> What is ganeti doing with the information though?
> Like GFS2, OCFS2 and the dlm, it might be more appropriate for it to get
> membership information directly from corosync.

ganeti wants the ganeti-node-role resource to run on all nodes, but it
only performs any action on the master node. It expects to be notified
when a node goes down, and it then marks that node as offline in its
internal state.

The only OCF action that it implements is 'notify'. When it's invoked in
this mode it does this:

    notify_action() {
        # Only the master acts on notifications.
        is_master || exit 0
        # Do nothing while the no-run flag file exists.
        [[ -f $NORUNFILE ]] && exit 0

        # Only react once a stop has completed (post-stop).
        # TODO: also implement the "start" operation for readding a node
        [[ $OCF_RESKEY_CRM_meta_notify_operation == "stop" ]] || exit 0
        [[ $OCF_RESKEY_CRM_meta_notify_type == "post" ]] || exit 0

        # Map the stopped node(s) to the name ganeti knows them by.
        local -r target=$OCF_RESKEY_CRM_meta_notify_stop_uname
        local -r node=$(gnt-node list --no-headers -o name $target)

        # TODO: use drain_node when we can
        offline_node $node
        exit 0
    }

ganeti provides a harep utility that will perform actions to heal a
cluster when a node is marked offline. All I need is some way to mark a
node offline when it is down/fenced. This has to be done from the master.

Master failover works, so if the master is down pacemaker will promote
one of the other nodes to master.

There are two cases:

1. A node that is not the master is down. On the master, mark the node
   as offline and harep will do the rest.

2. A node that is the master is down. Pacemaker will start the master on
   another node (this works), the new master will mark the old master as
   offline, and then harep will do the rest.

-- 
Steve Feehan [Contractor]

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
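
The guard logic in notify_action above can be sketched as a standalone
function that is easy to exercise outside the cluster. This is only an
illustration: should_offline is a hypothetical helper (it is not part of
ganeti or Pacemaker), and its arguments stand in for the
OCF_RESKEY_CRM_meta_notify_* variables, the is_master check and
$NORUNFILE used in the agent.

```shell
#!/bin/bash
# Sketch only: decide whether a notify event should mark a node offline.
# should_offline is a hypothetical helper, not part of ganeti or Pacemaker.
# Usage: should_offline <notify_type> <notify_operation> <is_master> <norunfile>
should_offline() {
    local notify_type=$1 notify_op=$2 is_master=$3 norunfile=$4
    [[ $is_master == yes ]] || return 1     # only the master acts
    [[ ! -f $norunfile ]] || return 1       # no-run flag file exists
    [[ $notify_op == stop ]] || return 1    # only react to stop events
    [[ $notify_type == post ]] || return 1  # only after the stop completed
    return 0
}

# Example: a post-stop notification seen on the master, no flag file.
if should_offline post stop yes /nonexistent/norun; then
    echo "would mark node offline"
fi
```

Returning a status instead of calling exit keeps the decision separate
from the action, so the same checks can be run by hand with different
inputs before wiring them into the agent.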