Hi! I configured my nodes *not* to fail back automatically after a defective node comes back online. This worked nicely for a while, but now it no longer does (and, honestly, I do not know what changed in the meantime).

What we do: we disconnect the two (virtual) interfaces of our node mgmt01 (running on VMware ESXi) using the vSphere client. Node mgmt02 takes over the services as it should. When mgmt01's interfaces are switched on again, everything looks all right for a minute or two, but then mgmt01 takes over the resources again, which it should not.

Here's the relevant snippet of the configuration (full config below):

location nag_loc nag_grp 100: ipfuie-mgmt01
property default-resource-stickiness="100"

I thought that, because the resource stickiness has the same value as the location constraint, the resources would stick to the node they were started on. Am I wrong? Is there any other way to have resources start on mgmt01 by default (making mgmt01 the preferred node), but not allow them to migrate back once the cluster is complete again after a split brain?

Thanks for your input,
Andreas

PS: Full config below:

node ipfuie-mgmt01
node ipfuie-mgmt02
primitive ajaxterm lsb:ajaxterm \
        op monitor interval="15s" \
        op start interval="0" timeout="30s" \
        op stop interval="0" timeout="30s"
primitive drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
primitive fs_r0 ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/drbd" fstype="ext4" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive nagios_res lsb:nagios \
        op monitor interval="1min" \
        op start interval="0" timeout="1min" \
        op stop interval="0" timeout="1min"
primitive pingy_res ocf:pacemaker:ping \
        params dampen="5s" multiplier="1000" host_list="10.10.10.205 10.10.10.206 10.10.10.254" \
        op monitor interval="60s" timeout="60s" \
        op start interval="0" timeout="60s"
primitive sharedIP ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.204" cidr_netmask="255.255.252.0" nic="eth0:0"
primitive web_res ocf:heartbeat:apache \
        params configfile="/etc/apache2/httpd.conf" \
        params httpd="/usr/sbin/httpd2-prefork" \
        params testregex="body" statusurl="http://localhost/server-status" \
        op start interval="0" timeout="40s" \
        op stop interval="0" timeout="60s" \
        op monitor interval="1min"
group nag_grp fs_r0 sharedIP web_res nagios_res ajaxterm \
        meta target-role="Started"
ms ms_drbd_r0 drbd_r0 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
clone pingy_clone pingy_res \
        meta target-role="Started"
location cli-prefer-nag_grp nag_grp \
        rule $id="cli-prefer-rule-nag_grp" inf: #uname eq ipfuie-mgmt01 and #uname eq ipfuie-mgmt01
location nag_loc nag_grp 100: ipfuie-mgmt01
location only-if-connected nag_grp \
        rule $id="only-if-connected-rule" -inf: not_defined pingd or pingd lte 1500
colocation nag_grp-only-on-master inf: nag_grp ms_drbd_r0:Master
order apache-after-ip inf: sharedIP web_res
order nag_grp-after-drbd inf: ms_drbd_r0:promote nag_grp:start
order nagios-after-apache inf: web_res nagios_res
property $id="cib-bootstrap-options" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        stonith-action="poweroff" \
        default-resource-stickiness="100" \
        dc-version="1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stop-all-resources="false" \
        last-lrm-refresh="1303825164"
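
For reference, the score comparison described above (location constraint versus stickiness) can be sketched in Python. This is only a simplified model, not Pacemaker's actual implementation. It assumes that a node's score is the sum of the location scores that name it, plus the stickiness on the node currently running the resource, and that INFINITY is 1,000,000 and dominates any finite score. The function names are made up for illustration. The ping rule, colocation and ordering constraints are left out, and how a tie between nodes is resolved is not modeled.

```python
INF = 1_000_000  # assumption: INFINITY is represented as 1,000,000


def add_scores(a, b):
    """Add two scores; -INFINITY wins over everything, then +INFINITY."""
    if a <= -INF or b <= -INF:
        return -INF
    if a >= INF or b >= INF:
        return INF
    return max(-INF, min(INF, a + b))


def node_score(node, current_node, location_scores, stickiness):
    """Sum the location scores naming `node`; add stickiness if it is active there."""
    total = 0
    for constraint_node, score in location_scores:
        if constraint_node == node:
            total = add_scores(total, score)
    if node == current_node:
        # Note: for a group, the effective stickiness may be larger than
        # the per-resource default; a single value is used here for simplicity.
        total = add_scores(total, stickiness)
    return total


# Location constraints on nag_grp taken from the config in this post.
with_cli_prefer = [
    ("ipfuie-mgmt01", 100),  # nag_loc
    ("ipfuie-mgmt01", INF),  # cli-prefer-nag_grp
]
without_cli_prefer = [("ipfuie-mgmt01", 100)]  # nag_loc only

# Situation after the failover: nag_grp is running on mgmt02.
for label, constraints in (("with cli-prefer", with_cli_prefer),
                           ("without cli-prefer", without_cli_prefer)):
    for node in ("ipfuie-mgmt01", "ipfuie-mgmt02"):
        score = node_score(node, "ipfuie-mgmt02", constraints, 100)
        print(label, node, score)
```

In this model, mgmt01 scores INFINITY as long as the cli-prefer-nag_grp constraint is counted, while mgmt02 only has its stickiness of 100. With nag_loc alone, both nodes come out at 100.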