Re: [Linux-HA] Auto Failback despite location constraint
On 4/29/2011 at 08:25 PM, Stallmann, Andreas astallm...@conet.de wrote:

> Ha! It works. But still, there are two strange (side) effects:
>
> Firstly, mgmt01 still takes over if it was disconnected from the net for less
> than five minutes. If mgmt01 stays disconnected for more than 5 min, no auto
> failback will happen after it is reconnected.
>
> Secondly, when mgmt01 comes back after 5 min or more, the resources *will*
> stay on mgmt01 (good so far), but do *restart* on mgmt02 (and that's equally
> bad as if the services would fail back, because we run phone conferences on
> the server and those disconnect on every restart of the resources/services).
>
> Any ideas *why* the resources restart and how to keep them from doing so?

I think I'm confused about the exact sequence of events here. To verify: you
mean mgmt01 goes offline, mgmt02 takes over the resources, mgmt01 comes back
online, no failback occurs, but the resources are restarted on mgmt02? Which
resources specifically?

This may not be related, but I noticed you seem to have some redundant order
constraints; I would remove them:

    order apache-after-ip inf: sharedIP web_res
    order nagios-after-apache inf: web_res nagios_res

These are not necessary because they are already implied by this group:

    group nag_grp fs_r0 sharedIP web_res nagios_res ajaxterm

> Any ideas *why* mgmt01 has to stay disconnected for 5 min or more to prevent
> an auto failback? If the network is flapping for some reason, this would lead
> to flapping services, too, and that's really (really!) not desirable.

No, not really. If the scores (ptest -Ls) are the same on both nodes, the
resources should stay where they're already running. I wonder if your ping
rule is involved somehow?
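[As a sketch of the cleanup suggested above, assuming the group really is the
only ordering requirement among these resources, the two explicit order
constraints can simply be deleted; the group alone already implies both the
start/stop ordering and the colocation of its members:]

```
# A group starts its members in the listed order and stops them in
# reverse, so these two constraints are implied and can be removed:
#   order apache-after-ip inf: sharedIP web_res
#   order nagios-after-apache inf: web_res nagios_res
group nag_grp fs_r0 sharedIP web_res nagios_res ajaxterm
```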
    location only-if-connected nag_grp \
        rule $id=only-if-connected-rule -inf: not_defined pingd or pingd lte 1500

Note that -INF scores will always trump any other non-infinity score, see:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch-constraints.html#s-scores-infinity

Regards,

Tim

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Stallmann, Andreas
Sent: Friday, 29 April 2011 10:39
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Auto Failback despite location constraint

Hi!

> If the resource ends up on the non-preferred node, those settings will cause
> it to have an equal score on both nodes, so it should stay put. If you want
> to verify, try ptest -Ls to see what scores each resource has.

Great, that's the command I was looking for! Before the failover the output is:

    group_color: nag_grp allocation score on ipfuie-mgmt01: 100
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

When nag_grp has failed over to ipfuie-mgmt02 it is:

    group_color: nag_grp allocation score on ipfuie-mgmt01: -INFINITY
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Strange, isn't it? I would have expected the default-resource-stickiness to
have some influence on the values, but obviously it has not.

When mgmt01 comes back, we see (pretty soon) again:

    group_color: nag_grp allocation score on ipfuie-mgmt01: 100
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Thus, the resource fails over to mgmt01 again. Which is not what we intended.

> Anyway, the problem is this constraint:
>
>     location cli-prefer-nag_grp nag_grp \
>         rule $id=cli-prefer-rule-nag_grp inf: #uname eq ipfuie-mgmt01 and #uname eq ipfuie-mgmt01

TNX, I briefly thought of applying a vast amount of necessary cruelty to the
colleague who did a migrate without the following unmigrate.
I have unmigrated the resources now (the location constraint is gone now), but
the result is the same; the resource-stickiness is not taken into account.
AAARGH!!! (As Terry Pratchett says: three exclamation marks, a clear sign of an
insane mind... that's where configuring clusters gets me...) Please help,
otherwise I might think of doing something really nasty like, like, like...
like for example switching to Windows! Ha! ;-)

Thanks in advance for your ongoing patience with me,

Andreas

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
Managing Directors: Jürgen Zender (Chairman), Anke Höfer
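[For reference, a short sketch of how the stray cli-prefer-* constraint
discussed in this thread typically appears and disappears, in crm shell
syntax, using the resource and node names from the thread:]

```
# "migrate" (alias "move") injects an inf-score location constraint
# named cli-prefer-<resource>, pinning the resource to the target node:
crm resource migrate nag_grp ipfuie-mgmt01

# That constraint persists until it is explicitly cleared again:
crm resource unmigrate nag_grp

# Verify the resulting allocation scores afterwards:
ptest -Ls
```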
Re: [Linux-HA] Auto Failback despite location constraint
Ha! It works. But still, there are two strange (side) effects:

Firstly, mgmt01 still takes over if it was disconnected from the net for less
than five minutes. If mgmt01 stays disconnected for more than 5 min, no auto
failback will happen after it is reconnected.

Secondly, when mgmt01 comes back after 5 min or more, the resources *will*
stay on mgmt01 (good so far), but do *restart* on mgmt02 (and that's equally
bad as if the services would fail back, because we run phone conferences on
the server and those disconnect on every restart of the resources/services).

Any ideas *why* the resources restart and how to keep them from doing so?

Any ideas *why* mgmt01 has to stay disconnected for 5 min or more to prevent
an auto failback? If the network is flapping for some reason, this would lead
to flapping services, too, and that's really (really!) not desirable.

Cheers and thanks again for your support,

Andreas

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Stallmann, Andreas
Sent: Friday, 29 April 2011 10:39
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Auto Failback despite location constraint

Hi!

> If the resource ends up on the non-preferred node, those settings will cause
> it to have an equal score on both nodes, so it should stay put. If you want
> to verify, try ptest -Ls to see what scores each resource has.

Great, that's the command I was looking for! Before the failover the output is:

    group_color: nag_grp allocation score on ipfuie-mgmt01: 100
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

When nag_grp has failed over to ipfuie-mgmt02 it is:

    group_color: nag_grp allocation score on ipfuie-mgmt01: -INFINITY
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Strange, isn't it? I would have expected the default-resource-stickiness to
have some influence on the values, but obviously it has not.
When mgmt01 comes back, we see (pretty soon) again:

    group_color: nag_grp allocation score on ipfuie-mgmt01: 100
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Thus, the resource fails over to mgmt01 again. Which is not what we intended.

> Anyway, the problem is this constraint:
>
>     location cli-prefer-nag_grp nag_grp \
>         rule $id=cli-prefer-rule-nag_grp inf: #uname eq ipfuie-mgmt01 and #uname eq ipfuie-mgmt01

TNX, I briefly thought of applying a vast amount of necessary cruelty to the
colleague who did a migrate without the following unmigrate. I have unmigrated
the resources now (the location constraint is gone now), but the result is the
same; the resource-stickiness is not taken into account. AAARGH!!! (As Terry
Pratchett says: three exclamation marks, a clear sign of an insane mind...
that's where configuring clusters gets me...) Please help, otherwise I might
think of doing something really nasty like, like, like... like for example
switching to Windows! Ha! ;-)

Thanks in advance for your ongoing patience with me,

Andreas

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
Managing Directors: Jürgen Zender (Chairman), Anke Höfer
Re: [Linux-HA] Auto Failback despite location constraint
On 4/29/2011 at 03:36 AM, Stallmann, Andreas astallm...@conet.de wrote:

> Hi!
>
> I configured my nodes *not* to auto failback after a defective node comes
> back online. This worked nicely for a while, but now it doesn't (and,
> honestly, I do not know what was changed in the meantime).
>
> What we do: we disconnect the two (virtual) interfaces of our node mgmt01
> (running on VMware ESXi) by means of the vSphere client. Node mgmt02 takes
> over the services as it should. When node mgmt01's interfaces are switched
> on again, everything looks alright for a minute or two, but then mgmt01
> takes over the resources again. Which it should not.
>
> Here's the relevant snippet of the configuration (full config below):
>
>     location nag_loc nag_grp 100: ipfuie-mgmt01
>     property default-resource-stickiness=100
>
> I thought that because the resource-stickiness has the same value as the
> location constraint, the resources would stick to the node they are started
> on. Am I wrong?

If the resource ends up on the non-preferred node, those settings will cause
it to have an equal score on both nodes, so it should stay put. If you want to
verify, try ptest -Ls to see what scores each resource has.

Anyway, the problem is this constraint:

    location cli-prefer-nag_grp nag_grp \
        rule $id=cli-prefer-rule-nag_grp inf: #uname eq ipfuie-mgmt01 and #uname eq ipfuie-mgmt01

Because that constraint has a score of inf, it'll take precedence. Probably
"crm resource move nag_grp ipfuie-mgmt01" was run at some point, to forcibly
move the resource to ipfuie-mgmt01. That constraint will persist until you run
"crm resource unmove nag_grp". Kind of weird that the hostname is listed twice
in that rule, though...

Regards,

Tim

--
Tim Serong tser...@novell.com
Senior Clustering Engineer, OPS Engineering, Novell Inc.

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
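[To illustrate the score arithmetic behind the advice above, here is a minimal
sketch in crm shell syntax, using the names from the thread. With stickiness
equal to the location preference, the two nodes tie once the group is running
on mgmt02, so no failback occurs; only an infinity-score constraint (such as
the cli-prefer-* one left behind by a move/migrate) overrides that:]

```
# Location preference: +100 for ipfuie-mgmt01
location nag_loc nag_grp 100: ipfuie-mgmt01

# Stickiness: +100 for whichever node currently runs the resource
property default-resource-stickiness=100

# Resulting allocation while nag_grp runs on ipfuie-mgmt02:
#   ipfuie-mgmt01: 100 (location preference)
#   ipfuie-mgmt02: 100 (stickiness)      -> tie, resource stays put
#
# An inf-score constraint (e.g. cli-prefer-nag_grp from "crm resource
# move") trumps any finite score, so it must be removed ("crm resource
# unmove") before stickiness can take effect again.
```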