Re: [Linux-HA] Auto Failback despite location constrain

2011-05-01 Thread Tim Serong
On 4/29/2011 at 08:25 PM, Stallmann, Andreas astallm...@conet.de wrote: 
 Ha! 
  
 It works. But still, there are two strange (side) effects: 
  
 Firstly, mgmt01 still takes over if it was disconnected from the net for a
 time shorter than five minutes. If mgmt01 stays disconnected for more than
 five minutes, no auto failback happens after it's reconnected.
  
 Secondly, when mgmt01 comes back after five minutes or more, the resources *will*
 stay on mgmt01 (good so far), but they do *restart* on mgmt02 (and that's just as
 bad as if the services had failed back, because we run phone conferences on the
 server and those disconnect on every restart of the resources/services).

 Any ideas *why* the resources restart and how to keep them from doing so? 

I think I'm confused about the exact sequence of events here.  To verify,
you mean mgmt01 goes offline, mgmt02 takes over resources, mgmt01 comes
back online, no failback occurs, but resources are restarted on mgmt02?
Which resources specifically?

This may not be related, but I noticed you seem to have some redundant order
constraints; I would remove them:

  order apache-after-ip inf: sharedIP web_res
  order nagios-after-apache inf: web_res nagios_res

These are not necessary because they are already implied by this group:

  group nag_grp fs_r0 sharedIP web_res nagios_res ajaxterm
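
If you do want to drop them, something along these lines should do it (a sketch,
assuming those constraint IDs match what's actually in your CIB):

  crm configure delete apache-after-ip
  crm configure delete nagios-after-apache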

 Any ideas *why* mgmt01 has to stay disconnected for five minutes or more to
 prevent an auto failback? If the network is flapping for some reason, this would
 lead to flapping services, too, and that's really (really!) not desirable.

No, not really.  If the scores (ptest -Ls) are the same on both nodes, the
resources should stay where they're already running.
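
For illustration only (numbers made up, and the exact values depend on how
stickiness accumulates across the group members), "same scores" would mean
ptest -Ls showing something like:

  group_color: nag_grp allocation score on ipfuie-mgmt01: 100
  group_color: nag_grp allocation score on ipfuie-mgmt02: 100

i.e. the same score on both nodes while the group is running on mgmt02.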

I wonder if your ping rule is involved somehow?

  location only-if-connected nag_grp \
rule $id=only-if-connected-rule -inf: not_defined pingd or pingd lte 1500

Note that -INF scores will always trump any other non-infinity score, see:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch-constraints.html#s-scores-infinity
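
For what it's worth, I can't see your ping resource in the snippet above, but if
it's the usual ocf:pacemaker:ping clone, the pingd attribute that rule tests is
only updated by the monitor operation and further delayed by dampen. Roughly
(the host_list, dampen and interval values here are placeholders, not taken
from your config):

  primitive ping_res ocf:pacemaker:ping \
    params host_list="10.0.0.1" multiplier=1000 dampen=30s \
    op monitor interval=15s
  clone ping_clone ping_res

Those timings would also affect how quickly that -INF rule kicks in and clears
again when connectivity comes and goes.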

Regards,

Tim


  
 -----Original Message-----
 From: linux-ha-boun...@lists.linux-ha.org
 [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Stallmann, Andreas
 Sent: Friday, 29 April 2011 10:39
 To: General Linux-HA mailing list
 Subject: Re: [Linux-HA] Auto Failback despite location constrain
  
 Hi! 
  
  If the resource ends up on the non-preferred node, those settings will 
  cause it to have an equal score on both nodes, so it should stay put. 
 If you want to verify, try ptest -Ls to see what scores each resource has.
 Great, that's the command I was looking for! 
  
 Before the failover the output is: 
  
 group_color: nag_grp allocation score on ipfuie-mgmt01: 100 
 group_color: nag_grp allocation score on ipfuie-mgmt02: 0 
  
 When the nag_grp has failed over to ipfuie-mgmt02 it is: 
  
 group_color: nag_grp allocation score on ipfuie-mgmt01: -INFINITY 
 group_color: nag_grp allocation score on ipfuie-mgmt02: 0 
  
 Strange, isn't it? I would have expected the default-resource-stickiness to
 have some influence on the values, but obviously it does not.
 When mgmt01 comes back, we see (pretty soon) again: 
  
 group_color: nag_grp allocation score on ipfuie-mgmt01: 100 
 group_color: nag_grp allocation score on ipfuie-mgmt02: 0 
  
 Thus, the resource fails back to mgmt01 again, which is not what we intended.
  
  Anyway, the problem is this constraint: 
  
  location cli-prefer-nag_grp nag_grp \ 
 rule $id=cli-prefer-rule-nag_grp inf: #uname eq ipfuie-mgmt01 and  
 #uname eq ipfuie-mgmt01 
  
 TNX, I briefly thought of applying a vast amount of necessary cruelty to the
 colleague who did a migrate without the subsequent unmigrate. I have unmigrated
 the resources now (the location constraint is gone), but the result is the
 same; the resource-stickiness is not taken into account. AAARGH!!!
 (As Terry Pratchett says: Three exclamation marks, a clear sign of an insane
 mind... that's where configuring clusters gets me...)
  
 Please help, otherwise I might think of doing something really nasty like,  
 like, like... like for example switching to windows! Ha! ;-) 
  
 Thanks in advance for your ongoing patience with me, 
  
 Andreas 
  

Re: [Linux-HA] Auto Failback despite location constrain

2011-04-29 Thread Stallmann, Andreas
Ha!

It works. But still, there are two strange (side) effects:

Firstly, mgmt01 still takes over if it was disconnected from the net for a
time shorter than five minutes. If mgmt01 stays disconnected for more than
five minutes, no auto failback happens after it's reconnected.

Secondly, when mgmt01 comes back after five minutes or more, the resources *will* stay
on mgmt01 (good so far), but they do *restart* on mgmt02 (and that's just as bad as
if the services had failed back, because we run phone conferences on the server
and those disconnect on every restart of the resources/services).

Any ideas *why* the resources restart and how to keep them from doing so?

Any ideas *why* mgmt01 has to stay disconnected for five minutes or more to prevent an
auto failback? If the network is flapping for some reason, this would lead to
flapping services, too, and that's really (really!) not desirable.

Cheers and thanks again for your support,

Andreas

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Stallmann, Andreas
Sent: Friday, 29 April 2011 10:39
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Auto Failback despite location constrain

Hi!

 If the resource ends up on the non-preferred node, those settings will
 cause it to have an equal score on both nodes, so it should stay put.
 If you want to verify, try ptest -Ls to see what scores each resource has.
Great, that's the command I was looking for!

Before the failover the output is:

group_color: nag_grp allocation score on ipfuie-mgmt01: 100
group_color: nag_grp allocation score on ipfuie-mgmt02: 0

When the nag_grp has failed over to ipfuie-mgmt02 it is:

group_color: nag_grp allocation score on ipfuie-mgmt01: -INFINITY
group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Strange, isn't it? I would have expected the default-resource-stickiness to have
some influence on the values, but obviously it does not.
When mgmt01 comes back, we see (pretty soon) again:

group_color: nag_grp allocation score on ipfuie-mgmt01: 100
group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Thus, the resource fails back to mgmt01 again, which is not what we intended.

 Anyway, the problem is this constraint:

 location cli-prefer-nag_grp nag_grp \
rule $id=cli-prefer-rule-nag_grp inf: #uname eq ipfuie-mgmt01 and 
#uname eq ipfuie-mgmt01

TNX, I briefly thought of applying a vast amount of necessary cruelty to the
colleague who did a migrate without the subsequent unmigrate. I have unmigrated
the resources now (the location constraint is gone), but the result is the
same; the resource-stickiness is not taken into account. AAARGH!!! (As
Terry Pratchett says: Three exclamation marks, a clear sign of an insane
mind... that's where configuring clusters gets me...)

Please help, otherwise I might think of doing something really nasty like, 
like, like... like for example switching to windows! Ha! ;-)

Thanks in advance for your ongoing patience with me,

Andreas


Re: [Linux-HA] Auto Failback despite location constrain

2011-04-28 Thread Tim Serong
On 4/29/2011 at 03:36 AM, Stallmann, Andreas astallm...@conet.de wrote: 
 Hi! 
  
 I configured my nodes *not* to auto failback after a defective node comes  
 back online. This worked nicely for a while, but now it doesn't (and,  
 honestly, I do not know what was changed in the meantime). 
  
 What we do: we disconnect the two (virtual) interfaces of our node mgmt01
 (running on VMware ESXi) by means of the vSphere client. Node mgmt02 takes
 over the services as it should. When node mgmt01's interfaces are switched on
 again, everything looks alright for a minute or two, but then mgmt01 takes
 over the resources again, which it should not. Here's the relevant snippet of
 the configuration (full config below):
  
 location nag_loc nag_grp 100: ipfuie-mgmt01 
 property default-resource-stickiness=100 
  
 I thought that, because the resource-stickiness has the same value as the
 location constraint, the resources would stick to the node they are started
 on. Am I wrong?

If the resource ends up on the non-preferred node, those settings will
cause it to have an equal score on both nodes, so it should stay put.
If you want to verify, try ptest -Ls to see what scores each resource
has.
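
For example:

  ptest -Ls | grep nag_grp

should print one allocation score line per node for the group (grep for
whatever resource you care about).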

Anyway, the problem is this constraint:

location cli-prefer-nag_grp nag_grp \
rule $id=cli-prefer-rule-nag_grp inf: #uname eq ipfuie-mgmt01 and 
#uname eq ipfuie-mgmt01

Because that constraint has a score of inf, it'll take precedence.
Probably "crm resource move nag_grp ipfuie-mgmt01" was run at some point,
to forcibly move the resource to ipfuie-mgmt01.  That constraint will
persist until you run "crm resource unmove nag_grp".
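
To illustrate, the sequence was presumably something like this (illustrative
only, not taken from your logs):

  # adds the cli-prefer-nag_grp constraint with score inf:
  crm resource move nag_grp ipfuie-mgmt01
  # removes that constraint again:
  crm resource unmove nag_grp

(migrate/unmigrate are aliases for move/unmove in the crm shell, so either
spelling does the same thing.)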

Kind of weird that the hostname is listed twice in that rule though...

Regards,

Tim


-- 
Tim Serong tser...@novell.com
Senior Clustering Engineer, OPS Engineering, Novell Inc.



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems