On 4/29/2011 at 08:25 PM, "Stallmann, Andreas" <astallm...@conet.de> wrote: 
> Ha! 
>  
> It works. But still, there are two strange (side) effects: 
>  
> Firstly, mgmt01 still takes over if it was disconnected from the net for
> less than five minutes. If mgmt01 stays disconnected for more than five
> minutes, no auto failback will happen after it's reconnected.
>  
> Secondly, when mgmt01 comes back after five minutes or more, the resources
> *will* stay on mgmt02 (good so far), but they do *restart* on mgmt02 (and
> that's just as bad as a failback, because we run phone conferences on the
> server and those disconnect on every restart of the resources/services).
>
> Any ideas *why* the resources restart and how to keep them from doing so? 

I think I'm confused about the exact sequence of events here.  To verify,
you mean mgmt01 goes offline, mgmt02 takes over resources, mgmt01 comes
back online, no failback occurs, but resources are restarted on mgmt02?
Which resources specifically?

This may not be related, but I noticed you seem to have some redundant order
constraints; I would remove them:

  order apache-after-ip inf: sharedIP web_res
  order nagios-after-apache inf: web_res nagios_res

These are not necessary because they are already implied by this group:

  group nag_grp fs_r0 sharedIP web_res nagios_res ajaxterm
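
If you're using the crm shell, something like this should remove them
(assuming the constraint IDs are exactly as above):

  crm configure delete apache-after-ip nagios-after-apache

The group already starts its members in the listed order and stops them in
reverse, so the explicit order constraints add nothing.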

> Any ideas *why* mgmt01 has to stay disconnected for five minutes or more to
> prevent an auto failback? If the network is flapping for some reason, this
> would lead to flapping services, too, and that's really (really!) not
> desirable.

No, not really.  If the scores (ptest -Ls) are the same on both nodes, the
resources should stay where they're already running.
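
If ptest shows the scores are not equal, raising the stickiness above your
location preference should pin the group wherever it is currently running.
A sketch, assuming the crm shell (the value just needs to exceed the 100
points you give mgmt01):

  crm configure rsc_defaults resource-stickiness=200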

I wonder if your ping rule is involved somehow?

  location only-if-connected nag_grp \
    rule $id="only-if-connected-rule" -inf: not_defined pingd or pingd lte 1500

Note that -INF scores will always trump any other non-infinity score, see:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch-constraints.html#s-scores-infinity
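
If mgmt02's connectivity blips while mgmt01 rejoins, pingd can momentarily
drop below 1500 (or become undefined), that rule then scores mgmt02 at
-INFINITY, and the resources get bounced. One way to absorb short outages is
to increase the dampening on the ping resource. A sketch, assuming the
ocf:pacemaker:ping agent (host_list and the timings are placeholders for
your environment):

  primitive ping_res ocf:pacemaker:ping \
    params host_list="10.0.0.1" multiplier=1000 dampen="30s" \
    op monitor interval="15s" timeout="20s"
  clone ping_clone ping_res

The dampen parameter delays attribute updates, so a brief loss of the ping
target won't immediately trip the -INF rule.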

Regards,

Tim


>  
> -----Original Message-----
> From: linux-ha-boun...@lists.linux-ha.org
> [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Stallmann,
> Andreas
> Sent: Friday, 29 April 2011 10:39
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Auto Failback despite location constraint
>  
> Hi! 
>  
> > If the resource ends up on the non-preferred node, those settings will 
> > cause it to have an equal score on both nodes, so it should stay put. 
> > If you want to verify, try "ptest -Ls" to see what scores each resource  
> > has.
> Great, that's the command I was looking for! 
>  
> Before the failover the output is: 
>  
> group_color: nag_grp allocation score on ipfuie-mgmt01: 100 
> group_color: nag_grp allocation score on ipfuie-mgmt02: 0 
>  
> When the nag_grp has failed over to ipfuie-mgmt02 it is: 
>  
> group_color: nag_grp allocation score on ipfuie-mgmt01: -INFINITY 
> group_color: nag_grp allocation score on ipfuie-mgmt02: 0 
>  
> Strange, isn't it? I would have expected the default-resource-stickiness
> to have some influence on the values, but obviously it does not.
> When mgmt01 comes back, we see (pretty soon) again: 
>  
> group_color: nag_grp allocation score on ipfuie-mgmt01: 100 
> group_color: nag_grp allocation score on ipfuie-mgmt02: 0 
>  
> Thus, the resource fails back to mgmt01 again, which is not what we
> intended.
>  
> > Anyway, the problem is this constraint: 
>  
> > location cli-prefer-nag_grp nag_grp \
> >   rule $id="cli-prefer-rule-nag_grp" inf: #uname eq ipfuie-mgmt01 and
> >   #uname eq ipfuie-mgmt01
>  
> TNX, I briefly thought of applying a vast amount of necessary cruelty to
> the colleague who did a "migrate" without the following "unmigrate". I have
> unmigrated the resources now (the location constraint is gone), but the
> result is the same; the resource-stickiness is not taken into account.
> AAAAAAARGHHHHH!!! (As Terry Pratchett says: three exclamation marks, a
> clear sign of an insane mind... that's where configuring clusters gets
> me...)
>  
> Please help, otherwise I might think of doing something really nasty like,
> like, like... like for example switching to Windows! Ha! ;-)
>  
> Thanks in advance for your ongoing patience with me, 
>  
> Andreas 
>  




-- 
Tim Serong <tser...@novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.


_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
