Re: [Linux-HA] Auto Failback despite location constraint
On 4/29/2011 at 08:25 PM, Stallmann, Andreas astallm...@conet.de wrote:

> Ha! It works. But still, there are two strange (side) effects:
>
> Firstly, mgmt01 still takes over if it was disconnected from the net for less
> than five minutes. If mgmt01 stays disconnected for more than 5 min, no auto
> failback will happen after it is reconnected.
>
> Secondly, when mgmt01 comes back after 5 min or more, the resources *will*
> stay on mgmt01 (good so far), but do *restart* on mgmt02 (and that's equally
> bad as if the services would fail back, because we run phone conferences on
> the server and those disconnect on every restart of the resources/services).
>
> Any ideas *why* the resources restart and how to keep them from doing so?

I think I'm confused about the exact sequence of events here. To verify: you
mean mgmt01 goes offline, mgmt02 takes over the resources, mgmt01 comes back
online, no failback occurs, but the resources are restarted on mgmt02? Which
resources specifically?

This may not be related, but I noticed you seem to have some redundant order
constraints; I would remove them:

    order apache-after-ip inf: sharedIP web_res
    order nagios-after-apache inf: web_res nagios_res

These are not necessary because they are already implied by this group:

    group nag_grp fs_r0 sharedIP web_res nagios_res ajaxterm

> Any ideas *why* mgmt01 has to stay disconnected for 5 min or more to prevent
> an auto failback? If the network is flapping for some reason, this would lead
> to flapping services, too, and that's really (really!) not desirable.

No, not really. If the scores (ptest -Ls) are the same on both nodes, the
resources should stay where they're already running. I wonder if your ping
rule is involved somehow?
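[As a sketch of the cleanup suggested above, assuming the group really is the
only ordering requirement among these resources, the two explicit order
constraints can simply be deleted; the group alone already implies both the
start/stop ordering and the colocation of its members:]

```
# A group starts its members in the listed order and stops them in
# reverse, so these two constraints are implied and can be removed:
#   order apache-after-ip inf: sharedIP web_res
#   order nagios-after-apache inf: web_res nagios_res
group nag_grp fs_r0 sharedIP web_res nagios_res ajaxterm
```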
    location only-if-connected nag_grp \
        rule $id=only-if-connected-rule -inf: not_defined pingd or pingd lte 1500

Note that -INF scores will always trump any other non-infinity score, see:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch-constraints.html#s-scores-infinity

Regards,

Tim

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Stallmann, Andreas
Sent: Friday, 29 April 2011 10:39
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Auto Failback despite location constraint

Hi!

> If the resource ends up on the non-preferred node, those settings will cause
> it to have an equal score on both nodes, so it should stay put. If you want
> to verify, try ptest -Ls to see what scores each resource has.

Great, that's the command I was looking for! Before the failover the output is:

    group_color: nag_grp allocation score on ipfuie-mgmt01: 100
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

When nag_grp has failed over to ipfuie-mgmt02 it is:

    group_color: nag_grp allocation score on ipfuie-mgmt01: -INFINITY
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Strange, isn't it? I would have expected the default-resource-stickiness to
have some influence on the values, but obviously it has not.

When mgmt01 comes back, we see (pretty soon) again:

    group_color: nag_grp allocation score on ipfuie-mgmt01: 100
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Thus, the resource fails over to mgmt01 again. Which is not what we intended.

> Anyway, the problem is this constraint:
>
>     location cli-prefer-nag_grp nag_grp \
>         rule $id=cli-prefer-rule-nag_grp inf: #uname eq ipfuie-mgmt01 and #uname eq ipfuie-mgmt01

TNX, I briefly thought of applying a vast amount of necessary cruelty to the
colleague who did a migrate without the following unmigrate.
I have unmigrated the resources now (the location constraint is gone now), but
the result is the same; the resource-stickiness is not taken into account.
AAARGH!!! (As Terry Pratchett says: three exclamation marks, a clear sign of an
insane mind... that's where configuring clusters gets me...) Please help,
otherwise I might think of doing something really nasty like, like, like...
like for example switching to Windows! Ha! ;-)

Thanks in advance for your ongoing patience with me,

Andreas

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
Managing Directors: Jürgen Zender (Chairman), Anke Höfer
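[For reference, a short sketch of how the stray cli-prefer-* constraint
discussed in this thread typically appears and disappears, in crm shell
syntax, using the resource and node names from the thread:]

```
# "migrate" (alias "move") injects an inf-score location constraint
# named cli-prefer-<resource>, pinning the resource to the target node:
crm resource migrate nag_grp ipfuie-mgmt01

# That constraint persists until it is explicitly cleared again:
crm resource unmigrate nag_grp

# Verify the resulting allocation scores afterwards:
ptest -Ls
```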
Re: [Linux-HA] Auto Failback despite location constraint
Ha! It works. But still, there are two strange (side) effects:

Firstly, mgmt01 still takes over if it was disconnected from the net for less
than five minutes. If mgmt01 stays disconnected for more than 5 min, no auto
failback will happen after it is reconnected.

Secondly, when mgmt01 comes back after 5 min or more, the resources *will*
stay on mgmt01 (good so far), but do *restart* on mgmt02 (and that's equally
bad as if the services would fail back, because we run phone conferences on
the server and those disconnect on every restart of the resources/services).

Any ideas *why* the resources restart and how to keep them from doing so?

Any ideas *why* mgmt01 has to stay disconnected for 5 min or more to prevent
an auto failback? If the network is flapping for some reason, this would lead
to flapping services, too, and that's really (really!) not desirable.

Cheers and thanks again for your support,

Andreas

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Stallmann, Andreas
Sent: Friday, 29 April 2011 10:39
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Auto Failback despite location constraint

Hi!

> If the resource ends up on the non-preferred node, those settings will cause
> it to have an equal score on both nodes, so it should stay put. If you want
> to verify, try ptest -Ls to see what scores each resource has.

Great, that's the command I was looking for! Before the failover the output is:

    group_color: nag_grp allocation score on ipfuie-mgmt01: 100
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

When nag_grp has failed over to ipfuie-mgmt02 it is:

    group_color: nag_grp allocation score on ipfuie-mgmt01: -INFINITY
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Strange, isn't it? I would have expected the default-resource-stickiness to
have some influence on the values, but obviously it has not.
When mgmt01 comes back, we see (pretty soon) again:

    group_color: nag_grp allocation score on ipfuie-mgmt01: 100
    group_color: nag_grp allocation score on ipfuie-mgmt02: 0

Thus, the resource fails over to mgmt01 again. Which is not what we intended.

> Anyway, the problem is this constraint:
>
>     location cli-prefer-nag_grp nag_grp \
>         rule $id=cli-prefer-rule-nag_grp inf: #uname eq ipfuie-mgmt01 and #uname eq ipfuie-mgmt01

TNX, I briefly thought of applying a vast amount of necessary cruelty to the
colleague who did a migrate without the following unmigrate. I have unmigrated
the resources now (the location constraint is gone now), but the result is the
same; the resource-stickiness is not taken into account. AAARGH!!! (As Terry
Pratchett says: three exclamation marks, a clear sign of an insane mind...
that's where configuring clusters gets me...) Please help, otherwise I might
think of doing something really nasty like, like, like... like for example
switching to Windows! Ha! ;-)

Thanks in advance for your ongoing patience with me,

Andreas

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
Managing Directors: Jürgen Zender (Chairman), Anke Höfer
Re: [Linux-HA] Auto Failback despite location constraint
On 4/29/2011 at 03:36 AM, Stallmann, Andreas astallm...@conet.de wrote:

> Hi!
>
> I configured my nodes *not* to auto failback after a defective node comes
> back online. This worked nicely for a while, but now it doesn't (and,
> honestly, I do not know what was changed in the meantime).
>
> What we do: we disconnect the two (virtual) interfaces of our node mgmt01
> (running on VMware ESXi) by means of the vSphere client. Node mgmt02 takes
> over the services as it should. When node mgmt01's interfaces are switched
> on again, everything looks alright for a minute or two, but then mgmt01
> takes over the resources again. Which it should not.
>
> Here's the relevant snippet of the configuration (full config below):
>
>     location nag_loc nag_grp 100: ipfuie-mgmt01
>     property default-resource-stickiness=100
>
> I thought that because the resource-stickiness has the same value as the
> location constraint, the resources would stick to the node they are started
> on. Am I wrong?

If the resource ends up on the non-preferred node, those settings will cause
it to have an equal score on both nodes, so it should stay put. If you want to
verify, try ptest -Ls to see what scores each resource has.

Anyway, the problem is this constraint:

    location cli-prefer-nag_grp nag_grp \
        rule $id=cli-prefer-rule-nag_grp inf: #uname eq ipfuie-mgmt01 and #uname eq ipfuie-mgmt01

Because that constraint has a score of inf, it'll take precedence. Probably
"crm resource move nag_grp ipfuie-mgmt01" was run at some point, to forcibly
move the resource to ipfuie-mgmt01. That constraint will persist until you run
"crm resource unmove nag_grp". Kind of weird that the hostname is listed twice
in that rule, though...

Regards,

Tim

--
Tim Serong tser...@novell.com
Senior Clustering Engineer, OPS Engineering, Novell Inc.

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
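[To illustrate the score arithmetic behind the advice above, here is a minimal
sketch in crm shell syntax, using the names from the thread. With stickiness
equal to the location preference, the two nodes tie once the group is running
on mgmt02, so no failback occurs; only an infinity-score constraint (such as
the cli-prefer-* one left behind by a move/migrate) overrides that:]

```
# Location preference: +100 for ipfuie-mgmt01
location nag_loc nag_grp 100: ipfuie-mgmt01

# Stickiness: +100 for whichever node currently runs the resource
property default-resource-stickiness=100

# Resulting allocation while nag_grp runs on ipfuie-mgmt02:
#   ipfuie-mgmt01: 100 (location preference)
#   ipfuie-mgmt02: 100 (stickiness)      -> tie, resource stays put
#
# An inf-score constraint (e.g. cli-prefer-nag_grp from "crm resource
# move") trumps any finite score, so it must be removed ("crm resource
# unmove") before stickiness can take effect again.
```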