Re: [Pacemaker] PingD Failure-Timeout

2009-05-27 Thread Andrew Beekhof
On Tue, May 26, 2009 at 3:22 PM, Eliot Gable wrote:
> I am using 1.0.3, but the failure-timeout thing does not seem to work for 
> pingd.
>

You'll have to show us the rest of your configuration.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] PingD Failure-Timeout

2009-05-26 Thread Eliot Gable
I am using 1.0.3, but the failure-timeout thing does not seem to work for pingd.

Eliot Gable
Senior Engineer
1228 Euclid Ave, Suite 390
Cleveland, OH 44115

Direct: 216-373-4808
Fax: 216-373-4657
ega...@broadvox.net


CONFIDENTIAL COMMUNICATION.  This e-mail and any files transmitted with it are 
confidential and are intended solely for the use of the individual or entity to 
whom it is addressed. If you are not the intended recipient, please call me 
immediately.  BROADVOX is a registered trademark of Broadvox, LLC.


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, May 25, 2009 11:49 AM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] PingD Failure-Timeout

On Thu, May 21, 2009 at 10:20 PM, Eliot Gable wrote:
> Is there a way to time-out the failure of PingD?

Yes, but you need version >= 1.0.0
I assume you're not running it as a clone right?



Re: [Pacemaker] PingD Failure-Timeout

2009-05-25 Thread Andrew Beekhof
On Thu, May 21, 2009 at 10:20 PM, Eliot Gable wrote:
> Is there a way to time-out the failure of PingD?

Yes, but you need version >= 1.0.0
I assume you're not running it as a clone right?


Re: [Pacemaker] PingD Failure-Timeout

2009-05-21 Thread Eliot Gable
I have tried several things with resource constraints and roles (Master, Slave,
and Started), but cannot seem to satisfy both of these conditions:

-  When the master's pingd fails, set a negative score to force a failover
   to the other node.

-  When the failover is complete, set a positive score so that the old
   master can become master again.
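
The two conditions can be stated more precisely with a small Python sketch. This is only a toy model of the desired behaviour, not a Pacemaker mechanism: the class `ExpiringPenalty`, its field names, the positive score of 100, and the 30-second window (used here as a stand-in for "the failover is complete") are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpiringPenalty:
    """Hypothetical model: a negative score that lapses after a fixed window."""
    penalty: int = -1000      # score applied when the master's pingd fails
    recovered: int = 100      # assumed positive score once the node is eligible again
    window: float = 30.0      # assumed seconds the penalty stays in force
    failed_at: Optional[float] = None

    def record_failure(self, now: float) -> None:
        """Remember when connectivity was lost on this node."""
        self.failed_at = now

    def score(self, now: float) -> int:
        if self.failed_at is not None and now - self.failed_at < self.window:
            return self.penalty   # condition 1: push the master role off this node
        return self.recovered     # condition 2: the old master may be promoted again


node = ExpiringPenalty()
node.record_failure(now=0.0)
print(node.score(now=5.0))    # still inside the penalty window
print(node.score(now=45.0))   # penalty has lapsed
```

In this model the node scores -1000 shortly after the failure and 100 once the window has passed, which is the pair of outcomes the two conditions ask for.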

Alternatively, I have tried to fence on a pingd failure, but cannot seem to get
that to work. The only stonith resources I have set up are suicide and ssh.
This is my pingd resource:

[XML resource definition not preserved in the archive]

The stonith resources correctly fence on a failure of the stop action on a 
resource.

Any suggestions?


[Pacemaker] PingD Failure-Timeout

2009-05-21 Thread Eliot Gable
Is there a way to time-out the failure of PingD?

In my configuration, I cannot run PingD all the time on every node. Only one 
node (the master) has public Internet access. I use PingD to cause the master 
to fail-over to one of the slaves. When a slave becomes master, it then gains 
public Internet connectivity. When it is a slave, the entire interface is down, 
so not even the gateway is reachable. So, I set up a PingD resource that is 
co-located with the master resource in the Master state. I also set up 
constraints that assign a -1000 score to a node for each resource if that node 
loses connectivity to the gateway. The result is that if I firewall off ICMP on 
the master, it correctly fails over to a slave. Then, it runs a stop on the 
master, as expected since it has a -1000 score. The result is that my master 
resource runs as Master on the node that was the slave, and is Stopped on the 
node that was the master. However, that node is still stuck with a -1000 score,
and the resource will never restart there until PingD thinks it has
connectivity back. But that won't happen: PingD no longer runs on that node,
and with the interface down it would not see anything even if it did.

I set a failure-timeout on the PingD resource, but it does not seem to do 
anything. Running 'crm_verify -V -L 2>&1 | less' shows that the -1000 score 
stays there, even well past the failure-timeout.

Does anybody have suggestions for how I can automatically clear that -1000
score after a certain (small) interval of time?
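
One plausible reading of the behaviour described above, sketched as a toy model in Python rather than anything taken from Pacemaker's source: failure-timeout expires a resource's recorded failures, whereas the -1000 comes from a location rule evaluated against a node attribute that PingD last wrote. If nothing refreshes that attribute, the rule keeps matching. The function names, the attribute name `pingd`, the "defined and not positive" rule shape, and the 60-second timeout are assumptions made for illustration.

```python
from typing import Dict, Optional

PENALTY = -1000  # score assigned by the connectivity constraint in the post


def effective_fail_count(fail_count: int, last_failure: float,
                         now: float, failure_timeout: float) -> int:
    """Treat recorded failures as forgotten once failure_timeout has elapsed."""
    if failure_timeout > 0 and now - last_failure >= failure_timeout:
        return 0
    return fail_count


def rule_score(node_attrs: Dict[str, str], attr_name: str = "pingd") -> int:
    """Score from an assumed location rule that penalizes a node whose
    connectivity attribute is defined and not positive."""
    value: Optional[str] = node_attrs.get(attr_name)
    if value is not None and int(value) <= 0:
        return PENALTY
    return 0


# Old master: assume PingD wrote 0 before it was stopped and nothing updates it.
old_master_attrs = {"pingd": "0"}

for now in (10.0, 120.0, 3600.0):
    fails = effective_fail_count(1, last_failure=0.0, now=now, failure_timeout=60.0)
    print(now, fails, rule_score(old_master_attrs))
```

In this model the fail count drops to 0 once 60 seconds have passed, but the rule score stays at -1000 for as long as the stale attribute value remains, which matches the symptom reported here.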

