Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Ken Gaillot
On 09/06/2016 11:44 AM, Dan Swartzendruber wrote:
> On 2016-09-06 10:59, Ken Gaillot wrote:
>> On 09/05/2016 09:38 AM, Marek Grac wrote:
>>> Hi,
>>>
> 
> [snip]
> 
>> FYI, no special configuration is needed for this with recent pacemaker
>> versions. If multiple devices are listed in a topology level, pacemaker
>> will automatically convert reboot requests into all-off-then-all-on.
> 
> Hmmm, thinking about this some more, this just puts me back in the
> current situation (e.g. having an 'extra' delay.)  The issue for me
> would be having two fencing devices, each of which needs a brief delay
> to let its target's PS drain.  If a single PDU fencing agent does this
> (with proposed change):
> 
> power-off
> wait N seconds
> power-on
> 
> that is cool.  Unfortunately, with the all-off-then-all-on pacemaker
> would do, I would get this:
> 
> power-off node A
> wait N seconds
> power-off node B
> wait N seconds
> power-on node A
> power-on node B
> 
> or am I missing something?  If not, seems like it would be nice to have
> some sort of delay at the pacemaker level.  e.g. tell pacemaker to
> convert a reboot of node A into a 'turn off node A, wait N seconds, turn
> on node A'?

You're exactly right. Pacemaker does seem like the appropriate place to
handle this, but it would be a good bit of work. I think the best
workaround for now would be to set the delay only on the B device.

I do see now why power-wait, as a fence agent property, is not ideal for
this purpose: one fence device might be used with multiple nodes, yet
the ideal delay might vary by node (if they have different power supply
models, for example).

On the other hand, setting it as a node attribute isn't right either,
because one node might be fenceable by multiple devices, and the delay
might not be appropriate for all of them.

We'd need to specify the delay per node/device combination -- something
like pcmk_off_delay=node1:3;node2:5 as an (ugly) fence device property.
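A parser for such a hypothetical property might look like this (a minimal sketch; `pcmk_off_delay` and its `node:seconds` format are only the proposal above, not an existing Pacemaker option):

```python
def parse_off_delay(value):
    """Parse the proposed 'node1:3;node2:5' format into {node: seconds}."""
    delays = {}
    for entry in value.split(";"):
        if not entry.strip():
            continue  # tolerate empty segments like a trailing ';'
        node, _, secs = entry.partition(":")
        delays[node.strip()] = int(secs)
    return delays

# parse_off_delay("node1:3;node2:5") -> {'node1': 3, 'node2': 5}
```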

It would be a significant project. If you think it's important, please
open a feature request at bugs.clusterlabs.org.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Jan Pokorný
On 06/09/16 10:35 -0500, Ken Gaillot wrote:
> On 09/06/2016 10:20 AM, Dan Swartzendruber wrote:
>> On 2016-09-06 10:59, Ken Gaillot wrote:
>> 
>> [snip]
>> 
>>> I thought power-wait was intended for this situation, where the node's
>>> power supply can survive a brief outage, so a delay is needed to ensure
>>> it drains. In any case, I know people are using it for that.
>>> 
>>> Are there any drawbacks to using power-wait for this purpose, even if
>>> that wasn't its original intent? Is it just that the "on" will get the
>>> delay as well?
>> 
>> I can't speak to the first part of your question, but for me the second
>> part is a definite YES.  The issue is that I want a long enough delay to
>> be sure the host is D E A D and not writing to the pool anymore; but
>> that delay is now multiplied by 2, and if it gets "too long", vsphere
>> guests can start getting disk I/O errors...
> 
> Ah, Marek's suggestions are the best way out, then. Fence agents are
> usually simple shell scripts, so adding a power-wait-off option
> shouldn't be difficult.

A small correction: they are almost exclusively _Python_ scripts, but
that doesn't change much.  Just a basic understanding of the fencing
library (part of fence-agents), and perhaps a slight modification to it,
will be needed.

-- 
Jan (Poki)




Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Dan Swartzendruber

On 2016-09-06 10:59, Ken Gaillot wrote:

On 09/05/2016 09:38 AM, Marek Grac wrote:

Hi,



[snip]


FYI, no special configuration is needed for this with recent pacemaker
versions. If multiple devices are listed in a topology level, pacemaker
will automatically convert reboot requests into all-off-then-all-on.


Hmmm, thinking about this some more, this just puts me back in the 
current situation (e.g. having an 'extra' delay.)  The issue for me 
would be having two fencing devices, each of which needs a brief delay 
to let its target's PS drain.  If a single PDU fencing agent does this 
(with proposed change):


power-off
wait N seconds
power-on

that is cool.  Unfortunately, with the all-off-then-all-on pacemaker 
would do, I would get this:


power-off node A
wait N seconds
power-off node B
wait N seconds
power-on node A
power-on node B

or am I missing something?  If not, seems like it would be nice to have 
some sort of delay at the pacemaker level.  e.g. tell pacemaker to 
convert a reboot of node A into a 'turn off node A, wait N seconds, turn 
on node A'?




Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Ken Gaillot
On 09/06/2016 10:20 AM, Dan Swartzendruber wrote:
> On 2016-09-06 10:59, Ken Gaillot wrote:
> 
> [snip]
> 
>> I thought power-wait was intended for this situation, where the node's
>> power supply can survive a brief outage, so a delay is needed to ensure
>> it drains. In any case, I know people are using it for that.
>>
>> Are there any drawbacks to using power-wait for this purpose, even if
>> that wasn't its original intent? Is it just that the "on" will get the
>> delay as well?
> 
> I can't speak to the first part of your question, but for me the second
> part is a definite YES.  The issue is that I want a long enough delay to
> be sure the host is D E A D and not writing to the pool anymore; but
> that delay is now multiplied by 2, and if it gets "too long", vsphere
> guests can start getting disk I/O errors...

Ah, Marek's suggestions are the best way out, then. Fence agents are
usually simple shell scripts, so adding a power-wait-off option
shouldn't be difficult.

>>> *) Configure the fence device to use OFF and ON instead of reboot.
>>> This is the same as the situation where there are multiple power
>>> circuits: you have to switch them all OFF and afterwards turn them ON.
>>
>> FYI, no special configuration is needed for this with recent pacemaker
>> versions. If multiple devices are listed in a topology level, pacemaker
>> will automatically convert reboot requests into all-off-then-all-on.
> 
> My understanding was that applied to 1.1.14?  My CentOS 7 host has
> pacemaker 1.1.13 :(

Correct -- but most OS distributions, including CentOS, backport
specific bugfixes and features from later versions. In this case, as
long as you've applied updates (pacemaker-1.1.13-10 or later), you've
got it.




Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Dan Swartzendruber

On 2016-09-06 10:59, Ken Gaillot wrote:

[snip]


I thought power-wait was intended for this situation, where the node's
power supply can survive a brief outage, so a delay is needed to ensure
it drains. In any case, I know people are using it for that.

Are there any drawbacks to using power-wait for this purpose, even if
that wasn't its original intent? Is it just that the "on" will get the
delay as well?


I can't speak to the first part of your question, but for me the second 
part is a definite YES.  The issue is that I want a long enough delay to 
be sure the host is D E A D and not writing to the pool anymore; but 
that delay is now multiplied by 2, and if it gets "too long", vsphere 
guests can start getting disk I/O errors...



*) Configure the fence device to use OFF and ON instead of reboot.
This is the same as the situation where there are multiple power circuits:
you have to switch them all OFF and afterwards turn them ON.


FYI, no special configuration is needed for this with recent pacemaker
versions. If multiple devices are listed in a topology level, pacemaker
will automatically convert reboot requests into all-off-then-all-on.


My understanding was that applied to 1.1.14?  My CentOS 7 host has 
pacemaker 1.1.13 :(


[snip]




Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Ken Gaillot
On 09/05/2016 09:38 AM, Marek Grac wrote:
> Hi,
> 
> On Mon, Sep 5, 2016 at 3:46 PM, Dan Swartzendruber wrote:
> 
> ...
> Marek, thanks.  I have tested repeatedly (8 or so times with disk
> writes in progress) with 5-7 seconds and have had no corruption.  My
> only issue with using power_wait here (possibly I am
> misunderstanding this) is that the default action is 'reboot' which
> I *think* is 'power off, then power on'.  e.g. two operations to the
> fencing device.  The only place I need a delay though, is after the
> power off operation - doing so after power on is just wasted time
> that the resource is offline before the other node takes it over. 
> Am I misunderstanding this?  Thanks!
> 
> 
> You are right. Default sequence for reboot is:
> 
> get status, power off, delay(power-wait), get status [repeat until OFF],
> power on, delay(power-wait), get status [repeat until ON].
> 
> The power-wait was introduced because some devices respond with strange
> values when they are asked too soon after power change. It was not
> intended to be used in a way that you propose. Possible solutions:

I thought power-wait was intended for this situation, where the node's
power supply can survive a brief outage, so a delay is needed to ensure
it drains. In any case, I know people are using it for that.

Are there any drawbacks to using power-wait for this purpose, even if
that wasn't its original intent? Is it just that the "on" will get the
delay as well?

> *) Configure the fence device to use OFF and ON instead of reboot.
> This is the same as the situation where there are multiple power
> circuits: you have to switch them all OFF and afterwards turn them ON.

FYI, no special configuration is needed for this with recent pacemaker
versions. If multiple devices are listed in a topology level, pacemaker
will automatically convert reboot requests into all-off-then-all-on.
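The conversion can be pictured roughly like this (an illustrative sketch only, not Pacemaker source; `Plug` is a made-up stand-in for a fence device):

```python
class Plug:
    """Made-up fence device stand-in that records the commands it receives."""
    def __init__(self):
        self.log = []
    def power_off(self):
        self.log.append("off")
    def power_on(self):
        self.log.append("on")

def fence_level_reboot(devices):
    """All-off-then-all-on: cut power everywhere before restoring any,
    so no target is ever powered by a not-yet-cycled device."""
    for dev in devices:
        dev.power_off()
    for dev in devices:
        dev.power_on()
```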

> *) Add a new option power-wait-off that will be used only in the OFF
> case (and will override power-wait). It should be quite easy to do.
> Just send us a PR.
> 
> m,



Re: [ClusterLabs] fence_apc delay?

2016-09-05 Thread Marek Grac
Hi,

On Mon, Sep 5, 2016 at 3:46 PM, Dan Swartzendruber wrote:

> ...
> Marek, thanks.  I have tested repeatedly (8 or so times with disk writes
> in progress) with 5-7 seconds and have had no corruption.  My only issue
> with using power_wait here (possibly I am misunderstanding this) is that
> the default action is 'reboot' which I *think* is 'power off, then power
> on'.  e.g. two operations to the fencing device.  The only place I need a
> delay though, is after the power off operation - doing so after power on is
> just wasted time that the resource is offline before the other node takes
> it over.  Am I misunderstanding this?  Thanks!
>

You are right. Default sequence for reboot is:

get status, power off, delay(power-wait), get status [repeat until OFF],
power on, delay(power-wait), get status [repeat until ON].
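That sequence can be modeled roughly as follows (a simplified sketch, not actual fencing-library code; `FakeDevice` is a made-up stand-in, and real agents add timeouts and error handling):

```python
import time

class FakeDevice:
    """Made-up PDU stand-in whose status reflects the last command."""
    def __init__(self):
        self.state = "ON"
    def power_off(self):
        self.state = "OFF"
    def power_on(self):
        self.state = "ON"
    def get_status(self):
        return self.state

def reboot(device, power_wait=0):
    # get status, power off, delay(power-wait), poll until OFF,
    # power on, delay(power-wait), poll until ON -- as described above.
    device.get_status()
    device.power_off()
    time.sleep(power_wait)
    while device.get_status() != "OFF":
        time.sleep(1)
    device.power_on()
    time.sleep(power_wait)
    while device.get_status() != "ON":
        time.sleep(1)
```

Note that `power_wait` is applied after *both* the off and the on command, which is exactly the doubled delay complained about later in the thread.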

The power-wait was introduced because some devices respond with strange
values when they are asked too soon after power change. It was not intended
to be used in a way that you propose. Possible solutions:

*) Configure the fence device to use OFF and ON instead of reboot.
This is the same as the situation where there are multiple power circuits:
you have to switch them all OFF and afterwards turn them ON.

*) Add a new option power-wait-off that will be used only in the OFF case
(and will override power-wait). It should be quite easy to do. Just send
us a PR.
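The proposed option would only change which delay is applied after the OFF command; something like (a hypothetical sketch of the idea, not existing fence-agents code):

```python
def delay_after(action, power_wait, power_wait_off=None):
    """Pick the post-command delay: the proposed power-wait-off, if set,
    overrides power-wait for the OFF action only."""
    if action == "off" and power_wait_off is not None:
        return power_wait_off
    return power_wait

# delay_after("off", 2, 10) -> 10; delay_after("on", 2, 10) -> 2
```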

m,


Re: [ClusterLabs] fence_apc delay?

2016-09-03 Thread Dan Swartzendruber

On 2016-09-03 08:41, Marek Grac wrote:

Hi,

There are two problems mentioned in the email.

1) power-wait

Power-wait is a quite advanced option, and there are only a few fence
devices/agents where it makes sense -- and only because the HW/firmware
on the device is somewhat broken.  Basically, when we execute a power
ON/OFF operation, we wait power-wait seconds before we send the next
command.  I don't remember any issue of this kind with APC.

2) the only theory I could come up with was that maybe the fencing
operation was considered complete too quickly?

That is virtually impossible.  Even when power ON/OFF is asynchronous,
we test the status of the device, and the fence agent waits until the
status of the plug/VM/... matches what the user wants.


I think you misunderstood my point (possibly I wasn't clear).  I'm not
saying anything is wrong with either the fencing agent or the PDU.
Rather, my theory is that if the agent flips the power off and then back
on, and the interval it is off is 'too short', a host like the R905 can
possibly continue to operate for a couple of seconds, continuing to
write data to the disks past the point where the other node begins to do
likewise.  If power_wait is not the right way to wait, say, 10 seconds
to make 100% sure node A is dead as a doornail, what *is* the right way?




Re: [ClusterLabs] fence_apc delay?

2016-09-03 Thread Marek Grac
Hi,

There are two problems mentioned in the email.

1) power-wait

Power-wait is a quite advanced option, and there are only a few fence
devices/agents where it makes sense -- and only because the HW/firmware
on the device is somewhat broken.  Basically, when we execute a power
ON/OFF operation, we wait power-wait seconds before we send the next
command.  I don't remember any issue of this kind with APC.


2) the only theory I could come up with was that maybe the fencing
operation was considered complete too quickly?

That is virtually impossible.  Even when power ON/OFF is asynchronous,
we test the status of the device, and the fence agent waits until the
status of the plug/VM/... matches what the user wants.


m,


On Fri, Sep 2, 2016 at 3:14 PM, Dan Swartzendruber wrote:

>
> So, I was testing my ZFS dual-head JBOD 2-node cluster.  Manual failovers
> worked just fine.  I then went to try an acid-test by logging in to node A
> and doing 'systemctl stop network'.  Sure enough, pacemaker told the APC
> fencing agent to power-cycle node A.  The ZFS pool moved to node B as
> expected.  As soon as node A was back up, I migrated the pool/IP back to
> node A.  I *thought* all was okay, until a bit later, I did 'zpool status',
> and saw checksum errors on both sides of several of the vdevs.  After much
> digging and poking, the only theory I could come up with was that maybe the
> fencing operation was considered complete too quickly?  I googled for
> examples using this, and the best tutorial I found showed using a
> power-wait=5, whereas the default seems to be power-wait=0?  (this is
> CentOS 7, btw...)  I changed it to use 5 instead of 0, and did several
> fencing operations while a guest VM (vsphere via NFS) was writing to the
> pool.  So far, no evidence of corruption.  BTW, the way I was creating and
> managing the cluster was with the lcmc java gui.  Possibly the power-wait
> default of 0 comes from there, I can't really tell.  Any thoughts or ideas
> appreciated :)


Re: [ClusterLabs] fence_apc delay?

2016-09-02 Thread Dan Swartzendruber

On 2016-09-02 10:09, Ken Gaillot wrote:

On 09/02/2016 08:14 AM, Dan Swartzendruber wrote:

So, I was testing my ZFS dual-head JBOD 2-node cluster.  Manual
failovers worked just fine.  I then went to try an acid-test by logging
in to node A and doing 'systemctl stop network'.  Sure enough, pacemaker
told the APC fencing agent to power-cycle node A.  The ZFS pool moved to
node B as expected.  As soon as node A was back up, I migrated the
pool/IP back to node A.  I *thought* all was okay, until a bit later, I
did 'zpool status', and saw checksum errors on both sides of several of
the vdevs.  After much digging and poking, the only theory I could come
up with was that maybe the fencing operation was considered complete too
quickly?  I googled for examples using this, and the best tutorial I
found showed using a power-wait=5, whereas the default seems to be
power-wait=0?  (this is CentOS 7, btw...)  I changed it to use 5 instead

That's a reasonable theory -- that's why power_wait is available.  It
would be nice if there were a page collecting users' experience with the
ideal power_wait for various devices.  Even better if fence-agents used
those values as the defaults.


Ken, thanks.  FWIW, this is a Dell Poweredge R905.  I have no idea how 
long the power supplies in that thing can keep things going when A/C 
goes away.  Always wary of small sample sizes, but I got filesystem 
corruption after 1 fencing event with power_wait=0, and none after 3 
fencing events with power_wait=5.






Re: [ClusterLabs] fence_apc delay?

2016-09-02 Thread Ken Gaillot
On 09/02/2016 08:14 AM, Dan Swartzendruber wrote:
> 
> So, I was testing my ZFS dual-head JBOD 2-node cluster.  Manual
> failovers worked just fine.  I then went to try an acid-test by logging
> in to node A and doing 'systemctl stop network'.  Sure enough, pacemaker
> told the APC fencing agent to power-cycle node A.  The ZFS pool moved to
> node B as expected.  As soon as node A was back up, I migrated the
> pool/IP back to node A.  I *thought* all was okay, until a bit later, I
> did 'zpool status', and saw checksum errors on both sides of several of
> the vdevs.  After much digging and poking, the only theory I could come
> up with was that maybe the fencing operation was considered complete too
> quickly?  I googled for examples using this, and the best tutorial I
> found showed using a power-wait=5, whereas the default seems to be
> power-wait=0?  (this is CentOS 7, btw...)  I changed it to use 5 instead

That's a reasonable theory -- that's why power_wait is available. It
would be nice if there were a page collecting users' experience with the
ideal power_wait for various devices. Even better if fence-agents used
those values as the defaults.

> of 0, and did several fencing operations while a guest VM (vsphere via
> NFS) was writing to the pool.  So far, no evidence of corruption.  BTW,
> the way I was creating and managing the cluster was with the lcmc java
> gui.  Possibly the power-wait default of 0 comes from there, I can't
> really tell.  Any thoughts or ideas appreciated :)



Re: [ClusterLabs] fence_apc delay?

2016-09-02 Thread Dan Swartzendruber


It occurred to me folks reading this might not have any knowledge about 
ZFS.  Think of my setup as an mdraid pool with a filesystem mounted on 
it, shared out via NFS.  Same basic idea...




[ClusterLabs] fence_apc delay?

2016-09-02 Thread Dan Swartzendruber


So, I was testing my ZFS dual-head JBOD 2-node cluster.  Manual 
failovers worked just fine.  I then went to try an acid-test by logging 
in to node A and doing 'systemctl stop network'.  Sure enough, pacemaker 
told the APC fencing agent to power-cycle node A.  The ZFS pool moved to 
node B as expected.  As soon as node A was back up, I migrated the 
pool/IP back to node A.  I *thought* all was okay, until a bit later, I 
did 'zpool status', and saw checksum errors on both sides of several of 
the vdevs.  After much digging and poking, the only theory I could come 
up with was that maybe the fencing operation was considered complete too 
quickly?  I googled for examples using this, and the best tutorial I 
found showed using a power-wait=5, whereas the default seems to be 
power-wait=0?  (this is CentOS 7, btw...)  I changed it to use 5 instead 
of 0, and did several fencing operations while a guest VM (vsphere via 
NFS) was writing to the pool.  So far, no evidence of corruption.  BTW, 
the way I was creating and managing the cluster was with the lcmc java 
gui.  Possibly the power-wait default of 0 comes from there, I can't 
really tell.  Any thoughts or ideas appreciated :)

