Re: [ClusterLabs] What triggers fencing?

2018-07-09 Thread Klaus Wenninger
On 07/09/2018 05:53 PM, Digimer wrote:
> On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
>> On 07/09/2018 05:33 PM, Digimer wrote:
>>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
 On 07/09/2018 03:49 PM, Digimer wrote:
> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
>> On 07/09/2018 02:04 PM, Confidential Company wrote:
>>> Hi,
>>>
>>> Any ideas what triggers fencing script or stonith?
>>>
>>> Given the setup below:
>>> 1. I have two nodes
>>> 2. Configured fencing on both nodes
>>> 3. Configured delay=15 and delay=30 on fence1(for Node1) and
>>> fence2(for Node2) respectively
>>>
>>> *What does it mean to configure delay in stonith? Does it wait for 15 seconds
>>> before it fences the node?
>> Given that on a 2-node cluster you don't have real quorum to make one
>> partial cluster fence the rest of the nodes, the different delays are
>> meant to prevent a fencing race.
>> Without different delays that would lead to both nodes fencing each
>> other at the same time - finally, both being down.
> Not true, the faster node will kill the slower node first. It is
> possible that through misconfiguration, both could die, but it's rare
> and easily avoided with a 'delay="15"' set on the fence config for the
> node you want to win.
 What exactly is not true? Aren't we saying the same thing?
 Of course one of the delays can be 0 (what matters most is that
 they are different).
>>> Perhaps I misunderstood your message. It seemed to me that the
>>> implication was that fencing in 2-node without a delay always ends up
>>> with both nodes being down, which isn't the case. It can happen if the
>>> fence methods are not set up right (i.e. the node isn't set to immediately
>>> power off on an ACPI power button event).
>> Yes, a misunderstanding I guess.
>>
>> I should have been more verbose in saying that, due to the
>> time between the fencing command being fired off to the fencing
>> device and the actual fencing taking place (as you state,
>> dependent on how it is configured in detail - but a measurable
>> time in all cases), there is a certain probability that, when
>> both nodes start fencing at roughly the same time, we will
>> end up with 2 nodes down.
>>
>> Everybody has to find their own tradeoff between how reliably
>> fence races are prevented and the added fencing delay, I guess.
> We've used this;
>
> 1. IPMI (with the guest OS set to immediately power off) as primary,
> with a 15 second delay on the active node.
>
> 2. Two Switched PDUs (two power circuits, two PSUs) as backup fencing
> for when IPMI fails, with no delay.
>
> In ~8 years, across dozens and dozens of clusters and countless fence
> actions, we've never had a dual-fence event (where both nodes go down).
> So it can be done safely, but as always, test test test before prod.

No doubt that this setup works reliably.
You just have to know your fencing devices and
which delays they involve.

If we are talking about SBD (with a disk, as otherwise
it doesn't work in a sensible way in 2-node clusters),
for instance, I would strongly advise using a delay.

So I guess it is important to understand the basic
idea behind this delay-based fence-race avoidance.
Afterwards you can still decide whether it is an issue
in your own setup.
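
Just as a rough illustration (not anyone's actual config - addresses,
credentials and names below are placeholders), such asymmetric delays could
be set up with pcs and fence_ipmilan along these lines:

  # delay=15 on the device that targets node1 means any attempt to fence
  # node1 waits 15s, so node1 gets to shoot first and wins the race
  pcs stonith create fence_node1 fence_ipmilan ipaddr=10.0.0.1 \
      login=admin passwd=secret pcmk_host_list=node1 delay=15
  # no delay on the device targeting node2 - node2 is the one allowed to lose
  pcs stonith create fence_node2 fence_ipmilan ipaddr=10.0.0.2 \
      login=admin passwd=secret pcmk_host_list=node2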

>
>>> If the delay is set on both nodes, and they are different, it will work
>>> fine. The reason not to do this is that a delay of 0 is the same as not
>>> setting anything at all (0 is the default), and any other value just
>>> causes avoidable fence delays.
>>>
> Don't use a delay on the other node, just on the node you want to
> survive in such a case.
>
>>> *Given Node1 is active and Node2 goes down, does it mean fence1 will
>>> execute first and shut down Node1 even though it was Node2 that went down?
>> If Node2 managed to sign off properly, it will not.
>> If the network connection is down, so that Node2 can't inform Node1 that
>> it is going down and has finally stopped all resources, it will be fenced
>> by Node1.
>>
>> Regards,
>> Klaus
> Fencing occurs in two cases;
>
> 1. The node stops responding (meaning it's in an unknown state, so it is
> fenced to force it into a known state).
> 2. A resource / service fails to stop. In this case, the service is
> in an unknown state, so the node is fenced to force the service into a
> known state so that it can be safely recovered on the peer.
>
> Graceful withdrawal of the node from the cluster, and graceful stopping
> of services will not lead to a fence (because in both cases, the node /
> service are in a known state - off).
>
>>>   
>

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] What triggers fencing?

2018-07-09 Thread Digimer
On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
> On 07/09/2018 05:33 PM, Digimer wrote:
>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
>>> On 07/09/2018 03:49 PM, Digimer wrote:
 On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
> On 07/09/2018 02:04 PM, Confidential Company wrote:
>> Hi,
>>
>> Any ideas what triggers fencing script or stonith?
>>
>> Given the setup below:
>> 1. I have two nodes
>> 2. Configured fencing on both nodes
>> 3. Configured delay=15 and delay=30 on fence1(for Node1) and
>> fence2(for Node2) respectively
>>
>> *What does it mean to configure delay in stonith? Does it wait for 15 seconds
>> before it fences the node?
> Given that on a 2-node cluster you don't have real quorum to make one
> partial cluster fence the rest of the nodes, the different delays are meant
> to prevent a fencing race.
> Without different delays that would lead to both nodes fencing each
> other at the same time - finally, both being down.
 Not true, the faster node will kill the slower node first. It is
 possible that through misconfiguration, both could die, but it's rare
 and easily avoided with a 'delay="15"' set on the fence config for the
 node you want to win.
>>> What exactly is not true? Aren't we saying the same thing?
>>> Of course one of the delays can be 0 (what matters most is that
>>> they are different).
>> Perhaps I misunderstood your message. It seemed to me that the
>> implication was that fencing in 2-node without a delay always ends up
>> with both nodes being down, which isn't the case. It can happen if the
>> fence methods are not set up right (i.e. the node isn't set to immediately
>> power off on an ACPI power button event).
> Yes, a misunderstanding I guess.
> 
> I should have been more verbose in saying that, due to the
> time between the fencing command being fired off to the fencing
> device and the actual fencing taking place (as you state,
> dependent on how it is configured in detail - but a measurable
> time in all cases), there is a certain probability that, when
> both nodes start fencing at roughly the same time, we will
> end up with 2 nodes down.
> 
> Everybody has to find their own tradeoff between how reliably
> fence races are prevented and the added fencing delay, I guess.

We've used this;

1. IPMI (with the guest OS set to immediately power off) as primary,
with a 15 second delay on the active node.

2. Two Switched PDUs (two power circuits, two PSUs) as backup fencing
for when IPMI fails, with no delay.

In ~8 years, across dozens and dozens of clusters and countless fence
actions, we've never had a dual-fence event (where both nodes go down).
So it can be done safely, but as always, test test test before prod.
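
For what it's worth, that layering can be expressed with pcs fencing levels
roughly like this (device names are placeholders for stonith resources
created beforehand; both PDU devices sit in the same level because both
power feeds have to be cut):

  pcs stonith level add 1 node1 fence_node1_ipmi
  pcs stonith level add 2 node1 fence_node1_pdu1,fence_node1_pdu2
  pcs stonith level add 1 node2 fence_node2_ipmi
  pcs stonith level add 2 node2 fence_node2_pdu1,fence_node2_pdu2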

>> If the delay is set on both nodes, and they are different, it will work
>> fine. The reason not to do this is that a delay of 0 is the same as not
>> setting anything at all (0 is the default), and any other value just
>> causes avoidable fence delays.
>>
 Don't use a delay on the other node, just on the node you want to
 survive in such a case.

>> *Given Node1 is active and Node2 goes down, does it mean fence1 will
>> execute first and shut down Node1 even though it was Node2 that went down?
> If Node2 managed to sign off properly, it will not.
> If the network connection is down, so that Node2 can't inform Node1 that
> it is going down and has finally stopped all resources, it will be fenced
> by Node1.
>
> Regards,
> Klaus
 Fencing occurs in two cases;

 1. The node stops responding (meaning it's in an unknown state, so it is
 fenced to force it into a known state).
 2. A resource / service fails to stop. In this case, the service is
 in an unknown state, so the node is fenced to force the service into a
 known state so that it can be safely recovered on the peer.

 Graceful withdrawal of the node from the cluster, and graceful stopping
 of services will not lead to a fence (because in both cases, the node /
 service are in a known state - off).

>>
>>   


-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] What triggers fencing?

2018-07-09 Thread Klaus Wenninger
On 07/09/2018 05:33 PM, Digimer wrote:
> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
>> On 07/09/2018 03:49 PM, Digimer wrote:
>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
 On 07/09/2018 02:04 PM, Confidential Company wrote:
> Hi,
>
> Any ideas what triggers fencing script or stonith?
>
> Given the setup below:
> 1. I have two nodes
> 2. Configured fencing on both nodes
> 3. Configured delay=15 and delay=30 on fence1(for Node1) and
> fence2(for Node2) respectively
>
> *What does it mean to configure delay in stonith? Does it wait for 15 seconds
> before it fences the node?
 Given that on a 2-node cluster you don't have real quorum to make one
 partial cluster fence the rest of the nodes, the different delays are meant
 to prevent a fencing race.
 Without different delays that would lead to both nodes fencing each
 other at the same time - finally, both being down.
>>> Not true, the faster node will kill the slower node first. It is
>>> possible that through misconfiguration, both could die, but it's rare
>>> and easily avoided with a 'delay="15"' set on the fence config for the
>>> node you want to win.
>> What exactly is not true? Aren't we saying the same thing?
>> Of course one of the delays can be 0 (what matters most is that
>> they are different).
> Perhaps I misunderstood your message. It seemed to me that the
> implication was that fencing in 2-node without a delay always ends up
> with both nodes being down, which isn't the case. It can happen if the
> fence methods are not set up right (i.e. the node isn't set to immediately
> power off on an ACPI power button event).
Yes, a misunderstanding I guess.

I should have been more verbose in saying that, due to the
time between the fencing command being fired off to the fencing
device and the actual fencing taking place (as you state,
dependent on how it is configured in detail - but a measurable
time in all cases), there is a certain probability that, when
both nodes start fencing at roughly the same time, we will
end up with 2 nodes down.

Everybody has to find their own tradeoff between how reliably
fence races are prevented and the added fencing delay, I guess.
 
>
> If the delay is set on both nodes, and they are different, it will work
> fine. The reason not to do this is that a delay of 0 is the same as not
> setting anything at all (0 is the default), and any other value just
> causes avoidable fence delays.
>
>>> Don't use a delay on the other node, just on the node you want to
>>> survive in such a case.
>>>
> *Given Node1 is active and Node2 goes down, does it mean fence1 will
> execute first and shut down Node1 even though it was Node2 that went down?
 If Node2 managed to sign off properly, it will not.
 If the network connection is down, so that Node2 can't inform Node1 that
 it is going down and has finally stopped all resources, it will be fenced
 by Node1.

 Regards,
 Klaus
>>> Fencing occurs in two cases;
>>>
>>> 1. The node stops responding (meaning it's in an unknown state, so it is
>>> fenced to force it into a known state).
>>> 2. A resource / service fails to stop. In this case, the service is
>>> in an unknown state, so the node is fenced to force the service into a
>>> known state so that it can be safely recovered on the peer.
>>>
>>> Graceful withdrawal of the node from the cluster, and graceful stopping
>>> of services will not lead to a fence (because in both cases, the node /
>>> service are in a known state - off).
>>>
>
>   
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Upgrade corosync problem

2018-07-09 Thread Jan Pokorný
On 06/07/18 15:25 +0200, Salvatore D'angelo wrote:
> On 6 Jul 2018, at 14:40, Christine Caulfield  wrote:
>> Yes. you can't randomly swap in and out hand-compiled libqb versions.
>> Find one that works and stick to it. It's an annoying 'feature' of newer
>> linkers that we had to workaround in libqb. So if you rebuild libqb
>> 1.0.3 then you will, in all likelihood, need to rebuild corosync to
>> match it.
> 
> The problem is the opposite of what you are saying.
> 
> When I built corosync with the old libqb and verified that the newly
> updated node worked properly, I then updated to the new hand-compiled
> libqb and it worked fine.
> But in a normal upgrade procedure I first build libqb (removing the
> old one first) and then corosync, and when I follow this order it
> does not work.
> This is what makes me crazy. I do not understand this behavior.

I will assume you have all the steps right, like issuing the equivalent
of "make install" once you've built libqb, and ensuring that system-native
(e.g. distribution-packaged) libqb and corosync won't get mixed in here -
simply that you are cautious.
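
Roughly, the build/install order being discussed would be something along
these lines (a sketch only - prefixes and directory names are assumptions,
and autogen.sh is only needed when building from a git checkout):

  cd libqb
  ./autogen.sh && ./configure --prefix=/usr && make && sudo make install
  sudo ldconfig                 # make sure the freshly installed libqb is picked up
  cd ../corosync                # then rebuild corosync against that libqb
  ./autogen.sh && ./configure --prefix=/usr && make && sudo make install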

>> On 06/07/18 13:24, Salvatore D'angelo wrote:
>>> if I launch corosync -f I got:
>>> *corosync: main.c:143: logsys_qb_init: Assertion `"implicit callsite
>>> section is populated, otherwise target's build is at fault, preventing
>>> reliable logging" && __start___verbose != __stop___verbose' failed.*

The extreme and discouraged workaround is to compile corosync with
something like: "make CPPFLAGS=-DQB_KILL_ATTRIBUTE_SECTION", even though
we'd rather diagnose why this happens in the first place.

That apart, it'd be helpful to know the output from the following commands
once you have a corosync binary (symbolically referred to as $COROSYNC)
in a state where the above error is reproduced and you haven't changed
anything about your build environment:

  # version of the linker
  bash -c 'paste <(ld --version) <(ld.bfd --version) | head -n1'

  # how does the ELF section/respective symbols appear in the binary
  readelf -s $COROSYNC | grep ___verbose

Hopefully, it will allow us to advance here.

-- 
Nazdar,
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Clearing failed actions

2018-07-09 Thread Ken Gaillot
On Mon, 2018-07-09 at 09:11 +0200, Jehan-Guillaume de Rorthais wrote:
> On Fri, 06 Jul 2018 10:15:08 -0600
> Casey Allen Shobe  wrote:
> 
> > Hi,
> > 
> > I found a web page which suggested using `crm_resource -P` to clear the
> > Failed Actions.  Although this appears to work, it's not documented in
> > the man page at all.  Is this deprecated, and is there a more correct
> > way to be doing this?
> 
> -P means "reprobe", so I guess the cleanup is a side effect or a
> prerequisite of that, not something meant only to clean failcounts.

In the 1.1 series, -P is a deprecated synonym for --cleanup / -C. The
options clear fail counts and resource operation history (for a
specific resource and/or node if specified with -r and/or -N, otherwise
all).

In the 2.0 series, -P is gone. --refresh / -R now does what cleanup
used to; --cleanup / -C now cleans up only resources that have had
failures. In other words, the old --cleanup and new --refresh clean
resource history, forcing a re-probe, regardless of whether a resource
failed or not, whereas the new --cleanup will skip resources that
didn't have failures. 

> > Also, is there a way to clear one specific item from the list, or
> > is clearing
> > all the only option?
> 
> pcs failcount reset <resource> [node]

With the low level tools, you can use -r / --resource and/or -N / --
node with crm_resource to limit the clean-up.
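
For example (resource and node names here are just placeholders):

  # Pacemaker 1.1: wipe operation history and fail counts for one resource
  # on one node, forcing a re-probe
  crm_resource --cleanup -r my_resource -N node1

  # Pacemaker 2.0: --refresh gives the old clean-everything behavior,
  # --cleanup now only touches resources that actually failed
  crm_resource --refresh -r my_resource -N node1
  crm_resource --cleanup -r my_resource -N node1
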
-- 
Ken Gaillot 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] What triggers fencing?

2018-07-09 Thread Klaus Wenninger
On 07/09/2018 03:49 PM, Digimer wrote:
> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
>> On 07/09/2018 02:04 PM, Confidential Company wrote:
>>> Hi,
>>>
>>> Any ideas what triggers fencing script or stonith?
>>>
>>> Given the setup below:
>>> 1. I have two nodes
>>> 2. Configured fencing on both nodes
>>> 3. Configured delay=15 and delay=30 on fence1(for Node1) and
>>> fence2(for Node2) respectively
>>>
>>> *What does it mean to configure delay in stonith? Does it wait for 15 seconds
>>> before it fences the node?
>> Given that on a 2-node cluster you don't have real quorum to make one
>> partial cluster fence the rest of the nodes, the different delays are meant
>> to prevent a fencing race.
>> Without different delays that would lead to both nodes fencing each
>> other at the same time - finally, both being down.
> Not true, the faster node will kill the slower node first. It is
> possible that through misconfiguration, both could die, but it's rare
> and easily avoided with a 'delay="15"' set on the fence config for the
> node you want to win.
What exactly is not true? Aren't we saying the same thing?
Of course one of the delays can be 0 (what matters most is that
they are different).

>
> Don't use a delay on the other node, just on the node you want to
> survive in such a case.
>
>>> *Given Node1 is active and Node2 goes down, does it mean fence1 will
>>> execute first and shut down Node1 even though it was Node2 that went down?
>> If Node2 managed to sign off properly, it will not.
>> If the network connection is down, so that Node2 can't inform Node1 that
>> it is going down and has finally stopped all resources, it will be fenced
>> by Node1.
>>
>> Regards,
>> Klaus
> Fencing occurs in two cases;
>
> 1. The node stops responding (meaning it's in an unknown state, so it is
> fenced to force it into a known state).
> 2. A resource / service fails to stop. In this case, the service is
> in an unknown state, so the node is fenced to force the service into a
> known state so that it can be safely recovered on the peer.
>
> Graceful withdrawal of the node from the cluster, and graceful stopping
> of services will not lead to a fence (because in both cases, the node /
> service are in a known state - off).
>

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker 2.0.0 has been released

2018-07-09 Thread Digimer
On 2018-07-06 07:24 PM, Ken Gaillot wrote:
> I am very happy to announce that source code for the final release of
> Pacemaker version 2.0.0 is now available at:
> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.0.0
> 
> The main goal of the change from Pacemaker 1 to 2 is to drop support
> for deprecated legacy usage, in order to make the code base more
> maintainable going into the future.
> 
> Rolling (live) upgrades are possible only from Pacemaker 1.1.11 or
> later, on top of corosync 2 or later. Other setups can be upgraded with
> the cluster stopped.
> 
> If upgrading an existing cluster, it is recommended to run "cibadmin --
> upgrade" (or the equivalent in your higher-level tool of choice) both
> before and after the upgrade.
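
On a cluster node that check can look roughly like this (a sketch only; the
exact flags are an assumption, not part of the announcement):

  cibadmin --upgrade --force      # push the CIB to the latest schema
  crm_verify --live-check         # confirm the configuration still validates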
> 
> Extensive details about the changes in this release are available in
> the change log:
> 
>   https://github.com/ClusterLabs/pacemaker/blob/2.0/ChangeLog
> 
> and in a special wiki page for the 2.0 release:
> 
>   https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes
> 
> Highlights:
> 
> * Support has been dropped for heartbeat and corosync 1 (whether using
> CMAN or plugin), and many legacy aliases for cluster options (including
> default-resource-stickiness, which should be set as resource-
> stickiness in rsc_defaults instead).
> 
> * The logs should be a little more user-friendly. The Pacemaker daemons
> have been renamed for easier log searching. The default location of the
> Pacemaker detail log is now /var/log/pacemaker/pacemaker.log, and
> Pacemaker will no longer use Corosync's logging preferences.
> 
> * The master XML tag is deprecated (though still supported) in favor of
> using the standard clone tag with a new "promotable" meta-attribute set
> to true. The "master-max" and "master-node-max" master meta-attributes
> are deprecated in favor of new "promoted-max" and "promoted-node-max"
> clone meta-attributes. Documentation now refers to these as promotable
> clones rather than master/slave, stateful or multistate clones.
> 
> * The record-pending option now defaults to true, which means pending
> actions will be shown in status displays.
> 
> * The "Pacemaker Explained" document has grown large enough that topics
> related to cluster administration have been moved to their own new
> document, "Pacemaker Administration":
> 
>   http://clusterlabs.org/pacemaker/doc/
> 
> Many thanks to all contributors of source code to this release,
> including Andrew Beekhof, Bin Liu, Bruno Travouillon, Gao,Yan, Hideo
> Yamauchi, Jan Pokorný, Ken Gaillot, and Klaus Wenninger.
> 

Huge congrats to all!!

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] What triggers fencing?

2018-07-09 Thread Digimer
On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
> On 07/09/2018 02:04 PM, Confidential Company wrote:
>> Hi,
>>
>> Any ideas what triggers fencing script or stonith?
>>
>> Given the setup below:
>> 1. I have two nodes
>> 2. Configured fencing on both nodes
>> 3. Configured delay=15 and delay=30 on fence1(for Node1) and
>> fence2(for Node2) respectively
>>
>> *What does it mean to configure delay in stonith? Does it wait for 15 seconds
>> before it fences the node?
> 
> Given that on a 2-node cluster you don't have real quorum to make one
> partial cluster fence the rest of the nodes, the different delays are meant
> to prevent a fencing race.
> Without different delays that would lead to both nodes fencing each
> other at the same time - finally, both being down.

Not true, the faster node will kill the slower node first. It is
possible that through misconfiguration, both could die, but it's rare
and easily avoided with a 'delay="15"' set on the fence config for the
node you want to win.

Don't use a delay on the other node, just on the node you want to
survive in such a case.

>> *Given Node1 is active and Node2 goes down, does it mean fence1 will
>> execute first and shut down Node1 even though it was Node2 that went down?
> 
> If Node2 managed to sign off properly, it will not.
> If the network connection is down, so that Node2 can't inform Node1 that
> it is going down and has finally stopped all resources, it will be fenced
> by Node1.
> 
> Regards,
> Klaus

Fencing occurs in two cases;

1. The node stops responding (meaning it's in an unknown state, so it is
fenced to force it into a known state).
2. A resource / service fails to stop. In this case, the service is
in an unknown state, so the node is fenced to force the service into a
known state so that it can be safely recovered on the peer.

Graceful withdrawal of the node from the cluster, and graceful stopping
of services will not lead to a fence (because in both cases, the node /
service are in a known state - off).
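
As with any fence setup, it's worth triggering a fence by hand and watching
it complete before trusting it in production. A rough sketch (the node name
is just a placeholder):

  stonith_admin --reboot node2 --verbose    # ask the cluster to fence node2
  # or, via pcs:
  pcs stonith fence node2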

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] What triggers fencing?

2018-07-09 Thread Klaus Wenninger
On 07/09/2018 02:04 PM, Confidential Company wrote:
> Hi,
>
> Any ideas what triggers fencing script or stonith?
>
> Given the setup below:
> 1. I have two nodes
> 2. Configured fencing on both nodes
> 3. Configured delay=15 and delay=30 on fence1(for Node1) and
> fence2(for Node2) respectively
>
> *What does it mean to configure delay in stonith? Does it wait for 15 seconds
> before it fences the node?

Given that on a 2-node cluster you don't have real quorum to make one
partial cluster fence the rest of the nodes, the different delays are meant
to prevent a fencing race.
Without different delays that would lead to both nodes fencing each
other at the same time - finally, both being down.

>
> *Given Node1 is active and Node2 goes down, does it mean fence1 will
> execute first and shut down Node1 even though it was Node2 that went down?

If Node2 managed to sign off properly, it will not.
If the network connection is down, so that Node2 can't inform Node1 that
it is going down and has finally stopped all resources, it will be fenced
by Node1.

Regards,
Klaus
>  
>
> Thanks
>
> imnotarobot
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] What triggers fencing?

2018-07-09 Thread Confidential Company
Hi,

Any ideas what triggers fencing script or stonith?

Given the setup below:
1. I have two nodes
2. Configured fencing on both nodes
3. Configured delay=15 and delay=30 on fence1(for Node1) and fence2(for
Node2) respectively

*What does it mean to configure delay in stonith? Does it wait for 15 seconds
before it fences the node?

*Given Node1 is active and Node2 goes down, does it mean fence1 will execute
first and shut down Node1 even though it was Node2 that went down?

Thanks

imnotarobot
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Clearing failed actions

2018-07-09 Thread Jehan-Guillaume de Rorthais
On Fri, 06 Jul 2018 10:15:08 -0600
Casey Allen Shobe  wrote:

> Hi,
> 
> I found a web page which suggested using `crm_resource -P` to clear the
> Failed Actions.  Although this appears to work, it's not documented in the
> man page at all.  Is this deprecated, and is there a more correct way to be
> doing this?

-P means "reprobe", so I guess the cleanup is a side effect or a prerequisite
of that, not something meant only to clean failcounts.

> Also, is there a way to clear one specific item from the list, or is clearing
> all the only option?

pcs failcount reset <resource> [node]
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org