[ClusterLabs] Antw: [EXT] Re: Fedora 31 - systemd based resources don't start

2020-02-26 Thread Ulrich Windl
>>> Maverick wrote on 22.02.2020 at 16:26 in message
<15958_1582385175_5E514815_15958_1385_1_b76a96fe-0120-afb0-c90e-0bf6ffb71d26@sap.pt>:
> Hi,
> 
> As i don't have much time to dig into this pacemaker vs systemd problem,
> i decided to dump systemd.

+1 ;-)




[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-26 Thread Ulrich Windl
>>> Ken Gaillot wrote on 25.02.2020 at 23:30 in message
<29058_1582669837_5E55A00B_29058_3341_1_f8e8426d0c2cf098f88fb6330e8a80586f03043aca...@redhat.com>:
> Hi all,
> 
> We are a couple of months away from starting the release cycle for
> Pacemaker 2.0.4. I'll highlight some new features between now and then.
> 
> First we have shutdown locks. This is a narrow use case that I don't
> expect a lot of interest in, but it helps give pacemaker feature parity
> with proprietary HA systems, which can help users feel more comfortable
> switching to pacemaker and open source.
> 
> The use case is a large organization with few cluster experts and many
> junior system administrators who reboot hosts for OS updates during
> planned maintenance windows, without any knowledge of what the host
> does. The cluster runs services that have a preferred node and take a
> very long time to start.
> 
> In this scenario, pacemaker's default behavior of moving the service to
> a failover node when the node shuts down, and moving it back when the
> node comes back up, results in needless downtime compared to just
> leaving the service down for the few minutes needed for a reboot.
> 
> The goal could be accomplished with existing pacemaker features.
> Maintenance mode wouldn't work because the node is being rebooted. But
> you could figure out what resources are active on the node, and use a
> location constraint with a rule to ban them on all other nodes before
> shutting down. That's a lot of work for something the cluster can
> figure out automatically.
> 
> Pacemaker 2.0.4 will offer a new cluster property, shutdown-lock,
> defaulting to false to keep the current behavior. If shutdown-lock is
> set to true, any resources active on a node when it is cleanly shut
> down will be "locked" to the node (kept down rather than recovered
> elsewhere). Once the node comes back up and rejoins the cluster, they
> will be "unlocked" (free to move again if circumstances warrant).

I'm not very happy with the wording: what about a per-resource feature
"tolerate-downtime" that specifies how long a resource may be down without
triggering any action from the cluster? I think that would be more useful than
some global setting. Maybe complement that per-resource feature with a per-node
feature of the same name.
I also think it's very important to specify and document this mode by comparing
it to maintenance mode.

Regards,
Ulrich

> 
> An additional cluster property, shutdown-lock-limit, allows you to set
> a timeout for the locks so that if the node doesn't come back within
> that time, the resources are free to be recovered elsewhere. This
> defaults to no limit.
> 
> If you decide while the node is down that you need the resource to be
> recovered, you can manually clear a lock with "crm_resource --refresh"
> specifying both --node and --resource.
> 
> There are some limitations using shutdown locks with Pacemaker Remote
> nodes, so I'd avoid that with the upcoming release, though it is
> possible.
> -- 
> Ken Gaillot 
> 




Re: [ClusterLabs] connection timed out fence_virsh monitor stonith

2020-02-26 Thread Luke Camilleri
Hi there, first of all thank you both for your suggestions and observations and 
apologies for my late reply.

I will check the logs on both hosts (although only one of them seems to be the 
issue) and will revert with any findings.

Just to confirm the error message for the monitor operation:

It seems that host zc-mail-2.zylacloud.com hit a connection timeout while
monitoring the resource fence_zc-mail-1_virsh, right?

My question here is: what does the monitor operation actually do to determine
that it is successful?

Is it doing the same operation as specified in the stonith resource and 
expecting a particular exit code?
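
In the meantime, I'll try running the agent by hand with the same parameters as
the stonith resource to see what the monitor actually does. A rough sketch (the
host, user and key below are placeholders; reuse exactly what the stonith
resource is configured with, option letters as listed by fence_virsh --help):

    # run the same device-level check pacemaker runs, from the node hosting the stonith resource
    fence_virsh -a <virt-host> -l <ssh-user> -k /root/.ssh/id_rsa -o monitor ; echo "exit code: $?"

As far as I understand it, for most fence agents the monitor action is
essentially a status/list call against the device (here an SSH login to the
hypervisor plus a virsh listing), and a non-zero exit code, or no answer within
the operation timeout, counts as a monitor failure.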

Thanks once again

-----Original Message-----
From: Dan Swartzendruber <dswa...@druber.com>
To: Cluster Labs - All topics related to open-source clustering welcomed <us...@clusterlabs.org>
Cc: Luke Camilleri <luke.camill...@zylacomputing.com>
Subject: Re: [ClusterLabs] connection timed out fence_virsh monitor stonith
Date: Mon, 24 Feb 2020 12:24:16 -0500


On 2020-02-24 12:17, Strahil Nikolov wrote:
> On February 24, 2020 4:56:07 PM GMT+02:00, Luke Camilleri
> <luke.camill...@zylacomputing.com> wrote:
>> Hello users, I would like to ask for assistance on the below setup
>> please, mainly on the monitor fence timeout:
>
> I notice that the issue happens at 00:00 on both days.
> Have you checked for a backup or other cron job that is 'overloading'
> the virtualization host?

This is a very good point.  I had a similar problem with a vsphere
cluster.  Two hyper-converged storage appliances.  I used the
fence-vmware-rest (or soap) stonith agent to fence the storage apps.
Worked just fine.  Until the vcenter server appliance got busy doing
something or other.  Next thing I know, I'm getting stonith agent
timeouts.  I ended up switching to fence_scsi.  Not sure there is a good
answer.  I saw on a vmware forum a recommendation to increase the
stonith timeout, but the recommended timeout was close to a minute,
which is enough to be a problem for the VMs in that cluster...
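
For the monitor timeouts in this thread, the knob would be the monitor
operation on the stonith resource itself.  A sketch, using the resource name
from Luke's mail (untested here; whether a longer timeout is acceptable
depends on how long you can tolerate an unnoticed dead fence device):

    # give the periodic device check more time before it counts as failed
    pcs stonith update fence_zc-mail-1_virsh op monitor interval=60s timeout=60s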


[ClusterLabs] Q: pseudo actions load_stopped_*, all_stopped

2020-02-26 Thread Ulrich Windl
Hi!

I'm wondering what the pseudo actions in the output of crm_simulate are:
"load_stopped_*" and "all_stopped". Are these some synchronization points? I
see them between (monitor, stop) and (start, monitor).

Regards,
Ulrich




[ClusterLabs] DRBD not failing over

2020-02-26 Thread Jaap Winius



Hi folks,

My 2-node test system has a DRBD resource that is configured as follows:

~# pcs resource defaults resource-stickiness=100 ; \
   pcs resource create drbd ocf:linbit:drbd drbd_resource=r0 \
   op monitor interval=60s ; \
   pcs resource master drbd master-max=1 master-node-max=1 \
   clone-max=2 clone-node-max=1 notify=true

The resource-stickiness setting is there to prevent failbacks. I've got that
to work with NFS and VIP resources, but not with DRBD. Moreover,
when configured as shown above, the DRBD master does not even want to
fail over when the node it started up on is shut down.


Any idea what I'm missing or doing wrong?

Thanks,

Jaap

PS -- I can only get it to fail over if I first move the DRBD resource  
to the other node, which creates a "cli-prefer-drbd-master" location  
constraint for that node, but then it ignores the resource-stickiness  
setting and always performs the failbacks.
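
(Presumably that constraint can be removed again afterwards with something
like the following, if I understand the tooling correctly; the constraint id
is the one shown by "pcs constraint --full":

    pcs constraint remove cli-prefer-drbd-master
    # or, equivalently: pcs resource clear drbd-master

But that of course doesn't explain why the failover doesn't happen on its
own.)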


PPS -- I'm using CentOS 7.7.1908, DRBD 9.10.0, Corosync 2.4.3,  
Pacemaker 1.1.20 and PCS 0.9.167.




Re: [ClusterLabs] DRBD not failing over

2020-02-26 Thread Nickle, Richard
I spent many, many hours tackling the two-node problem and I had exactly
the same symptoms (only able to get the resource to move if I moved it
manually) until I did the following:

* Switch to DRBD 9 (added LINBIT repo because DRBD 8 is the default in the
Ubuntu repo)
* Build a third diskless quorum arbitration node.

My DRBD configuration now looks like this:

hatst2:$ sudo drbdadm status

r0 role:Primary
  disk:UpToDate
  hatst1 role:Secondary
    peer-disk:UpToDate
  hatst4 role:Secondary
    peer-disk:Diskless
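
For reference, the relevant part of the r0 resource file looks roughly like
the sketch below. The addresses and disk paths are illustrative placeholders
rather than my exact values, and the quorum options are the DRBD 9 settings
that make the third node useful as a tiebreaker:

    resource r0 {
      options {
        quorum majority;          # only stay writable while a majority of nodes agree
        on-no-quorum io-error;
      }
      on hatst1 {
        node-id 0;
        address 10.0.0.1:7789;
        volume 0 { device /dev/drbd0; disk /dev/vg0/r0; meta-disk internal; }
      }
      on hatst2 {
        node-id 1;
        address 10.0.0.2:7789;
        volume 0 { device /dev/drbd0; disk /dev/vg0/r0; meta-disk internal; }
      }
      on hatst4 {
        node-id 2;                # the diskless quorum arbitration node
        address 10.0.0.4:7789;
        volume 0 { device /dev/drbd0; disk none; }
      }
      connection-mesh { hosts hatst1 hatst2 hatst4; }
    }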

On Wed, Feb 26, 2020 at 6:59 AM Jaap Winius  wrote:

>
> Hi folks,
>
> My 2-node test system has a DRBD resource that is configured as follows:
>
> ~# pcs resource defaults resource-stickiness=100 ; \
> pcs resource create drbd ocf:linbit:drbd drbd_resource=r0 \
> op monitor interval=60s ; \
> pcs resource master drbd master-max=1 master-node-max=1 \
> clone-max=2 clone-node-max=1 notify=true
>
> The resource-stickiness setting is to prevent failbacks. I've got that
> to work with NFS and and VIP resources, but not with DRBD. Moreover,
> when configured as shown above, the DRBD master does not even want to
> fail over when the node it started up on is shut down.
>
> Any idea what I'm missing or doing wrong?
>
> Thanks,
>
> Jaap
>
> PS -- I can only get it to fail over if I first move the DRBD resource
> to the other node, which creates a "cli-prefer-drbd-master" location
> constraint for that node, but then it ignores the resource-stickiness
> setting and always performs the failbacks.
>
> PPS -- I'm using CentOS 7.7.1908, DRBD 9.10.0, Corosync 2.4.3,
> Pacemaker 1.1.20 and PCS 0.9.167.
>

Re: [ClusterLabs] DRBD not failing over

2020-02-26 Thread Jaap Winius



Quoting "Nickle, Richard" :


* Switch to DRBD 9 ...
* Build a third diskless quorum arbitration node.


Very interesting. I'm already running DRBD 9, so that base has already  
been covered, but here's some extra information: My test system  
actually consists of a single 4-node DRBD cluster that spans two data  
centers, with each data center having a 2-node Pacemaker cluster to  
fail resources over between the two DRBD nodes in that data center.  
But, for the purpose of quorum arbitration I guess these extra DRBD  
nodes don't matter, perhaps because four is not an odd number?


Cheers,

Jaap



Re: [ClusterLabs] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-26 Thread Ken Gaillot
On Wed, 2020-02-26 at 14:45 +0900, Ondrej wrote:
> Hi Ken,
> 
> On 2/26/20 7:30 AM, Ken Gaillot wrote:
> > The use case is a large organization with few cluster experts and
> > many
> > junior system administrators who reboot hosts for OS updates during
> > planned maintenance windows, without any knowledge of what the host
> > does. The cluster runs services that have a preferred node and take
> > a
> > very long time to start.
> > 
> > In this scenario, pacemaker's default behavior of moving the
> > service to
> > a failover node when the node shuts down, and moving it back when
> > the
> > node comes back up, results in needless downtime compared to just
> > leaving the service down for the few minutes needed for a reboot.
> 
> 1. Do I understand it correctly that scenario will be when system 
> gracefully reboots (pacemaker service is stopped by system shutting 
> down) and also in case that users for example manually stop cluster
> but 
> doesn't reboot the node - something like `pcs cluster stop`?

Exactly. The idea is the user wants HA for node or resource failures,
but not clean cluster stops.

> > If you decide while the node is down that you need the resource to
> > be
> > recovered, you can manually clear a lock with "crm_resource --
> > refresh"
> > specifying both --node and --resource.
> 
> 2. I'm interested how the situation will look like in the 'crm_mon' 
> output or in 'crm_simulate'. Will there be some indication why the 
> resources are not moving like 'blocked-shutdown-lock' or they will
> just 
> appear as not moving (Stopped)?

Yes, resources will be shown as "Stopped (LOCKED)".

> Will this look differently from situation where for example the
> resource 
> is just not allowed by constraint to run on other nodes?

Only in logs and cluster status; internally it is implemented as
implicit constraints banning the resources from every other node.

Another point I should clarify is that the lock/constraint remains in
place until the node rejoins the cluster *and* the resource starts
again on that node. That ensures that the node is preferred even if
stickiness was the only thing holding the resource to the node
previously.

However once the resource starts on the node, the lock/constraint is
lifted, and the resource could theoretically immediately move to
another node. An example would be if there were no stickiness and new
resources were added to the configuration while the node was down, so
load balancing calculations end up different. Another would be if a
time-based rule kicked in while the node was down. However this feature
is only expected or likely to be used in a cluster where there are
preferred nodes, enforced by stickiness and/or location constraints, so
it shouldn't be significant in practice.

Special care was taken in a number of corner cases:

* If the resource start on the rejoined node fails, the lock is lifted.

* If the node is fenced (e.g. manually via stonith_admin) while it is
down, the lock is lifted.

* If the resource somehow started on another node while the node was
down (which shouldn't be possible, but just as a fail-safe), the lock
is ignored when the node rejoins.

* Maintenance mode, unmanaged resources, etc., work the same with
shutdown locks as they would with any other constraint.
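
In command-line terms, enabling the feature and manually releasing a lock
would look something like this (property and option names as described above;
the resource, node and limit values are just placeholders):

    # opt in, with an optional upper bound on how long a lock is held
    pcs property set shutdown-lock=true shutdown-lock-limit=30min

    # release a locked resource by hand while its node is still down
    crm_resource --refresh --resource my-slow-db --node locked-node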

> Thanks for heads up
> 
> --
> Ondrej Famera
-- 
Ken Gaillot 



Re: [ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-26 Thread Ken Gaillot
On Wed, 2020-02-26 at 10:33 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot wrote on 25.02.2020 at 23:30 in message
> <29058_1582669837_5E55A00B_29058_3341_1_f8e8426d0c2cf098f88fb6330e8a80586f03043aca...@redhat.com>:
> > Hi all,
> > 
> > We are a couple of months away from starting the release cycle for
> > Pacemaker 2.0.4. I'll highlight some new features between now and
> > then.
> > 
> > First we have shutdown locks. This is a narrow use case that I
> > don't
> > expect a lot of interest in, but it helps give pacemaker feature
> > parity
> > with proprietary HA systems, which can help users feel more
> > comfortable
> > switching to pacemaker and open source.
> > 
> > The use case is a large organization with few cluster experts and
> > many
> > junior system administrators who reboot hosts for OS updates during
> > planned maintenance windows, without any knowledge of what the host
> > does. The cluster runs services that have a preferred node and take
> > a
> > very long time to start.
> > 
> > In this scenario, pacemaker's default behavior of moving the
> > service to
> > a failover node when the node shuts down, and moving it back when
> > the
> > node comes back up, results in needless downtime compared to just
> > leaving the service down for the few minutes needed for a reboot.
> > 
> > The goal could be accomplished with existing pacemaker features.
> > Maintenance mode wouldn't work because the node is being rebooted.
> > But
> > you could figure out what resources are active on the node, and use
> > a
> > location constraint with a rule to ban them on all other nodes
> > before
> > shutting down. That's a lot of work for something the cluster can
> > figure out automatically.
> > 
> > Pacemaker 2.0.4 will offer a new cluster property, shutdown-lock,
> > defaulting to false to keep the current behavior. If shutdown-lock
> > is
> > set to true, any resources active on a node when it is cleanly shut
> > down will be "locked" to the node (kept down rather than recovered
> > elsewhere). Once the node comes back up and rejoins the cluster,
> > they
> > will be "unlocked" (free to move again if circumstances warrant).
> 
> I'm not very happy with the wording: What about a per-resource
> feature
> "tolerate-downtime" that specifies how long this resource may be down
> without
> causing actions from the cluster. I think it would be more useful
> than some
> global setting. Maybe complement that per-resource feature with a
> per-node
> feature using the same name.

I considered a per-resource and/or per-node setting, but the target
audience is someone who wants things as simple as possible. A per-node
setting would mean that newly added nodes don't have it by default,
which could be easily overlooked. (As an aside, I would someday like to
see a "node defaults" section that would provide default values for
node attributes. That could potentially replace several current
cluster-wide options. But it's a low priority.)

I didn't mention this in the announcements, but certain resource types
are excluded:

Stonith resources and Pacemaker Remote connection resources are never
locked. That makes sense because they are more a sort of internal
pseudo-resource than an actual end-user service. Stonith resources are
just monitors of the fence device, and a connection resource starts a
(remote) node rather than a service.

Also, with the current implementation, clone and bundle instances are
not locked. This would only matter for unique clones, and
clones/bundles with clone-max/replicas set below the total number of
nodes. If this becomes a high demand, we could add it in the future.
Similarly for the master role of promotable clones.

Given those limitations, I think a per-resource option would have more
potential to be confusing than helpful. But, it should be relatively
simple to extend this as a per-resource option, with the global option
as a backward-compatible default, if the demand arises.

> I think it's very important to specify and document that mode
> comparing it to
> maintenance mode.

The proposed documentation is in the master branch if you want to proof
it and make suggestions. If you have the prerequisites installed you
can run "make -C doc" and view it locally, otherwise you can browse the
source (search for "shutdown-lock"):

https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Options.txt

There is currently no explicit comparison with maintenance-mode because
maintenance-mode still behaves according to its documentation ("Should
the cluster refrain from monitoring, starting and stopping
resources?").

However I can see the value in adding a section somewhere (probably in
"Pacemaker Administration") comparing all the various "don't touch"
settings -- maintenance-mode, maintenance node/resource attributes,
standby, is-managed, shutdown-lock, and the monitor enable option. The
current "Monitoring Resources When Administration is Disabled" sectio

Re: [ClusterLabs] Q: pseudo actions load_stopped_*, all_stopped

2020-02-26 Thread Ken Gaillot
On Wed, 2020-02-26 at 11:58 +0100, Ulrich Windl wrote:
> Hi!
> 
> I'm wondering what the pseudo actions in the output of crm_simulate are:
> "load_stopped_*" and "all_stopped". Are these some
> synchronization points? I see them between (monitor, stop) and (start,
> monitor).
> 
> Regards,
> Ulrich

Exactly, they typically exist as points other actions can be internally
ordered against.
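
If you want to see them in context, something along the lines of

    crm_simulate --live-check --simulate

prints the whole transition, with the pseudo actions interleaved between the
real resource actions that are ordered relative to them.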
-- 
Ken Gaillot 



Re: [ClusterLabs] DRBD not failing over

2020-02-26 Thread Strahil Nikolov
On February 26, 2020 2:36:46 PM GMT+02:00, "Nickle, Richard" 
 wrote:
>I spent many, many hours tackling the two-node problem and I had
>exactly
>the same symptoms (only able to get the resource to move if I moved it
>manually) until I did the following:
>
>* Switch to DRBD 9 (added LINBIT repo because DRBD 8 is the default in
>the
>Ubuntu repo)
>* Build a third diskless quorum arbitration node.
>
>My DRBD configuration now looks like this:
>
>hatst2:$ sudo drbdadm status
>
>r0 role:Primary
>  disk:UpToDate
>  hatst1 role:Secondary
>    peer-disk:UpToDate
>  hatst4 role:Secondary
>    peer-disk:Diskless
>
>On Wed, Feb 26, 2020 at 6:59 AM Jaap Winius  wrote:
>
>>
>> Hi folks,
>>
>> My 2-node test system has a DRBD resource that is configured as
>follows:
>>
>> ~# pcs resource defaults resource-stickiness=100 ; \
>> pcs resource create drbd ocf:linbit:drbd drbd_resource=r0 \
>> op monitor interval=60s ; \
>> pcs resource master drbd master-max=1 master-node-max=1 \
>> clone-max=2 clone-node-max=1 notify=true
>>
>> The resource-stickiness setting is to prevent failbacks. I've got
>that
>> to work with NFS and and VIP resources, but not with DRBD. Moreover,
>> when configured as shown above, the DRBD master does not even want to
>> fail over when the node it started up on is shut down.
>>
>> Any idea what I'm missing or doing wrong?
>>
>> Thanks,
>>
>> Jaap
>>
>> PS -- I can only get it to fail over if I first move the DRBD
>resource
>> to the other node, which creates a "cli-prefer-drbd-master" location
>> constraint for that node, but then it ignores the resource-stickiness
>> setting and always performs the failbacks.
>>
>> PPS -- I'm using CentOS 7.7.1908, DRBD 9.10.0, Corosync 2.4.3,
>> Pacemaker 1.1.20 and PCS 0.9.167.
>>

Is your DRBD device used as an LVM PV, e.g. as the disk behind an iSCSI LUN?
If yes, ensure that you have an LVM global filter for the /dev/drbdXYZ device,
the physical devices (like /dev/sdXYZ) and the wwid.
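
Something like this in /etc/lvm/lvm.conf (the device names are just an
example, adjust them to your setup and keep whatever your root VG needs):

    devices {
        # scan the DRBD device, ignore the backing disk underneath it
        global_filter = [ "a|^/dev/drbd.*|", "r|^/dev/sdb.*|" ]
    }

Devices that match no pattern stay accepted, so only the DRBD backing disk is
hidden from LVM here.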

Best Regards,
Strahil Nikolov


Re: [ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-26 Thread wferi
Ken Gaillot  writes:

> I think a per-resource option would have more potential to be
> confusing than helpful. But, it should be relatively simple to extend
> this as a per-resource option, with the global option as a
> backward-compatible default, if the demand arises.

And then you could immediately replace the global option with an
rsc-default.  But that's one more transition (not in the PE sense).
It indeed looks like this is more a resource option than a global
one, but the default mechanism provides an easy way to set it
globally for those who prefer that.  Unless somebody wants to
default it to twice (or so) the resource start timeout instead...
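
(Hypothetically, i.e. only if it ever does become a resource meta-attribute,
setting it cluster-wide would then be a one-liner along the lines of

    pcs resource defaults shutdown-lock=true

with the existing cluster property as the backward-compatible fallback.)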
-- 
Feri


Re: [ClusterLabs] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-26 Thread Ken Gaillot
On Wed, 2020-02-26 at 06:52 +0200, Strahil Nikolov wrote:
> On February 26, 2020 12:30:24 AM GMT+02:00, Ken Gaillot <
> kgail...@redhat.com> wrote:
> > Hi all,
> > 
> > We are a couple of months away from starting the release cycle for
> > Pacemaker 2.0.4. I'll highlight some new features between now and
> > then.
> > 
> > First we have shutdown locks. This is a narrow use case that I
> > don't
> > expect a lot of interest in, but it helps give pacemaker feature
> > parity
> > with proprietary HA systems, which can help users feel more
> > comfortable
> > switching to pacemaker and open source.
> > 
> > The use case is a large organization with few cluster experts and
> > many
> > junior system administrators who reboot hosts for OS updates during
> > planned maintenance windows, without any knowledge of what the host
> > does. The cluster runs services that have a preferred node and take
> > a
> > very long time to start.
> > 
> > In this scenario, pacemaker's default behavior of moving the
> > service to
> > a failover node when the node shuts down, and moving it back when
> > the
> > node comes back up, results in needless downtime compared to just
> > leaving the service down for the few minutes needed for a reboot.
> > 
> > The goal could be accomplished with existing pacemaker features.
> > Maintenance mode wouldn't work because the node is being rebooted.
> > But
> > you could figure out what resources are active on the node, and use
> > a
> > location constraint with a rule to ban them on all other nodes
> > before
> > shutting down. That's a lot of work for something the cluster can
> > figure out automatically.
> > 
> > Pacemaker 2.0.4 will offer a new cluster property, shutdown-lock,
> > defaulting to false to keep the current behavior. If shutdown-lock
> > is
> > set to true, any resources active on a node when it is cleanly shut
> > down will be "locked" to the node (kept down rather than recovered
> > elsewhere). Once the node comes back up and rejoins the cluster,
> > they
> > will be "unlocked" (free to move again if circumstances warrant).
> > 
> > An additional cluster property, shutdown-lock-limit, allows you to
> > set
> > a timeout for the locks so that if the node doesn't come back
> > within
> > that time, the resources are free to be recovered elsewhere. This
> > defaults to no limit.
> > 
> > If you decide while the node is down that you need the resource to
> > be
> > recovered, you can manually clear a lock with "crm_resource --
> > refresh"
> > specifying both --node and --resource.
> > 
> > There are some limitations using shutdown locks with Pacemaker
> > Remote
> > nodes, so I'd avoid that with the upcoming release, though it is
> > possible.
> 
> Hi Ken,
> 
> Can it be 'shutdown-lock-timeout' instead of 'shutdown-lock-limit' ?

I thought about that, but I wanted to be clear that this is a maximum
bound. "timeout" could be a little ambiguous as to whether it is a
maximum or how long a lock will always last. On the other hand, "limit"
doesn't make it obvious that the value is a time duration. I could see it
going either way.

> Also, I think that the default value could be something more
> reasonable - like 30min. Usually 30min are OK if you don't patch the
> firmware and 180min are the maximum if you do patch the firmware.

The primary goal is to ease the transition from other HA software,
which doesn't even offer the equivalent of shutdown-lock-limit, so I
wanted the default to match that behavior. Also "usually" is a mine
field :)

> The use case is odd. I have been in the same situation, and our
> solution was to train the team (internally) instead of using such
> feature.

Right, this is designed for situations where that isn't feasible :)

Though even with trained staff, this does make it easier, since you
don't have to figure out yourself what's active on the node.

> The interesting part will be the behaviour of the local cluster
> stack, when updates  happen. The risk is high for the node to be
> fenced due to unresponsiveness (during the update) or if
> corosync/pacemaker  use an old function changed in the libs.

That is a risk, but presumably one that a user transitioning from
another product would already be familiar with.

> Best Regards,
> Strahil Nikolov
-- 
Ken Gaillot 



[ClusterLabs] FYI clusterlabs.org planned outage this Friday 2020-02-28

2020-02-26 Thread Ken Gaillot
Hi all,

We will be upgrading the OS on clusterlabs.org this Friday, Feb. 28,
2020, sometime after 18:00 UTC.

This will result in outages of the clusterlabs.org website, bugzilla,
and wiki. The mailing lists will also be unavailable, but mail gateways
will generally retry sent messages so there shouldn't be any missed
messages.
-- 
Ken Gaillot 



[ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-26 Thread Ulrich Windl
>>> Ken Gaillot wrote on 26.02.2020 at 16:41 in message
<2257e2a1e5fd88ae2b915b8241a8e8c9e150b95b.ca...@redhat.com>:

[...]
> I considered a per-resource and/or per-node setting, but the target
> audience is someone who wants things as simple as possible. A per-node

Actually, while it may seem simple, it adds quite a lot of additional 
complexity, and I'm still not convinced that this is really needed.

[...]

Regards,
Ulrich




[ClusterLabs] Antw: [EXT] FYI clusterlabs.org planned outage this Friday2020-02-28

2020-02-26 Thread Ulrich Windl
>>> Ken Gaillot wrote on 26.02.2020 at 22:28 in message
<16740_1582752507_5E56E2FA_16740_135_1_fbd6b92a7fac4cd5b26191f002153aeaed8d40cc.a...@redhat.com>:
> Hi all,
> 
> We will be upgrading the OS on clusterlabs.org this Friday, Feb. 28,
> 2020, sometime after 18:00 UTC.
> 
> This will result in outages of the clusterlabs.org website, bugzilla,
> and wiki. The mailing lists will also be unavailable, but mail gateways
> will generally retry sent messages so there shouldn't be any missed
> messages.

No HA being used there? ;-)

> -- 
> Ken Gaillot 
> 