Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Andrei Borzenkov
28.02.2020 01:55, Ken Gaillot wrote:
> On Thu, 2020-02-27 at 22:39 +0300, Andrei Borzenkov wrote:
>> 27.02.2020 20:54, Ken Gaillot пишет:
>>> On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais
>>> wrote:
>> Speaking about shutdown, what is the status of a clean shutdown of the
>> cluster handled by Pacemaker? Currently, I advise stopping resources
>> gracefully (e.g. using pcs resource disable [...]) before shutting down
>> each node, either by hand or using some higher-level tool (e.g. pcs
>> cluster stop --all).
>
> I'm not sure why that would be necessary. It should be perfectly fine
> to stop pacemaker in any order without disabling resources.

 Because resources might move around during the shutdown sequence. It
 might not be desirable as some resource migration can be heavy, long,
 interfere with shutdown, etc. I'm pretty sure this has been discussed
 in the past.
>>>
>>> Ah, that makes sense, I hadn't thought about that.
>>
>> Isn't that exactly what shutdown-lock does? It prevents resource
>> migration when stopping pacemaker, so my expectation is that if we stop
>> pacemaker on all nodes, no resource is moved. Or what am I missing?
> 
> shutdown-lock would indeed handle this, if you want the behavior
> whenever any node is shut down. However for this purpose, I could see
> some users wanting the behavior when shutting down all nodes, but not
> when shutting down just one node.
> 

Well, this requires pacemaker supporting the notion of a "cluster-wide
shutdown" in the first place.

> BTW if all nodes shut down, any shutdown locks are cleared.
> Practically, this is because they are stored in the CIB status section,
> which goes away with the cluster. Logically, I could see arguments for
> and against, but this makes sense.
> 

This actually allows a poor man's implementation of cluster-wide shutdown:
set the lock immediately before stopping the nodes. It could probably even
be integrated directly into "pcs cluster stop --all". I wish crmsh
offered something similar too.
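
For illustration, a rough sketch of that procedure with pcs, assuming the
property ends up being called shutdown-lock as discussed earlier in this
thread (adjust to whatever 2.0.4 actually ships); crmsh users could
presumably do the same with "crm configure property":

    # Lock resources to the nodes they are currently running on
    pcs property set shutdown-lock=true

    # Cluster-wide stop; locked resources are not recovered elsewhere
    pcs cluster stop --all

    # ... maintenance ...

    # Bring the cluster back up; per Ken's note above, the locks themselves
    # are cleared once every node has shut down
    pcs cluster start --all

    # Optionally turn the behaviour off again afterwards
    pcs property set shutdown-lock=false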

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 22:39 +0300, Andrei Borzenkov wrote:
> 27.02.2020 20:54, Ken Gaillot wrote:
> > On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > > > Speaking about shutdown, what is the status of a clean shutdown of
> > > > > the cluster handled by Pacemaker? Currently, I advise stopping
> > > > > resources gracefully (e.g. using pcs resource disable [...]) before
> > > > > shutting down each node, either by hand or using some higher-level
> > > > > tool (e.g. pcs cluster stop --all).
> > > > 
> > > > I'm not sure why that would be necessary. It should be perfectly
> > > > fine to stop pacemaker in any order without disabling resources.
> > > 
> > > Because resources might move around during the shutdown sequence. It
> > > might not be desirable as some resource migration can be heavy, long,
> > > interfere with shutdown, etc. I'm pretty sure this has been discussed
> > > in the past.
> > 
> > Ah, that makes sense, I hadn't thought about that.
> 
> Isn't that exactly what shutdown-lock does? It prevents resource
> migration when stopping pacemaker, so my expectation is that if we stop
> pacemaker on all nodes, no resource is moved. Or what am I missing?

shutdown-lock would indeed handle this, if you want the behavior
whenever any node is shut down. However for this purpose, I could see
some users wanting the behavior when shutting down all nodes, but not
when shutting down just one node.

BTW if all nodes shut down, any shutdown locks are cleared.
Practically, this is because they are stored in the CIB status section,
which goes away with the cluster. Logically, I could see arguments for
and against, but this makes sense.
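
(Purely as an illustration: since the locks live in the transient status
section, one way to check whether any are currently held would be something
like the command below; the exact attribute name Pacemaker records there is
an implementation detail, so the grep pattern is an assumption.)

    # Dump the live status section of the CIB and look for lock markers
    cibadmin -Q -o status | grep -i shutdown-lock
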
-- 
Ken Gaillot 


Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 20:42 +0200, Strahil Nikolov wrote:
> On February 27, 2020 7:00:36 PM GMT+02:00, Ken Gaillot <
> kgail...@redhat.com> wrote:
> > On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > On Thu, 27 Feb 2020 09:48:23 -0600
> > > Ken Gaillot  wrote:
> > > 
> > > > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
> > > > wrote:
> > > > > On Thu, 27 Feb 2020 12:24:46 +0100
> > > > > "Ulrich Windl"  wrote:
> > > > >   
> > > > > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > > > > > in message <20200227110502.3624cb87@firost>:
> > > > > > 
> > > > > > [...]  
> > > > > > > What about something like "lock‑location=bool" and
> > > > > > 
> > > > > > For "lock-location" I would assume the value is a
> > > > > > "location". I
> > > > > > guess you
> > > > > > wanted a "use-lock-location" Boolean value.  
> > > > > 
> > > > > Mh, maybe "lock-current-location" would better reflect what I
> > > > > meant.
> > > > > 
> > > > > The point is to lock the resource on the node currently
> > > > > running
> > > > > it.  
> > > > 
> > > > Though it only applies for a clean node shutdown, so that has
> > > > to be
> > > > in
> > > > the name somewhere. The resource isn't locked during normal
> > > > cluster
> > > > operation (it can move for resource or node failures, load
> > > > rebalancing,
> > > > etc.).
> > > 
> > > Well, I was trying to make the new feature a bit wider than just
> > > the
> > > narrow shutdown feature.
> > > 
> > > Speaking about shutdown, what is the status of a clean shutdown of the
> > > cluster handled by Pacemaker? Currently, I advise stopping resources
> > > gracefully (e.g. using pcs resource disable [...]) before shutting down
> > > each node, either by hand or using some higher-level tool (e.g. pcs
> > > cluster stop --all).
> > 
> > I'm not sure why that would be necessary. It should be perfectly
> > fine
> > to stop pacemaker in any order without disabling resources.
> > 
> > Start-up is actually more of an issue ... if you start corosync and
> > pacemaker on nodes one by one, and you're not quick enough, then
> > once
> > quorum is reached, the cluster will fence all the nodes that
> > haven't
> > yet come up. So on start-up, it makes sense to start corosync on
> > all
> > nodes, which will establish membership and quorum, then start
> > pacemaker
> > on all nodes. Obviously that can't be done within pacemaker so that
> > has
> > to be done manually or by a higher-level tool.
> > 
> > > Shouldn't this feature be discussed in this context as well?
> > > 
> > > [...] 
> > > > > > > it would lock the resource location (unique or clones) until
> > > > > > > the operator unlocks it or the "lock-location-timeout" expires,
> > > > > > > no matter what happens to the resource, maintenance mode or not.
> > > > > > > 
> > > > > > > At first glance, it seems to pair nicely with maintenance-mode
> > > > > > > and would avoid resource migration after node reboot.
> > > > 
> > > > Maintenance mode is useful if you're updating the cluster stack
> > > > itself
> > > > -- put in maintenance mode, stop the cluster services (leaving
> > > > the
> > > > managed services still running), update the cluster services,
> > > > start
> > > > the
> > > > cluster services again, take out of maintenance mode.
> > > > 
> > > > This is useful if you're rebooting the node for a kernel update
> > > > (for
> > > > example). Apply the update, reboot the node. The cluster takes
> > > > care
> > > > of
> > > > everything else for you (stop the services before shutting down
> > > > and
> > > > do
> > > > not recover them until the node comes back).
> > > 
> > > I'm a bit lost. If a resource doesn't move during maintenance mode,
> > > could you detail a scenario where we should ban it explicitly from
> > > other nodes to secure its current location when getting out of
> > > maintenance? Isn't it
> > 
> > Sorry, I was unclear -- I was contrasting maintenance mode with
> > shutdown locks.
> > 
> > You wouldn't need a ban with maintenance mode. However maintenance
> > mode
> > leaves any active resources running. That means the node shouldn't
> > be
> > rebooted in maintenance mode, because those resources will not be
> > cleanly stopped.
> > 
> > With shutdown locks, the active resources are cleanly stopped. That
> > does require a ban of some sort because otherwise the resources
> > will be
> > recovered on another node.
> > 
> > > an excessive precaution? Is it just to avoid it moving somewhere else
> > > when exiting maintenance-mode? If the resource has a preferred node, I
> > > suppose the location constraint should take care of this, shouldn't it?
> > 
> > Having a preferred node doesn't prevent the resource from starting
> > elsewhere if the preferred node is down (or in 

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Andrei Borzenkov
27.02.2020 20:54, Ken Gaillot wrote:
> On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais wrote:
 Speaking about shutdown, what is the status of a clean shutdown of the
 cluster handled by Pacemaker? Currently, I advise stopping resources
 gracefully (e.g. using pcs resource disable [...]) before shutting down
 each node, either by hand or using some higher-level tool (e.g. pcs
 cluster stop --all).
>>>
>>> I'm not sure why that would be necessary. It should be perfectly
>>> fine
>>> to stop pacemaker in any order without disabling resources.
>>
>> Because resources might move around during the shutdown sequence. It
>> might not be desirable as some resource migration can be heavy, long,
>> interfere with shutdown, etc. I'm pretty sure this has been discussed in
>> the past.
> 
> Ah, that makes sense, I hadn't thought about that.

Isn't that exactly what shutdown-lock does? It prevents resource
migration when stopping pacemaker, so my expectation is that if we stop
pacemaker on all nodes, no resource is moved. Or what am I missing?

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Strahil Nikolov
On February 27, 2020 7:00:36 PM GMT+02:00, Ken Gaillot  
wrote:
>On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais wrote:
>> On Thu, 27 Feb 2020 09:48:23 -0600
>> Ken Gaillot  wrote:
>> 
>> > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
>> > wrote:
>> > > On Thu, 27 Feb 2020 12:24:46 +0100
>> > > "Ulrich Windl"  wrote:
>> > >   
>> > > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
>> > > > in message <20200227110502.3624cb87@firost>:
>> > > > 
>> > > > [...]  
>> > > > > What about something like "lock‑location=bool" and
>> > > > 
>> > > > For "lock-location" I would assume the value is a "location". I
>> > > > guess you
>> > > > wanted a "use-lock-location" Boolean value.  
>> > > 
>> > > Mh, maybe "lock-current-location" would better reflect what I
>> > > meant.
>> > > 
>> > > The point is to lock the resource on the node currently running
>> > > it.  
>> > 
>> > Though it only applies for a clean node shutdown, so that has to be
>> > in
>> > the name somewhere. The resource isn't locked during normal cluster
>> > operation (it can move for resource or node failures, load
>> > rebalancing,
>> > etc.).
>> 
>> Well, I was trying to make the new feature a bit wider than just the
>> narrow shutdown feature.
>> 
>> Speaking about shutdown, what is the status of a clean shutdown of the
>> cluster handled by Pacemaker? Currently, I advise stopping resources
>> gracefully (e.g. using pcs resource disable [...]) before shutting down
>> each node, either by hand or using some higher-level tool (e.g. pcs
>> cluster stop --all).
>
>I'm not sure why that would be necessary. It should be perfectly fine
>to stop pacemaker in any order without disabling resources.
>
>Start-up is actually more of an issue ... if you start corosync and
>pacemaker on nodes one by one, and you're not quick enough, then once
>quorum is reached, the cluster will fence all the nodes that haven't
>yet come up. So on start-up, it makes sense to start corosync on all
>nodes, which will establish membership and quorum, then start pacemaker
>on all nodes. Obviously that can't be done within pacemaker so that has
>to be done manually or by a higher-level tool.
>
>> Shouldn't this feature be discussed in this context as well?
>> 
>> [...] 
>> > > > > it would lock the resource location (unique or clones) until the
>> > > > > operator unlocks it or the "lock-location-timeout" expires, no
>> > > > > matter what happens to the resource, maintenance mode or not.
>> > > > > 
>> > > > > At first glance, it seems to pair nicely with maintenance-mode
>> > > > > and would avoid resource migration after node reboot.
>> > 
>> > Maintenance mode is useful if you're updating the cluster stack
>> > itself
>> > -- put in maintenance mode, stop the cluster services (leaving the
>> > managed services still running), update the cluster services, start
>> > the
>> > cluster services again, take out of maintenance mode.
>> > 
>> > This is useful if you're rebooting the node for a kernel update
>> > (for
>> > example). Apply the update, reboot the node. The cluster takes care
>> > of
>> > everything else for you (stop the services before shutting down and
>> > do
>> > not recover them until the node comes back).
>> 
>> I'm a bit lost. If a resource doesn't move during maintenance mode,
>> could you detail a scenario where we should ban it explicitly from
>> other nodes to secure its current location when getting out of
>> maintenance? Isn't it
>
>Sorry, I was unclear -- I was contrasting maintenance mode with
>shutdown locks.
>
>You wouldn't need a ban with maintenance mode. However maintenance mode
>leaves any active resources running. That means the node shouldn't be
>rebooted in maintenance mode, because those resources will not be
>cleanly stopped.
>
>With shutdown locks, the active resources are cleanly stopped. That
>does require a ban of some sort because otherwise the resources will be
>recovered on another node.
>
>> an excessive precaution? Is it just to avoid it moving somewhere else
>> when exiting maintenance-mode? If the resource has a preferred node, I
>> suppose the location constraint should take care of this, shouldn't it?
>
>Having a preferred node doesn't prevent the resource from starting
>elsewhere if the preferred node is down (or in standby, or otherwise
>ineligible to run the resource). Even a +INFINITY constraint allows
>recovery elsewhere if the node is not available. To keep a resource
>from being recovered, you have to put a ban (-INFINITY location
>constraint) on any nodes that could otherwise run it.
>
>> > > > I wonder: how is it different from a time-limited "ban" (wording
>> > > > that also exists already)? If you ban all resources from running on
>> > > > a specific node, resources would move away, and when booting the
>> > > > node, resources won't come back.
>> > 

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Jehan-Guillaume de Rorthais
On Thu, 27 Feb 2020 11:54:52 -0600
Ken Gaillot  wrote:

> On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais wrote:
> > > > Speaking about shutdown, what is the status of a clean shutdown of
> > > > the cluster handled by Pacemaker? Currently, I advise stopping
> > > > resources gracefully (e.g. using pcs resource disable [...]) before
> > > > shutting down each node, either by hand or using some higher-level
> > > > tool (e.g. pcs cluster stop --all).
> > > 
> > > I'm not sure why that would be necessary. It should be perfectly
> > > fine
> > > to stop pacemaker in any order without disabling resources.  
> > 
> > Because resources might move around during the shutdown sequence. It
> > might not be desirable as some resource migration can be heavy, long,
> > interfere with shutdown, etc. I'm pretty sure this has been discussed in
> > the past.
> 
> Ah, that makes sense, I hadn't thought about that. FYI, there is a
> stop-all-resources cluster property that would let you disable
> everything in one step.

Yes, I discovered it some weeks ago, thanks :)



Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais wrote:
> > > Speaking about shutdown, what is the status of a clean shutdown of the
> > > cluster handled by Pacemaker? Currently, I advise stopping resources
> > > gracefully (e.g. using pcs resource disable [...]) before shutting down
> > > each node, either by hand or using some higher-level tool (e.g. pcs
> > > cluster stop --all).
> > 
> > I'm not sure why that would be necessary. It should be perfectly
> > fine
> > to stop pacemaker in any order without disabling resources.
> 
> Because resources might move around during the shutdown sequence. It
> might not be desirable as some resource migration can be heavy, long,
> interfere with shutdown, etc. I'm pretty sure this has been discussed in
> the past.

Ah, that makes sense, I hadn't thought about that. FYI, there is a
stop-all-resources cluster property that would let you disable
everything in one step.
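
For example, a minimal sketch of using that property around a full cluster
stop (pcs syntax):

    # Disable all resources cluster-wide in a single step
    pcs property set stop-all-resources=true

    # ... stop pacemaker/corosync on the nodes, do the maintenance,
    # start the cluster again ...

    # Allow resources to start again
    pcs property set stop-all-resources=false
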
-- 
Ken Gaillot 



Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Jehan-Guillaume de Rorthais
On Thu, 27 Feb 2020 11:00:36 -0600
Ken Gaillot  wrote:

> On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 27 Feb 2020 09:48:23 -0600
> > Ken Gaillot  wrote:
> >   
> > > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
> > > wrote:  
> > > > On Thu, 27 Feb 2020 12:24:46 +0100
> > > > "Ulrich Windl"  wrote:
> > > > 
> > > > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > > > > in message <20200227110502.3624cb87@firost>:
> > > > > 
> > > > > [...]
> > > > > > What about something like "lock‑location=bool" and  
> > > > > 
> > > > > For "lock-location" I would assume the value is a "location". I
> > > > > guess you
> > > > > wanted a "use-lock-location" Boolean value.
> > > > 
> > > > Mh, maybe "lock-current-location" would better reflect what I
> > > > meant.
> > > > 
> > > > The point is to lock the resource on the node currently running
> > > > it.
> > > 
> > > Though it only applies for a clean node shutdown, so that has to be
> > > in the name somewhere. The resource isn't locked during normal cluster
> > > operation (it can move for resource or node failures, load
> > > rebalancing,
> > > etc.).  
> > 
> > Well, I was trying to make the new feature a bit wider than just the
> > narrow shutdown feature.
> > 
> > Speaking about shutdown, what is the status of a clean shutdown of the
> > cluster handled by Pacemaker? Currently, I advise stopping resources
> > gracefully (e.g. using pcs resource disable [...]) before shutting down
> > each node, either by hand or using some higher-level tool (e.g. pcs
> > cluster stop --all).
> 
> I'm not sure why that would be necessary. It should be perfectly fine
> to stop pacemaker in any order without disabling resources.

Because resources might move around during the shutdown sequence. It might
not be desirable as some resource migration can be heavy, long, interfere
with shutdown, etc. I'm pretty sure this has been discussed in the past.

> Start-up is actually more of an issue ... if you start corosync and
> pacemaker on nodes one by one, and you're not quick enough, then once
> quorum is reached, the cluster will fence all the nodes that haven't
> yet come up. So on start-up, it makes sense to start corosync on all
> nodes, which will establish membership and quorum, then start pacemaker
> on all nodes. Obviously that can't be done within pacemaker so that has
> to be done manually or by a higher-level tool.

Indeed.
Or use wait-for-all.
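
(For reference, "wait-for-all" here is the corosync votequorum option; a
minimal corosync.conf quorum section with it enabled might look like the
snippet below, with the rest of the file omitted.)

    quorum {
        provider: corosync_votequorum
        # After a full cluster shutdown, do not become quorate again until
        # all nodes have been seen online at the same time
        wait_for_all: 1
    }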

> > Shouldn't this feature be discussed in this context as well?
> > 
> > [...]   
> > > > > > it would lock the resource location (unique or clones) until the
> > > > > > operator unlocks it or the "lock-location-timeout" expires, no
> > > > > > matter what happens to the resource, maintenance mode or not.
> > > > > > 
> > > > > > At first glance, it seems to pair nicely with maintenance-mode
> > > > > > and would avoid resource migration after node reboot.
> > > 
> > > Maintenance mode is useful if you're updating the cluster stack
> > > itself
> > > -- put in maintenance mode, stop the cluster services (leaving the
> > > managed services still running), update the cluster services, start
> > > the cluster services again, take out of maintenance mode.
> > > 
> > > This is useful if you're rebooting the node for a kernel update
> > > (for example). Apply the update, reboot the node. The cluster takes care
> > > of everything else for you (stop the services before shutting down and
> > > do not recover them until the node comes back).  
> > 
> > I'm a bit lost. If a resource doesn't move during maintenance mode,
> > could you detail a scenario where we should ban it explicitly from
> > other nodes to secure its current location when getting out of
> > maintenance? Isn't it
> 
> Sorry, I was unclear -- I was contrasting maintenance mode with
> shutdown locks.
> 
> You wouldn't need a ban with maintenance mode. However maintenance mode
> leaves any active resources running. That means the node shouldn't be
> rebooted in maintenance mode, because those resources will not be
> cleanly stopped.
> 
> With shutdown locks, the active resources are cleanly stopped. That
> does require a ban of some sort because otherwise the resources will be
> recovered on another node.

ok, thanks,

> > an excessive precaution? Is it just to avoid it moving somewhere else
> > when exiting maintenance-mode? If the resource has a preferred node, I
> > suppose the location constraint should take care of this, shouldn't it?
> 
> Having a preferred node doesn't prevent the resource from starting
> elsewhere if the preferred node is down (or in standby, or otherwise
> ineligible to run the resource). Even a +INFINITY constraint allows
> recovery elsewhere if the node is not available. To keep a resource
> from being recovered, you have to put a ban (-INFINITY location
> constraint) on any nodes 

[ClusterLabs] More summit photos

2020-02-27 Thread Ken Gaillot
Hi all,

The ClusterLabs Summit wiki has been updated with a few more photos.
Enjoy ...

http://plan.alteeve.ca/index.php/HA_Cluster_Summit_2020#Photos
-- 
Ken Gaillot 



Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais wrote:
> On Thu, 27 Feb 2020 09:48:23 -0600
> Ken Gaillot  wrote:
> 
> > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > On Thu, 27 Feb 2020 12:24:46 +0100
> > > "Ulrich Windl"  wrote:
> > >   
> > > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > > > in message <20200227110502.3624cb87@firost>:
> > > > 
> > > > [...]  
> > > > > What about something like "lock‑location=bool" and
> > > > 
> > > > For "lock-location" I would assume the value is a "location". I
> > > > guess you
> > > > wanted a "use-lock-location" Boolean value.  
> > > 
> > > Mh, maybe "lock-current-location" would better reflect what I
> > > meant.
> > > 
> > > The point is to lock the resource on the node currently running
> > > it.  
> > 
> > Though it only applies for a clean node shutdown, so that has to be
> > in
> > the name somewhere. The resource isn't locked during normal cluster
> > operation (it can move for resource or node failures, load
> > rebalancing,
> > etc.).
> 
> Well, I was trying to make the new feature a bit wider than just the
> narrow shutdown feature.
> 
> Speaking about shutdown, what is the status of a clean shutdown of the
> cluster handled by Pacemaker? Currently, I advise stopping resources
> gracefully (e.g. using pcs resource disable [...]) before shutting down
> each node, either by hand or using some higher-level tool (e.g. pcs
> cluster stop --all).

I'm not sure why that would be necessary. It should be perfectly fine
to stop pacemaker in any order without disabling resources.

Start-up is actually more of an issue ... if you start corosync and
pacemaker on nodes one by one, and you're not quick enough, then once
quorum is reached, the cluster will fence all the nodes that haven't
yet come up. So on start-up, it makes sense to start corosync on all
nodes, which will establish membership and quorum, then start pacemaker
on all nodes. Obviously that can't be done within pacemaker so that has
to be done manually or by a higher-level tool.
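
A rough sketch of that manual two-phase start-up (node names are
placeholders, and the service names assume the usual systemd units):

    # Phase 1: start corosync everywhere first, so membership and quorum
    # form with all nodes present
    for n in node1 node2 node3; do ssh "$n" systemctl start corosync; done

    # Phase 2: only then start pacemaker, so no node looks "missing"
    for n in node1 node2 node3; do ssh "$n" systemctl start pacemaker; done

Higher-level tools such as "pcs cluster start --all" effectively take care
of this for you by starting every node at (roughly) the same time.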

> Shouldn't this feature be discussed in this context as well?
> 
> [...] 
> > > > > it would lock the resource location (unique or clones) until the
> > > > > operator unlocks it or the "lock-location-timeout" expires, no
> > > > > matter what happens to the resource, maintenance mode or not.
> > > > > 
> > > > > At first glance, it seems to pair nicely with maintenance-mode
> > > > > and would avoid resource migration after node reboot.
> > 
> > Maintenance mode is useful if you're updating the cluster stack
> > itself
> > -- put in maintenance mode, stop the cluster services (leaving the
> > managed services still running), update the cluster services, start
> > the
> > cluster services again, take out of maintenance mode.
> > 
> > This is useful if you're rebooting the node for a kernel update
> > (for
> > example). Apply the update, reboot the node. The cluster takes care
> > of
> > everything else for you (stop the services before shutting down and
> > do
> > not recover them until the node comes back).
> 
> I'm a bit lost. If a resource doesn't move during maintenance mode,
> could you detail a scenario where we should ban it explicitly from
> other nodes to secure its current location when getting out of
> maintenance? Isn't it

Sorry, I was unclear -- I was contrasting maintenance mode with
shutdown locks.

You wouldn't need a ban with maintenance mode. However maintenance mode
leaves any active resources running. That means the node shouldn't be
rebooted in maintenance mode, because those resources will not be
cleanly stopped.

With shutdown locks, the active resources are cleanly stopped. That
does require a ban of some sort because otherwise the resources will be
recovered on another node.

> an excessive precaution? Is it just to avoid it moving somewhere else
> when exiting maintenance-mode? If the resource has a preferred node, I
> suppose the location constraint should take care of this, shouldn't it?

Having a preferred node doesn't prevent the resource from starting
elsewhere if the preferred node is down (or in standby, or otherwise
ineligible to run the resource). Even a +INFINITY constraint allows
recovery elsewhere if the node is not available. To keep a resource
from being recovered, you have to put a ban (-INFINITY location
constraint) on any nodes that could otherwise run it.
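
To make the distinction concrete in pcs terms (resource and node names are
placeholders):

    # Preference: my-db favours node1, but may still start elsewhere
    # if node1 is unavailable
    pcs constraint location my-db prefers node1=INFINITY

    # Ban: my-db may never run on node2 (a -INFINITY location constraint)
    pcs constraint location my-db avoids node2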

> > > > I wonder: how is it different from a time-limited "ban" (wording
> > > > that also exists already)? If you ban all resources from running on
> > > > a specific node, resources would move away, and when booting the
> > > > node, resources won't come back.
> > 
> > It actually is equivalent to this process:
> > 
> > 1. Determine what resources are active on the node about to be shut
> > down.
> > 2. For each of those resources, configure a ban 

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Jehan-Guillaume de Rorthais
On Thu, 27 Feb 2020 09:48:23 -0600
Ken Gaillot  wrote:

> On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 27 Feb 2020 12:24:46 +0100
> > "Ulrich Windl"  wrote:
> >   
> > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > > in message <20200227110502.3624cb87@firost>:
> > > 
> > > [...]  
> > > > What about something like "lock‑location=bool" and
> > > 
> > > For "lock-location" I would assume the value is a "location". I
> > > guess you
> > > wanted a "use-lock-location" Boolean value.  
> > 
> > Mh, maybe "lock-current-location" would better reflect what I meant.
> > 
> > The point is to lock the resource on the node currently running it.  
> 
> Though it only applies for a clean node shutdown, so that has to be in
> the name somewhere. The resource isn't locked during normal cluster
> operation (it can move for resource or node failures, load rebalancing,
> etc.).

Well, I was trying to make the new feature a bit wider than just the
narrow shutdown feature.

Speaking about shutdown, what is the status of a clean shutdown of the cluster
handled by Pacemaker? Currently, I advise stopping resources gracefully (e.g.
using pcs resource disable [...]) before shutting down each node, either by
hand or using some higher-level tool (e.g. pcs cluster stop --all).

Shouldn't this feature be discussed in this context as well?

[...] 
> > > > it would lock the resource location (unique or clones) until the
> > > > operator unlocks it or the "lock-location-timeout" expires, no matter
> > > > what happens to the resource, maintenance mode or not.
> > > > 
> > > > At first glance, it seems to pair nicely with maintenance-mode
> > > > and would avoid resource migration after node reboot.
> 
> Maintenance mode is useful if you're updating the cluster stack itself
> -- put in maintenance mode, stop the cluster services (leaving the
> managed services still running), update the cluster services, start the
> cluster services again, take out of maintenance mode.
> 
> This is useful if you're rebooting the node for a kernel update (for
> example). Apply the update, reboot the node. The cluster takes care of
> everything else for you (stop the services before shutting down and do
> not recover them until the node comes back).

I'm a bit lost. If a resource doesn't move during maintenance mode, could you
detail a scenario where we should ban it explicitly from other nodes to secure
its current location when getting out of maintenance? Isn't it an excessive
precaution? Is it just to avoid it moving somewhere else when exiting
maintenance-mode? If the resource has a preferred node, I suppose the location
constraint should take care of this, shouldn't it?

> > > I wonder: how is it different from a time-limited "ban" (wording that
> > > also exists already)? If you ban all resources from running on a
> > > specific node, resources would move away, and when booting the node,
> > > resources won't come back.
> 
> It actually is equivalent to this process:
> 
> 1. Determine what resources are active on the node about to be shut
> down.
> 2. For each of those resources, configure a ban (location constraint
> with -INFINITY score) using a rule where node name is not the node
> being shut down.
> 3. Apply the updates and reboot the node. The cluster will stop the
> resources (due to shutdown) and not start them anywhere else (due to
> the bans).

In maintenance mode, this would not move either.

> 4. Wait for the node to rejoin and the resources to start on it again,
> then remove all the bans.
> 
> The advantage is automation, and in particular the sysadmin applying
> the updates doesn't need to even know that the host is part of a
> cluster.

Could you elaborate? I suppose the operator still needs to issue a command to
set the shutdown-lock before the reboot, right?

Moreover, if shutdown-lock is just a matter of setting ±INFINITY constraints on
nodes, maybe a higher-level tool could take care of this?

> > This is the standby mode.  
> 
> Standby mode will stop all resources on a node, but it doesn't prevent
> recovery elsewhere.

Yes, I was just commenting on Ulrich's description (history context cropped
here).

[ClusterLabs] *** Correction *** clusterlabs.org/corosync.org/kronosnet.org planned outage this Saturday 2020-02-29

2020-02-27 Thread Ken Gaillot
We've rescheduled the window for the OS upgrade to this Saturday, Feb.
29, 2020, from roughly 09:00 UTC to 18:00 UTC.

This will result in outages of the clusterlabs.org website, bugzilla,
and wiki. The mailing lists will also be unavailable, but mail gateways
will generally retry sent messages so there shouldn't be any missed messages.

This server also hosts some corosync.org and kronosnet.org services,
which will experience outages as well.

And no, the server isn't HA. :) That would be nice, but even in that case
there would be some downtime for a major OS upgrade, since database
tables etc. will need upgrading.
-- 
Ken Gaillot 



Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 08:12 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot wrote on 26.02.2020 at 16:41 in message
> 
> <2257e2a1e5fd88ae2b915b8241a8e8c9e150b95b.ca...@redhat.com>:
> 
> [...]
> > I considered a per-resource and/or per-node setting, but the target
> > audience is someone who wants things as simple as possible. A per-
> > node
> 
> Actually, while it may seem simple, it adds quite a lot of additional
> complexity, and I'm still not convinced that this is really needed.
> 
> [...]
> 
> Regards,
> Ulrich

I think that was the reaction of just about everyone (including myself)
the first time they heard about it :)

The main justification is that other HA software offers the capability,
so this removes an obstacle to those users switching to pacemaker.

However, the fact that it's a blocking point for users who might
otherwise switch shows that it does have real-world value.

It might be a narrow use case, but it's one that involves scale, which
is something we're always striving to better support. If an
organization has hundreds or thousands of clusters, yet those still are
just a small fraction of the total servers being administered at the
organization, expertise becomes a major limiting factor. In such a case
you don't want to waste your cluster admins' time on late-night routine
OS updates.
-- 
Ken Gaillot 



Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais wrote:
> On Thu, 27 Feb 2020 12:24:46 +0100
> "Ulrich Windl"  wrote:
> 
> > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > in message <20200227110502.3624cb87@firost>:
> > 
> > [...]
> > > What about something like "lock‑location=bool" and  
> > 
> > For "lock-location" I would assume the value is a "location". I
> > guess you
> > wanted a "use-lock-location" Boolean value.
> 
> Mh, maybe "lock-current-location" would better reflect what I meant.
> 
> The point is to lock the resource on the node currently running it.

Though it only applies for a clean node shutdown, so that has to be in
the name somewhere. The resource isn't locked during normal cluster
operation (it can move for resource or node failures, load rebalancing,
etc.).

> > > "lock‑location‑timeout=duration" (for those who like automatic
> > > steps)? I 
> > > imagine  
> > 
> > I'm still unhappy with "lock-location": What is a "location", and
> > what does it
> > mean to be "locked"?
> > Is that fundamentally different from "freeze/frozen" or "ignore"
> > (all those
> > phrases exist already)?
> 
> A "location" define where a resource is located in the cluster, on
> what node.
> Eg., a location constraint express where a ressource //can// run:
> 
>   «Location constraints tell the cluster which nodes a resource can
> run on. »
>   
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html
> 
> Here, "constraints" applies to a location. So, if you remove this
> constraint,
> the natural definition location would be:
> 
>   «Location tell the cluster what node a resource is running on.»
> 
> > > it would lock the resource location (unique or clones) until the
> > > operator unlocks it or the "lock-location-timeout" expires, no matter
> > > what happens to the resource, maintenance mode or not.
> > > 
> > > At first glance, it seems to pair nicely with maintenance-mode
> > > and would avoid resource migration after node reboot.

Maintenance mode is useful if you're updating the cluster stack itself
-- put in maintenance mode, stop the cluster services (leaving the
managed services still running), update the cluster services, start the
cluster services again, take out of maintenance mode.

This is useful if you're rebooting the node for a kernel update (for
example). Apply the update, reboot the node. The cluster takes care of
everything else for you (stop the services before shutting down and do
not recover them until the node comes back).

> > I wonder: how is it different from a time-limited "ban" (wording that
> > also exists already)? If you ban all resources from running on a
> > specific node, resources would move away, and when booting the node,
> > resources won't come back.

It actually is equivalent to this process:

1. Determine what resources are active on the node about to be shut
down.
2. For each of those resources, configure a ban (location constraint
with -INFINITY score) using a rule where node name is not the node
being shut down.
3. Apply the updates and reboot the node. The cluster will stop the
resources (due to shutdown) and not start them anywhere else (due to
the bans).
4. Wait for the node to rejoin and the resources to start on it again,
then remove all the bans.

The advantage is automation, and in particular the sysadmin applying
the updates doesn't need to even know that the host is part of a
cluster.
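
A rough sketch of that manual procedure with pcs, using placeholder resource
and node names and banning each other node explicitly rather than with a
single rule-based constraint:

    # 1. See what is currently active on the node about to go down (node1)
    pcs status resources

    # 2. Ban each of those resources from every *other* node
    #    (each command adds a -INFINITY location constraint)
    pcs constraint location my-db avoids node2
    pcs constraint location my-db avoids node3

    # 3. Apply the updates and reboot node1; the resources stop for the
    #    shutdown and are not recovered anywhere else because of the bans

    # 4. Once node1 has rejoined and the resources are running on it again,
    #    look up the constraint ids and remove the bans
    pcs constraint --full
    pcs constraint remove <constraint-id>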

> This is the standby mode.

Standby mode will stop all resources on a node, but it doesn't prevent
recovery elsewhere.

> Moreover, note that Ken explicitly wrote: «The cluster runs services
> that have
> a preferred node». So if the resource moved elsewhere, the resource
> **must**
> come back.

Right, the point of preventing recovery elsewhere is to avoid the extra
outage:

Without shutdown lock:
1. When node is stopped, resource stops on that node, and starts on
another node. (First outage)
2. When node rejoins, resource stops on the alternate node, and starts
on original node. (Second outage)

With shutdown lock, there's one outage when the node is rebooted, but
then it starts on the same node so there is no second outage. If the
resource start time is much longer (e.g. a half hour for an extremely
large database) than the reboot time (a couple of minutes), the feature
becomes worthwhile.

> > But you want the resources to be down while the node boots, right?
> > How can that concept be "married with" the concept of high availability?
> 
> The point here is to avoid moving resources during planned
> maintenance/downtime, as that would require a longer maintenance duration
> (thus longer downtime) than a simple reboot with no resource migration.
> 
> Even a resource in HA can have planned maintenance :)

Right. I jokingly call this feature "medium availability" but really it
is just another way to set a 

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Jehan-Guillaume de Rorthais
On Thu, 27 Feb 2020 12:24:46 +0100
"Ulrich Windl"  wrote:

> >>> Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05 in
> message <20200227110502.3624cb87@firost>:
> 
> [...]
> > What about something like "lock‑location=bool" and  
> 
> For "lock-location" I would assume the value is a "location". I guess you
> wanted a "use-lock-location" Boolean value.

Mh, maybe "lock-current-location" would better reflect what I meant.

The point is to lock the resource on the node currently running it.

> > "lock‑location‑timeout=duration" (for those who like automatic steps)? I 
> > imagine  
> 
> I'm still unhappy with "lock-location": What is a "location", and what does it
> mean to be "locked"?
> Is that fundamentally different from "freeze/frozen" or "ignore" (all those
> phrases exist already)?

A "location" define where a resource is located in the cluster, on what node.
Eg., a location constraint express where a ressource //can// run:

  «Location constraints tell the cluster which nodes a resource can run on. »
  
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html

Here, "constraints" applies to a location. So, if you remove this constraint,
the natural definition location would be:

  «Location tell the cluster what node a resource is running on.»
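
(For instance, in pcs terms, a location constraint of the first kind, with
placeholder names, looks like this:)

    # "my-ip prefers node1 with a score of 200" -- a constraint about where
    # the resource can/should run, not a record of where it currently is
    pcs constraint location my-ip prefers node1=200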

> > it would lock the resource location (unique or clones) until the operator
> > unlocks it or the "lock-location-timeout" expires, no matter what happens
> > to the resource, maintenance mode or not.
> > 
> > At first glance, it seems to pair nicely with maintenance-mode and would
> > avoid resource migration after node reboot.
> 
> I wonder: how is it different from a time-limited "ban" (wording that also
> exists already)? If you ban all resources from running on a specific node,
> resources would move away, and when booting the node, resources won't come back.

This is the standby mode.

Moreover, note that Ken explicitly wrote: «The cluster runs services that have
a preferred node». So if the resource moved elsewhere, the resource **must**
come back.

> But you want the resources to be down while the node boots, right? How can
> that concept be "married with" the concept of high availability?

The point here is to avoid moving resources during planned maintenance/downtime,
as that would require a longer maintenance duration (thus longer downtime) than
a simple reboot with no resource migration.

Even a resource in HA can have planned maintenance :)

> "We have a HA cluster and HA resources, but when we boot a node those
> HA-resources will be down while the node boots." How is that different from
> not having a HA cluster, or taking those resources temporarily away from the
> HA cluster? (That was my initial objection: Why not simply ignore resource
> failures for some time?)

Unless I'm wrong, maintenance mode does not secure the current location of
resources after reboots.

[ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ulrich Windl
>>> Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05 in
message <20200227110502.3624cb87@firost>:

[...]
> What about something like "lock‑location=bool" and

For "lock-location" I would assume the value is a "location". I guess you
wanted a "use-lock-location" Boolean value.

> "lock‑location‑timeout=duration" (for those who like automatic steps)? I 
> imagine

I'm still unhappy with "lock-location": What is a "location", and what does it
mean to be "locked"?
Is that fundamentally different from "freeze/frozen" or "ignore" (all those
phrases exist already)?

> it would lock the resource location (unique or clones) until the operator
> unlocks it or the "lock-location-timeout" expires, no matter what happens to
> the resource, maintenance mode or not.
> 
> At first glance, it seems to pair nicely with maintenance-mode and would
> avoid resource migration after node reboot.

I wonder: how is it different from a time-limited "ban" (wording that also
exists already)? If you ban all resources from running on a specific node,
resources would move away, and when booting the node, resources won't come back.
But you want the resources to be down while the node boots, right? How can
that concept be "married with" the concept of high availability? "We have a HA
cluster and HA resources, but when we boot a node those HA-resources will be
down while the node boots." How is that different from not having a HA cluster,
or taking those resources temporarily away from the HA cluster?
(That was my initial objection: Why not simply ignore resource failures for
some time?)

Regards,
Ulrich





Re: [ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Jehan-Guillaume de Rorthais
On Wed, 26 Feb 2020 19:11:36 +0100
wf...@niif.hu wrote:

> Ken Gaillot  writes:
> 
> > I think a per-resource option would have more potential to be
> > confusing than helpful. But, it should be relatively simple to extend
> > this as a per-resource option, with the global option as a
> > backward-compatible default, if the demand arises.  
> 
> And then you could immediately replace the global option with an
> rsc-default.  But that's one more transition (not in the PE sense).
> It indeed looks like this is more a resource option than a global
> one, but the default mechanism provides an easy way to set it
> globally for those who prefer that.  Unless somebody wants to
> default it to twice (or so) the resource start timeout instead...

Well, for what it's worth, I agree this looks like a per-resource option.

Moreover, this feature seems too focused on one specific use case; making it
as automatic as possible makes it narrow and confusing. I like the idea of
mirrored actions, enabling AND disabling something a few steps later in the
same procedure. It makes it cleaner to understand.

What about something like "lock-location=bool" and
"lock-location-timeout=duration" (for those who like automatic steps)? I imagine
it would lock the resource location (unique or clones) until the operator
unlocks it or the "lock-location-timeout" expires, no matter what happens to the
resource, maintenance mode or not.

At first glance, it seems to pair nicely with maintenance-mode and would avoid
resource migration after node reboot.