Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Andrei Borzenkov
28.02.2020 01:55, Ken Gaillot wrote:
> On Thu, 2020-02-27 at 22:39 +0300, Andrei Borzenkov wrote:
>> 27.02.2020 20:54, Ken Gaillot wrote:
>>> On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais
>>> wrote:
>> Speaking about shutdown, what is the status of a clean shutdown of
>> the cluster handled by Pacemaker? Currently, I advise stopping
>> resources gracefully (e.g. using pcs resource disable [...]) before
>> shutting down each node, either by hand or using some higher-level
>> tool (e.g. pcs cluster stop --all).
>
> I'm not sure why that would be necessary. It should be perfectly
> fine to stop pacemaker in any order without disabling resources.

 Because resources might move around during the shutdown sequence.
 It might not be desirable, as some resource migrations can be heavy,
 long, or interfere with shutdown, etc. I'm pretty sure this has been
 discussed in the past.
>>>
>>> Ah, that makes sense, I hadn't thought about that.
>>
>> Isn't that exactly what shutdown-lock does? It prevents resource
>> migration when stopping pacemaker, so my expectation is that if we
>> stop pacemaker on all nodes, no resource is moved. Or what am I missing?
> 
> shutdown-lock would indeed handle this, if you want the behavior
> whenever any node is shut down. However for this purpose, I could see
> some users wanting the behavior when shutting down all nodes, but not
> when shutting down just one node.
> 

Well, this requires Pacemaker to support the notion of a "cluster-wide
shutdown" in the first place.

> BTW if all nodes shut down, any shutdown locks are cleared.
> Practically, this is because they are stored in the CIB status section,
> which goes away with the cluster. Logically, I could see arguments for
> and against, but this makes sense.
> 

This actually allows a poor man's implementation of cluster-wide shutdown:
set the lock immediately before stopping the nodes. It could probably even
be integrated directly into "pcs cluster stop --all". I wish crmsh
offered something similar too.
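
Rough sketch of what I mean, assuming the 2.0.4 property names
(shutdown-lock / shutdown-lock-limit) and pcs syntax; untested:

  # enable shutdown locks cluster-wide just before the planned shutdown
  pcs property set shutdown-lock=true
  # or, tool-agnostic:
  crm_attribute --type crm_config --name shutdown-lock --update true
  # then stop every node; locked resources are not recovered elsewhere
  # while the nodes go down one by one
  pcs cluster stop --all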

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 22:39 +0300, Andrei Borzenkov wrote:
> 27.02.2020 20:54, Ken Gaillot wrote:
> > On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > > > Speaking about shutdown, what is the status of a clean shutdown
> > > > > of the cluster handled by Pacemaker? Currently, I advise stopping
> > > > > resources gracefully (e.g. using pcs resource disable [...])
> > > > > before shutting down each node, either by hand or using some
> > > > > higher-level tool (e.g. pcs cluster stop --all).
> > > > 
> > > > I'm not sure why that would be necessary. It should be perfectly
> > > > fine to stop pacemaker in any order without disabling resources.
> > > 
> > > Because resources might move around during the shutdown sequence.
> > > It might not be desirable, as some resource migrations can be heavy,
> > > long, or interfere with shutdown, etc. I'm pretty sure this has been
> > > discussed in the past.
> > 
> > Ah, that makes sense, I hadn't thought about that.
> 
> Isn't that exactly what shutdown-lock does? It prevents resource
> migration when stopping pacemaker, so my expectation is that if we
> stop pacemaker on all nodes, no resource is moved. Or what am I missing?

shutdown-lock would indeed handle this, if you want the behavior
whenever any node is shut down. However for this purpose, I could see
some users wanting the behavior when shutting down all nodes, but not
when shutting down just one node.

BTW if all nodes shut down, any shutdown locks are cleared.
Practically, this is because they are stored in the CIB status section,
which goes away with the cluster. Logically, I could see arguments for
and against, but this makes sense.
-- 
Ken Gaillot 


Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 20:42 +0200, Strahil Nikolov wrote:
> On February 27, 2020 7:00:36 PM GMT+02:00, Ken Gaillot <
> kgail...@redhat.com> wrote:
> > On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > On Thu, 27 Feb 2020 09:48:23 -0600
> > > Ken Gaillot  wrote:
> > > 
> > > > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
> > > > wrote:
> > > > > On Thu, 27 Feb 2020 12:24:46 +0100
> > > > > "Ulrich Windl"  wrote:
> > > > >   
> > > > > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > > > > > > > > in message <20200227110502.3624cb87@firost>:
> > > > > > 
> > > > > > 
> > > > > > [...]  
> > > > > > > What about something like "lock‑location=bool" and
> > > > > > 
> > > > > > For "lock-location" I would assume the value is a "location".
> > > > > > I guess you wanted a "use-lock-location" Boolean value.
> > > > > 
> > > > > Mh, maybe "lock-current-location" would better reflect what I meant.
> > > > > 
> > > > > The point is to lock the resource on the node currently running it.
> > > > 
> > > > Though it only applies for a clean node shutdown, so that has to be
> > > > in the name somewhere. The resource isn't locked during normal
> > > > cluster operation (it can move for resource or node failures, load
> > > > rebalancing, etc.).
> > > 
> > > Well, I was trying to make the new feature a bit wider than just the
> > > narrow shutdown feature.
> > > 
> > > Speaking about shutdown, what is the status of a clean shutdown of the
> > > cluster handled by Pacemaker? Currently, I advise stopping resources
> > > gracefully (e.g. using pcs resource disable [...]) before shutting down
> > > each node, either by hand or using some higher-level tool (e.g. pcs
> > > cluster stop --all).
> > 
> > I'm not sure why that would be necessary. It should be perfectly fine
> > to stop pacemaker in any order without disabling resources.
> > 
> > Start-up is actually more of an issue ... if you start corosync and
> > pacemaker on nodes one by one, and you're not quick enough, then once
> > quorum is reached, the cluster will fence all the nodes that haven't
> > yet come up. So on start-up, it makes sense to start corosync on all
> > nodes, which will establish membership and quorum, then start pacemaker
> > on all nodes. Obviously that can't be done within pacemaker so that has
> > to be done manually or by a higher-level tool.
> > 
> > > Shouldn't this feature be discussed in this context as well?
> > > 
> > > [...] 
> > > > > > > it would lock the resource location (unique or clones) until
> > > > > > > the operator unlocks it or the "lock‑location‑timeout" expires,
> > > > > > > no matter what happens to the resource, maintenance mode or not.
> > > > > > > 
> > > > > > > At first look, it seems to pair nicely with maintenance‑mode
> > > > > > > and avoids resource migration after node reboot.
> > > > 
> > > > Maintenance mode is useful if you're updating the cluster stack
> > > > itself -- put in maintenance mode, stop the cluster services (leaving
> > > > the managed services still running), update the cluster services,
> > > > start the cluster services again, take out of maintenance mode.
> > > > 
> > > > This is useful if you're rebooting the node for a kernel update (for
> > > > example). Apply the update, reboot the node. The cluster takes care
> > > > of everything else for you (stop the services before shutting down
> > > > and do not recover them until the node comes back).
> > > 
> > > I'm a bit lost. If a resource doesn't move during maintenance mode,
> > > could you detail a scenario where we should ban it explicitly from
> > > other nodes to secure its current location when getting out of
> > > maintenance? Isn't it
> > 
> > Sorry, I was unclear -- I was contrasting maintenance mode with
> > shutdown locks.
> > 
> > You wouldn't need a ban with maintenance mode. However maintenance mode
> > leaves any active resources running. That means the node shouldn't be
> > rebooted in maintenance mode, because those resources will not be
> > cleanly stopped.
> > 
> > With shutdown locks, the active resources are cleanly stopped. That
> > does require a ban of some sort because otherwise the resources will be
> > recovered on another node.
> > 
> > > excessive precaution? Is it just to avoid it moving somewhere else when
> > > exiting maintenance-mode? If the resource has a preferred node, I
> > > suppose the location constraint should take care of this, isn't it?
> > 
> > Having a preferred node doesn't prevent the resource from starting
> > elsewhere if the preferred node is down (or in standby, or otherwise
> > ineligible to run the resource). Even a +INFINITY constraint allows
> > recovery elsewhere if the node is not available. To keep a resource
> > from being recovered, you have to put a ban (-INFINITY location
> > constraint) on any nodes that could otherwise run it.

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Andrei Borzenkov
27.02.2020 20:54, Ken Gaillot wrote:
> On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais wrote:
 Speaking about shutdown, what is the status of a clean shutdown of
 the cluster handled by Pacemaker? Currently, I advise stopping
 resources gracefully (e.g. using pcs resource disable [...]) before
 shutting down each node, either by hand or using some higher-level
 tool (e.g. pcs cluster stop --all).
>>>
>>> I'm not sure why that would be necessary. It should be perfectly fine
>>> to stop pacemaker in any order without disabling resources.
>>
>> Because resources might move around during the shutdown sequence. It
>> might not be desirable, as some resource migrations can be heavy, long,
>> or interfere with shutdown, etc. I'm pretty sure this has been discussed
>> in the past.
> 
> Ah, that makes sense, I hadn't thought about that.

Isn't that exactly what shutdown-lock does? It prevents resource
migration when stopping pacemaker, so my expectation is that if we stop
pacemaker on all nodes, no resource is moved. Or what am I missing?

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Strahil Nikolov
On February 27, 2020 7:00:36 PM GMT+02:00, Ken Gaillot  
wrote:
>On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais wrote:
>> On Thu, 27 Feb 2020 09:48:23 -0600
>> Ken Gaillot  wrote:
>> 
>> > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
>> > wrote:
>> > > On Thu, 27 Feb 2020 12:24:46 +0100
>> > > "Ulrich Windl"  wrote:
>> > >   
>> > > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
>> > > > > > > in message <20200227110502.3624cb87@firost>:
>> > > > 
>> > > > 
>> > > > [...]  
>> > > > > What about something like "lock‑location=bool" and
>> > > > 
>> > > > For "lock-location" I would assume the value is a "location".
>> > > > I guess you wanted a "use-lock-location" Boolean value.
>> > > 
>> > > Mh, maybe "lock-current-location" would better reflect what I meant.
>> > > 
>> > > The point is to lock the resource on the node currently running it.
>> > 
>> > Though it only applies for a clean node shutdown, so that has to be
>> > in the name somewhere. The resource isn't locked during normal cluster
>> > operation (it can move for resource or node failures, load rebalancing,
>> > etc.).
>> 
>> Well, I was trying to make the new feature a bit wider than just the
>> narrow shutdown feature.
>> 
>> Speaking about shutdown, what is the status of a clean shutdown of the
>> cluster handled by Pacemaker? Currently, I advise stopping resources
>> gracefully (e.g. using pcs resource disable [...]) before shutting down
>> each node, either by hand or using some higher-level tool (e.g. pcs
>> cluster stop --all).
>
>I'm not sure why that would be necessary. It should be perfectly fine
>to stop pacemaker in any order without disabling resources.
>
>Start-up is actually more of an issue ... if you start corosync and
>pacemaker on nodes one by one, and you're not quick enough, then once
>quorum is reached, the cluster will fence all the nodes that haven't
>yet come up. So on start-up, it makes sense to start corosync on all
>nodes, which will establish membership and quorum, then start pacemaker
>on all nodes. Obviously that can't be done within pacemaker so that has
>to be done manually or by a higher-level tool.
>
>> Shouldn't this feature be discussed in this context as well?
>> 
>> [...] 
>> > > > > it would lock the resource location (unique or clones) until the
>> > > > > operator unlocks it or the "lock‑location‑timeout" expires, no
>> > > > > matter what happens to the resource, maintenance mode or not.
>> > > > > 
>> > > > > At first look, it seems to pair nicely with maintenance‑mode
>> > > > > and avoids resource migration after node reboot.
>> > 
>> > Maintenance mode is useful if you're updating the cluster stack itself
>> > -- put in maintenance mode, stop the cluster services (leaving the
>> > managed services still running), update the cluster services, start the
>> > cluster services again, take out of maintenance mode.
>> > 
>> > This is useful if you're rebooting the node for a kernel update (for
>> > example). Apply the update, reboot the node. The cluster takes care of
>> > everything else for you (stop the services before shutting down and do
>> > not recover them until the node comes back).
>> 
>> I'm a bit lost. If a resource doesn't move during maintenance mode,
>> could you detail a scenario where we should ban it explicitly from
>> other nodes to secure its current location when getting out of
>> maintenance? Isn't it
>
>Sorry, I was unclear -- I was contrasting maintenance mode with
>shutdown locks.
>
>You wouldn't need a ban with maintenance mode. However maintenance mode
>leaves any active resources running. That means the node shouldn't be
>rebooted in maintenance mode, because those resources will not be
>cleanly stopped.
>
>With shutdown locks, the active resources are cleanly stopped. That
>does require a ban of some sort because otherwise the resources will be
>recovered on another node.
>
>> excessive precaution? Is it just to avoid it moving somewhere else when
>> exiting maintenance-mode? If the resource has a preferred node, I suppose
>> the location constraint should take care of this, isn't it?
>
>Having a preferred node doesn't prevent the resource from starting
>elsewhere if the preferred node is down (or in standby, or otherwise
>ineligible to run the resource). Even a +INFINITY constraint allows
>recovery elsewhere if the node is not available. To keep a resource
>from being recovered, you have to put a ban (-INFINITY location
>constraint) on any nodes that could otherwise run it.
>
>> > > > I wonder: How is it different from a time-limited "ban" (wording
>> > > > that also exists already)? If you ban all resources from running on
>> > > > a specific node, resources would be moved away, and when booting the
>> > > > node, resources won't come back.
>> > 

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Jehan-Guillaume de Rorthais
On Thu, 27 Feb 2020 11:54:52 -0600
Ken Gaillot  wrote:

> On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais wrote:
> > > > Speaking about shutdown, what is the status of a clean shutdown of
> > > > the cluster handled by Pacemaker? Currently, I advise stopping
> > > > resources gracefully (e.g. using pcs resource disable [...]) before
> > > > shutting down each node, either by hand or using some higher-level
> > > > tool (e.g. pcs cluster stop --all).
> > > 
> > > I'm not sure why that would be necessary. It should be perfectly fine
> > > to stop pacemaker in any order without disabling resources.
> > 
> > Because resources might move around during the shutdown sequence. It
> > might not be desirable, as some resource migrations can be heavy, long,
> > or interfere with shutdown, etc. I'm pretty sure this has been discussed
> > in the past.
> 
> Ah, that makes sense, I hadn't thought about that. FYI, there is a
> stop-all-resources cluster property that would let you disable
> everything in one step.

Yes, I discovered it some weeks ago, thanks :)
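
For the archives, a rough sketch of that approach (assuming standard pcs
property syntax; untested):

  # stop every resource in one step before a full-cluster shutdown
  pcs property set stop-all-resources=true
  pcs cluster stop --all
  # ... maintenance ...
  pcs cluster start --all
  pcs property set stop-all-resources=false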



Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 18:43 +0100, Jehan-Guillaume de Rorthais wrote:
> > > Speaking about shutdown, what is the status of a clean shutdown of
> > > the cluster handled by Pacemaker? Currently, I advise stopping
> > > resources gracefully (e.g. using pcs resource disable [...]) before
> > > shutting down each node, either by hand or using some higher-level
> > > tool (e.g. pcs cluster stop --all).
> > 
> > I'm not sure why that would be necessary. It should be perfectly fine
> > to stop pacemaker in any order without disabling resources.
> 
> Because resources might move around during the shutdown sequence. It
> might not be desirable, as some resource migrations can be heavy, long,
> or interfere with shutdown, etc. I'm pretty sure this has been discussed
> in the past.

Ah, that makes sense, I hadn't thought about that. FYI, there is a
stop-all-resources cluster property that would let you disable
everything in one step.
-- 
Ken Gaillot 



Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Jehan-Guillaume de Rorthais
On Thu, 27 Feb 2020 11:00:36 -0600
Ken Gaillot  wrote:

> On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 27 Feb 2020 09:48:23 -0600
> > Ken Gaillot  wrote:
> >   
> > > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
> > > wrote:  
> > > > On Thu, 27 Feb 2020 12:24:46 +0100
> > > > "Ulrich Windl"  wrote:
> > > > 
> > > > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > > > > in message <20200227110502.3624cb87@firost>:
> > > > > 
> > > > > [...]
> > > > > > What about something like "lock‑location=bool" and  
> > > > > 
> > > > > For "lock-location" I would assume the value is a "location".
> > > > > I guess you wanted a "use-lock-location" Boolean value.
> > > > 
> > > > Mh, maybe "lock-current-location" would better reflect what I meant.
> > > > 
> > > > The point is to lock the resource on the node currently running it.
> > > 
> > > Though it only applies for a clean node shutdown, so that has to be
> > > in the name somewhere. The resource isn't locked during normal cluster
> > > operation (it can move for resource or node failures, load rebalancing,
> > > etc.).
> > 
> > Well, I was trying to make the new feature a bit wider than just the
> > narrow shutdown feature.
> > 
> > Speaking about shutdown, what is the status of a clean shutdown of the
> > cluster handled by Pacemaker? Currently, I advise stopping resources
> > gracefully (e.g. using pcs resource disable [...]) before shutting down
> > each node, either by hand or using some higher-level tool (e.g. pcs
> > cluster stop --all).
> 
> I'm not sure why that would be necessary. It should be perfectly fine
> to stop pacemaker in any order without disabling resources.

Because resources might move around during the shutdown sequence. It might
not be desirable, as some resource migrations can be heavy, long, or
interfere with shutdown, etc. I'm pretty sure this has been discussed in the past.

> Start-up is actually more of an issue ... if you start corosync and
> pacemaker on nodes one by one, and you're not quick enough, then once
> quorum is reached, the cluster will fence all the nodes that haven't
> yet come up. So on start-up, it makes sense to start corosync on all
> nodes, which will establish membership and quorum, then start pacemaker
> on all nodes. Obviously that can't be done within pacemaker so that has
> to be done manually or by a higher-level tool.

Indeed.
Or use wait-for-all.

> > Shouldn't this feature be discussed in this context as well?
> > 
> > [...]   
> > > > > > it would lock the resource location (unique or clones) until the
> > > > > > operator unlocks it or the "lock‑location‑timeout" expires, no
> > > > > > matter what happens to the resource, maintenance mode or not.
> > > > > > 
> > > > > > At first look, it seems to pair nicely with maintenance‑mode
> > > > > > and avoids resource migration after node reboot.
> > > 
> > > Maintenance mode is useful if you're updating the cluster stack itself
> > > -- put in maintenance mode, stop the cluster services (leaving the
> > > managed services still running), update the cluster services, start
> > > the cluster services again, take out of maintenance mode.
> > > 
> > > This is useful if you're rebooting the node for a kernel update (for
> > > example). Apply the update, reboot the node. The cluster takes care of
> > > everything else for you (stop the services before shutting down and
> > > do not recover them until the node comes back).
> > 
> > I'm a bit lost. If a resource doesn't move during maintenance mode,
> > could you detail a scenario where we should ban it explicitly from other
> > nodes to secure its current location when getting out of maintenance?
> > Isn't it
> 
> Sorry, I was unclear -- I was contrasting maintenance mode with
> shutdown locks.
> 
> You wouldn't need a ban with maintenance mode. However maintenance mode
> leaves any active resources running. That means the node shouldn't be
> rebooted in maintenance mode, because those resources will not be
> cleanly stopped.
> 
> With shutdown locks, the active resources are cleanly stopped. That
> does require a ban of some sort because otherwise the resources will be
> recovered on another node.

ok, thanks,

> > excessive precaution? Is it just to avoid it moving somewhere else when
> > exiting maintenance-mode? If the resource has a preferred node, I suppose
> > the location constraint should take care of this, isn't it?
> 
> Having a preferred node doesn't prevent the resource from starting
> elsewhere if the preferred node is down (or in standby, or otherwise
> ineligible to run the resource). Even a +INFINITY constraint allows
> recovery elsewhere if the node is not available. To keep a resource
> from being recovered, you have to put a ban (-INFINITY location
> constraint) on any nodes that could otherwise run it.

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais wrote:
> On Thu, 27 Feb 2020 09:48:23 -0600
> Ken Gaillot  wrote:
> 
> > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > On Thu, 27 Feb 2020 12:24:46 +0100
> > > "Ulrich Windl"  wrote:
> > >   
> > > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > > > in message <20200227110502.3624cb87@firost>:
> > > > 
> > > > [...]  
> > > > > What about something like "lock‑location=bool" and
> > > > 
> > > > For "lock-location" I would assume the value is a "location".
> > > > I guess you wanted a "use-lock-location" Boolean value.
> > > 
> > > Mh, maybe "lock-current-location" would better reflect what I meant.
> > > 
> > > The point is to lock the resource on the node currently running it.
> > 
> > Though it only applies for a clean node shutdown, so that has to be in
> > the name somewhere. The resource isn't locked during normal cluster
> > operation (it can move for resource or node failures, load rebalancing,
> > etc.).
> 
> Well, I was trying to make the new feature a bit wider than just the
> narrow shutdown feature.
> 
> Speaking about shutdown, what is the status of a clean shutdown of the
> cluster handled by Pacemaker? Currently, I advise stopping resources
> gracefully (e.g. using pcs resource disable [...]) before shutting down
> each node, either by hand or using some higher-level tool (e.g. pcs
> cluster stop --all).

I'm not sure why that would be necessary. It should be perfectly fine
to stop pacemaker in any order without disabling resources.

Start-up is actually more of an issue ... if you start corosync and
pacemaker on nodes one by one, and you're not quick enough, then once
quorum is reached, the cluster will fence all the nodes that haven't
yet come up. So on start-up, it makes sense to start corosync on all
nodes, which will establish membership and quorum, then start pacemaker
on all nodes. Obviously that can't be done within pacemaker so that has
to be done manually or by a higher-level tool.
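
For illustration, a rough sketch of that ordering with plain systemd units
(hypothetical node names; a higher-level tool may do the equivalent for you):

  # start the membership layer everywhere first, so quorum forms with all
  # nodes present and nobody gets fenced for being late ...
  for n in node1 node2 node3; do ssh "$n" systemctl start corosync; done
  # ... then start pacemaker everywhere
  for n in node1 node2 node3; do ssh "$n" systemctl start pacemaker; done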

> Shouldn't this feature be discussed in this context as well?
> 
> [...] 
> > > > > it would lock the resource location (unique or clones) until the
> > > > > operator unlocks it or the "lock‑location‑timeout" expires, no
> > > > > matter what happens to the resource, maintenance mode or not.
> > > > > 
> > > > > At first look, it seems to pair nicely with maintenance‑mode
> > > > > and avoids resource migration after node reboot.
> > 
> > Maintenance mode is useful if you're updating the cluster stack itself
> > -- put in maintenance mode, stop the cluster services (leaving the
> > managed services still running), update the cluster services, start the
> > cluster services again, take out of maintenance mode.
> > 
> > This is useful if you're rebooting the node for a kernel update (for
> > example). Apply the update, reboot the node. The cluster takes care of
> > everything else for you (stop the services before shutting down and do
> > not recover them until the node comes back).
> 
> I'm a bit lost. If a resource doesn't move during maintenance mode,
> could you detail a scenario where we should ban it explicitly from
> other nodes to secure its current location when getting out of
> maintenance? Isn't it

Sorry, I was unclear -- I was contrasting maintenance mode with
shutdown locks.

You wouldn't need a ban with maintenance mode. However maintenance mode
leaves any active resources running. That means the node shouldn't be
rebooted in maintenance mode, because those resources will not be
cleanly stopped.

With shutdown locks, the active resources are cleanly stopped. That
does require a ban of some sort because otherwise the resources will be
recovered on another node.

> excessive precaution? Is it just to avoid it moving somewhere else when
> exiting maintenance-mode? If the resource has a preferred node, I suppose
> the location constraint should take care of this, isn't it?

Having a preferred node doesn't prevent the resource from starting
elsewhere if the preferred node is down (or in standby, or otherwise
ineligible to run the resource). Even a +INFINITY constraint allows
recovery elsewhere if the node is not available. To keep a resource
from being recovered, you have to put a ban (-INFINITY location
constraint) on any nodes that could otherwise run it.
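
For example, with pcs (hypothetical resource and node names; pcs resource
ban is one way to create such a -INFINITY location constraint):

  # keep my_db off node2, where it could otherwise be recovered
  pcs resource ban my_db node2
  # later, drop that constraint again
  pcs resource clear my_db node2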

> > > > I wonder: How is it different from a time-limited "ban" (wording
> > > > that also exists already)? If you ban all resources from running on
> > > > a specific node, resources would be moved away, and when booting the
> > > > node, resources won't come back.
> > 
> > It actually is equivalent to this process:
> > 
> > 1. Determine what resources are active on the node about to be shut
> > down.
> > 2. For each of those resources, configure a ban (location constraint
> > with -INFINITY score) using a rule where node name is not the node
> > being shut down.

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Jehan-Guillaume de Rorthais
On Thu, 27 Feb 2020 09:48:23 -0600
Ken Gaillot  wrote:

> On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 27 Feb 2020 12:24:46 +0100
> > "Ulrich Windl"  wrote:
> >   
> > > > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > > in message <20200227110502.3624cb87@firost>:
> > > 
> > > [...]  
> > > > What about something like "lock‑location=bool" and
> > > 
> > > For "lock-location" I would assume the value is a "location".
> > > I guess you wanted a "use-lock-location" Boolean value.
> > 
> > Mh, maybe "lock-current-location" would better reflect what I meant.
> > 
> > The point is to lock the resource on the node currently running it.  
> 
> Though it only applies for a clean node shutdown, so that has to be in
> the name somewhere. The resource isn't locked during normal cluster
> operation (it can move for resource or node failures, load rebalancing,
> etc.).

Well, I was trying to make the new feature a bit wider than just the
narrow shutdown feature.

Speaking about shutdown, what is the status of a clean shutdown of the cluster
handled by Pacemaker? Currently, I advise stopping resources gracefully (e.g.
using pcs resource disable [...]) before shutting down each node, either by hand
or using some higher-level tool (e.g. pcs cluster stop --all).

Shouldn't this feature be discussed in this context as well?

[...] 
> > > > it would lock the resource location (unique or clones) until the
> > > > operator unlocks it or the "lock‑location‑timeout" expires, no matter
> > > > what happens to the resource, maintenance mode or not.
> > > > 
> > > > At first look, it seems to pair nicely with maintenance‑mode
> > > > and avoids resource migration after node reboot.
> 
> Maintenance mode is useful if you're updating the cluster stack itself
> -- put in maintenance mode, stop the cluster services (leaving the
> managed services still running), update the cluster services, start the
> cluster services again, take out of maintenance mode.
> 
> This is useful if you're rebooting the node for a kernel update (for
> example). Apply the update, reboot the node. The cluster takes care of
> everything else for you (stop the services before shutting down and do
> not recover them until the node comes back).

I'm a bit lost. If a resource doesn't move during maintenance mode,
could you detail a scenario where we should ban it explicitly from other nodes
to secure its current location when getting out of maintenance? Isn't it an
excessive precaution? Is it just to avoid it moving somewhere else when exiting
maintenance-mode? If the resource has a preferred node, I suppose the location
constraint should take care of this, isn't it?

> > > I wonder: How is it different from a time-limited "ban" (wording that
> > > also exists already)? If you ban all resources from running on a
> > > specific node, resources would be moved away, and when booting the
> > > node, resources won't come back.
> 
> It actually is equivalent to this process:
> 
> 1. Determine what resources are active on the node about to be shut
> down.
> 2. For each of those resources, configure a ban (location constraint
> with -INFINITY score) using a rule where node name is not the node
> being shut down.
> 3. Apply the updates and reboot the node. The cluster will stop the
> resources (due to shutdown) and not start them anywhere else (due to
> the bans).

In maintenance mode, this would not move either.

> 4. Wait for the node to rejoin and the resources to start on it again,
> then remove all the bans.
> 
> The advantage is automation, and in particular the sysadmin applying
> the updates doesn't need to even know that the host is part of a
> cluster.

Could you elaborate? I suppose the operator still needs to issue a command to
set the shutdown‑lock before reboot, isn't it?

Moreover, if shutdown‑lock is just a matter of setting ±INFINITY constraints on
nodes, maybe a higher-level tool could take care of this?

> > This is the standby mode.  
> 
> Standby mode will stop all resources on a node, but it doesn't prevent
> recovery elsewhere.

Yes, I was just commenting on Ulrich's description (history context cropped
here).

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 08:12 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot wrote on 26.02.2020 at 16:41 in message
> <2257e2a1e5fd88ae2b915b8241a8e8c9e150b95b.ca...@redhat.com>:
> 
> [...]
> > I considered a per-resource and/or per-node setting, but the target
> > audience is someone who wants things as simple as possible. A per-
> > node
> 
> Actually, while it may seem simple, it adds quite a lot of additional
> complexity, and I'm still not convinced that this is really needed.
> 
> [...]
> 
> Regards,
> Ulrich

I think that was the reaction of just about everyone (including myself)
the first time they heard about it :)

The main justification is that other HA software offers the capability,
so this removes an obstacle to those users switching to pacemaker.

However, the fact that it's a blocking point for users who might
otherwise switch shows that it does have real-world value.

It might be a narrow use case, but it's one that involves scale, which
is something we're always striving to better support. If an
organization has hundreds or thousands of clusters, yet those still are
just a small fraction of the total servers being administered at the
organization, expertise becomes a major limiting factor. In such a case
you don't want to waste your cluster admins' time on late-night routine
OS updates.
-- 
Ken Gaillot 



Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Ken Gaillot
On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais wrote:
> On Thu, 27 Feb 2020 12:24:46 +0100
> "Ulrich Windl"  wrote:
> 
> > > > Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05
> > in message <20200227110502.3624cb87@firost>:
> > 
> > [...]
> > > What about something like "lock‑location=bool" and  
> > 
> > For "lock-location" I would assume the value is a "location".
> > I guess you wanted a "use-lock-location" Boolean value.
> 
> Mh, maybe "lock-current-location" would better reflect what I meant.
> 
> The point is to lock the resource on the node currently running it.

Though it only applies for a clean node shutdown, so that has to be in
the name somewhere. The resource isn't locked during normal cluster
operation (it can move for resource or node failures, load rebalancing,
etc.).

> > > "lock‑location‑timeout=duration" (for those who like automatic
> > > steps)? I 
> > > imagine  
> > 
> > I'm still unhappy with "lock-location": What is a "location", and what
> > does it mean to be "locked"?
> > Is that fundamentally different from "freeze/frozen" or "ignore" (all
> > those phrases exist already)?
> 
> A "location" define where a resource is located in the cluster, on
> what node.
> Eg., a location constraint express where a ressource //can// run:
> 
>   «Location constraints tell the cluster which nodes a resource can
> run on. »
>   
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html
> 
> Here, "constraints" applies to a location. So, if you remove this
> constraint,
> the natural definition location would be:
> 
>   «Location tell the cluster what node a resource is running on.»
> 
> > it would lock the resource location (unique or clones) until the
> > operator unlocks it or the "lock‑location‑timeout" expires, no matter
> > what happens to the resource, maintenance mode or not.
> > 
> > At first look, it seems to pair nicely with maintenance‑mode and avoids
> > resource migration after node reboot.

Maintenance mode is useful if you're updating the cluster stack itself
-- put in maintenance mode, stop the cluster services (leaving the
managed services still running), update the cluster services, start the
cluster services again, take out of maintenance mode.
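
A rough sketch of that sequence (assuming the maintenance-mode cluster
property, pcs, and the usual systemd unit names; exact commands may vary):

  pcs property set maintenance-mode=true   # cluster stops managing resources
  systemctl stop pacemaker corosync        # managed services keep running
  # ... update the cluster stack packages ...
  systemctl start corosync pacemaker
  pcs property set maintenance-mode=false  # resume normal management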

This is useful if you're rebooting the node for a kernel update (for
example). Apply the update, reboot the node. The cluster takes care of
everything else for you (stop the services before shutting down and do
not recover them until the node comes back).

> > I wonder: How is it different from a time-limited "ban" (wording that
> > also exists already)? If you ban all resources from running on a
> > specific node, resources would be moved away, and when booting the
> > node, resources won't come back.

It actually is equivalent to this process:

1. Determine what resources are active on the node about to be shut
down.
2. For each of those resources, configure a ban (location constraint
with -INFINITY score) using a rule where node name is not the node
being shut down.
3. Apply the updates and reboot the node. The cluster will stop the
resources (due to shutdown) and not start them anywhere else (due to
the bans).
4. Wait for the node to rejoin and the resources to start on it again,
then remove all the bans.
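
As a rough sketch of step 2 with pcs (hypothetical resource and node names;
the exact rule syntax may differ between pcs versions):

  # ban my_db from every node whose name is not node1,
  # i.e. lock it to node1 during the reboot
  pcs constraint location my_db rule score=-INFINITY '#uname' ne node1
  # in step 4, look up the constraint id and remove it again
  pcs constraint remove <constraint-id>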

The advantage is automation, and in particular the sysadmin applying
the updates doesn't need to even know that the host is part of a
cluster.

> This is the standby mode.

Standby mode will stop all resources on a node, but it doesn't prevent
recovery elsewhere.
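
(For reference, standby is toggled per node, e.g. with pcs, depending on the
pcs version; hypothetical node name:)

  pcs node standby node1     # stop all resources on node1; recovery elsewhere is allowed
  pcs node unstandby node1   # let node1 host resources again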

> Moreover, note that Ken explicitly wrote: «The cluster runs services that
> have a preferred node». So if the resource moved elsewhere, the resource
> **must** come back.

Right, the point of preventing recovery elsewhere is to avoid the extra
outage:

Without shutdown lock:
1. When node is stopped, resource stops on that node, and starts on
another node. (First outage)
2. When node rejoins, resource stops on the alternate node, and starts
on original node. (Second outage)

With shutdown lock, there's one outage when the node is rebooted, but
then it starts on the same node so there is no second outage. If the
resource start time is much longer (e.g. a half hour for an extremely
large database) than the reboot time (a couple of minutes), the feature
becomes worthwhile.

> > But you want the resources to be down while the node boots, right? How
> > can that concept be "married with" the concept of high availability?
> 
> The point here is to avoid moving resources during planned
> maintenance/downtime, as it would require a longer maintenance duration
> (thus longer downtime) than a simple reboot with no resource migration.
> 
> Even a resource in HA can have planned maintenance :)

Right. I jokingly call this feature "medium availability" but really it
is just another way to set a 

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Jehan-Guillaume de Rorthais
On Thu, 27 Feb 2020 12:24:46 +0100
"Ulrich Windl"  wrote:

> >>> Jehan-Guillaume de Rorthais wrote on 27.02.2020 at 11:05 in
> message <20200227110502.3624cb87@firost>:
> 
> [...]
> > What about something like "lock‑location=bool" and  
> 
> For "lock-location" I would assume the value is a "location". I guess you
> wanted a "use-lock-location" Boolean value.

Mh, maybe "lock-current-location" would better reflect what I meant.

The point is to lock the resource on the node currently running it.

> > "lock‑location‑timeout=duration" (for those who like automatic steps)? I 
> > imagine  
> 
> I'm still unhappy with "lock-location": What is a "location", and what does it
> mean to be "locked"?
> Is that fundamentally different from "freeze/frozen" or "ignore" (all those
> phrases exist already)?

A "location" define where a resource is located in the cluster, on what node.
Eg., a location constraint express where a ressource //can// run:

  «Location constraints tell the cluster which nodes a resource can run on. »
  
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html

Here, "constraints" applies to a location. So, if you remove this constraint,
the natural definition location would be:

  «Location tell the cluster what node a resource is running on.»
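
For instance, a preference-style location constraint (hypothetical names;
one possible pcs spelling):

  # prefer node1 for my_db with a finite score; the resource can still
  # run elsewhere if node1 becomes unavailable
  pcs constraint location my_db prefers node1=100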

> > it would lock the resource location (unique or clones) until the operator
> > unlocks it or the "lock‑location‑timeout" expires, no matter what happens
> > to the resource, maintenance mode or not.
> > 
> > At first look, it seems to pair nicely with maintenance‑mode and avoids
> > resource migration after node reboot.
> 
> I wonder: How is it different from a time-limited "ban" (wording that also
> exists already)? If you ban all resources from running on a specific node,
> resources would be moved away, and when booting the node, resources won't
> come back.

This is the standby mode.

Moreover, note that Ken explicitly wrote: «The cluster runs services that have
a preferred node». So if the resource moved elsewhere, the resource **must**
come back.

> But you want the resources to be down while the node boots, right? How can
> that concept be "married with" the concept of high availability?

The point here is to avoid moving resources during planned maintenance/downtime,
as it would require a longer maintenance duration (thus longer downtime) than a
simple reboot with no resource migration.

Even a resource in HA can have planned maintenance :)

> "We have a HA cluster and HA resources, but when we boot a node those
> HA-resources will be down while the node boots." How is that different from
> not having a HA cluster, or taking those resources temporarily away from the
> HA cluster? (That was my initial objection: Why not simply ignore resource
> failures for some time?)

Unless I'm wrong, maintenance mode does not secure the current location of
resources after reboots.