Re: [ClusterLabs] Antw: [EXT] Re: Q: rule-based operation pause/freeze?

2020-03-06 Thread Ken Gaillot
On Fri, 2020-03-06 at 08:19 +0100, Ulrich Windl wrote:
> >>> Ondrej wrote on 06.03.2020 at 01:45 in message
> <7499_1583455563_5E619D4B_7499_1105_1_2a18c389-059e-cf6f-a840-dec26437fdd1@famer.cz>:
> > On 3/5/20 9:24 PM, Ulrich Windl wrote:
> > > Hi!
> > > 
> > > I'm wondering whether it's possible to pause/freeze specific
> > > resource operations through rules.
> > > The idea is something like this: if your monitor operation needs
> > > (e.g.) some external NFS server, and that NFS server is known to be
> > > down, it seems better to delay the monitor operation until NFS is up
> > > again, rather than forcing a monitor timeout that will most likely
> > > be followed by a stop operation that will also time out, eventually
> > > killing the node (which has no problem itself).
> > > 
> > > As I guess it's not possible right now, what would be needed to
> > > make this work?
> > > In case it's possible, what would an example scenario look like?
> > > 
> > > Regards,
> > > Ulrich
> > > 
> > 
> > Hi Ulrich,
> > 
> > Determining _when_ this state should be enabled and disabled would
> > be a different story.
> 
> For the moment let's assume I know it ;-) ping-node, maybe.

I believe that limited scenario is possible, but imperfectly.

You could configure an ocf:pacemaker:ping resource to ping the NFS
server IP. Then in the dependent resource, configure the recurring
monitor logically like this:

  monitor interval=N
    meta attributes
      rule when ping attribute lt 1 or not defined
        enabled=false
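To make the rule's condition concrete, here is a minimal Python model of how it would decide whether the dependent monitor runs. It assumes the ping resource publishes a numeric node attribute that is 0 when the NFS server does not answer; the helper name `monitor_enabled` is hypothetical, and this only models the logic, not how Pacemaker actually evaluates rules.

```python
from typing import Optional


def monitor_enabled(ping_value: Optional[str]) -> bool:
    """Model of the rule "when ping attribute lt 1 or not defined: enabled=false".

    ping_value is the node attribute as stored (a string), or None if the
    attribute is not defined on the node. Hypothetical helper, not a Pacemaker API.
    """
    if ping_value is None:
        # "not defined": the rule matches, so the monitor is disabled
        return False
    if float(ping_value) < 1:
        # "lt 1": the ping target is unreachable, so the monitor is disabled
        return False
    # Rule does not match: the op keeps its default enabled=true
    return True


for value in (None, "0", "1", "100"):
    print(value, monitor_enabled(value))
```

Only a positive attribute value leaves the monitor running; both a missing attribute and a value of 0 switch it off.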

The node attribute will be changed only once the ping resource's monitor
detects that the IP is gone, so there will be a window between the moment
the IP actually disappears and the moment the node attribute is changed,
during which the problem could still occur. Also, the NFS server could
have problems that do not make the IP unpingable, and those situations
would still have the issue.
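A back-of-the-envelope estimate of that window, as a sketch under stated assumptions: the IP vanishes right after a successful ping check, the next check comes one ping monitor interval later, that check takes some time to fail, and the agent's dampening delay then holds back the attribute update. The time the cluster needs to react to the changed attribute is ignored, and all numbers are illustrative, not defaults.

```python
def worst_case_window(ping_interval_s: float, check_duration_s: float, dampen_s: float) -> float:
    """Rough upper bound, in seconds, between the NFS server IP vanishing
    and the ping node attribute changing.

    Assumes the IP disappears just after a successful check, so almost a full
    ping monitor interval passes, the next check needs check_duration_s to fail,
    and the attribute update is then delayed by dampen_s.
    """
    return ping_interval_s + check_duration_s + dampen_s


# Illustrative numbers only: 10s ping monitor, 5s failing check, 5s dampening
window = worst_case_window(10, 5, 5)
print(f"window of up to ~{window:.0f}s")
# A dependent monitor with interval=20s could still run about once in that gap
print(f"about {window / 20:.0f} run(s) of a 20s dependent monitor")
```

Shortening the ping monitor interval or the dampening narrows the window but cannot close it.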
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] Re: Q: rule-based operation pause/freeze?

2020-03-05 Thread Ulrich Windl
>>> Ondrej wrote on 06.03.2020 at 01:45 in message
<7499_1583455563_5E619D4B_7499_1105_1_2a18c389-059e-cf6f-a840-dec26437fdd1@famer.cz>:
> On 3/5/20 9:24 PM, Ulrich Windl wrote:
>> Hi!
>> 
>> I'm wondering whether it's possible to pause/freeze specific resource
>> operations through rules.
>> The idea is something like this: if your monitor operation needs (e.g.)
>> some external NFS server, and that NFS server is known to be down, it seems
>> better to delay the monitor operation until NFS is up again, rather than
>> forcing a monitor timeout that will most likely be followed by a stop
>> operation that will also time out, eventually killing the node (which has no
>> problem itself).
>> 
>> As I guess it's not possible right now, what would be needed to make this
>> work?
>> In case it's possible, what would an example scenario look like?
>> 
>> Regards,
>> Ulrich
>> 
> 
> Hi Ulrich,
> 
> The 'monitor' operation can be disabled with the approach described at
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_disabling_a_monitor_operation.html
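For reference, the linked section disables a recurring monitor by setting `enabled="false"` on its `op` element. The sketch below applies that to a hypothetical CIB fragment using Python's standard `xml.etree.ElementTree`; the resource and op ids are invented, the helper `disable_monitors` is hypothetical, and on a live cluster the CIB would normally be changed through the cluster's own tools rather than edited as a file.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment: one primitive with a recurring monitor and a stop op
CIB_FRAGMENT = """
<resources>
  <primitive id="nfs-data" class="ocf" provider="heartbeat" type="Filesystem">
    <operations>
      <op id="nfs-data-monitor-20s" name="monitor" interval="20s"/>
      <op id="nfs-data-stop-0" name="stop" interval="0" timeout="60s"/>
    </operations>
  </primitive>
</resources>
"""


def disable_monitors(resources: ET.Element, rsc_id: str) -> int:
    """Set enabled="false" on every monitor op of rsc_id; return how many were changed."""
    count = 0
    path = f"./primitive[@id='{rsc_id}']/operations/op[@name='monitor']"
    for op in resources.findall(path):
        op.set("enabled", "false")
        count += 1
    return count


root = ET.fromstring(CIB_FRAGMENT)
print(disable_monitors(root, "nfs-data"))
print(ET.tostring(root, encoding="unicode"))
```

Only the monitor op is touched; the stop op keeps its timeout and remains active.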
> 
> > "followed by a stop operation that will also time out, eventually
> > killing the node (which has no problem itself)"
> This sounds to me like a resource agent "feature", and I would expect
> different resource agents to behave differently when something is
> lost or not present.

Of course. Some RAs are "slim", while others are really "fat" (for example,
calling a command that uses a REST API to query a Java server, which runs a
command that finally checks the status of the service; maybe even worse).

> 
> To me the idea here looks like a "maintenance period" for some resources.

No, it's to avoid an "error cascade".

> Is your expectation that the cluster would not do anything with some
> resources for some time?
> (In that case I would consider 'is-managed'=false + disabling the monitor.)
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-options.html#_resource_meta_attributes
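Ondrej's alternative amounts to setting the `is-managed` resource meta attribute to `false`. A minimal sketch on a hypothetical CIB fragment with Python's `xml.etree.ElementTree`; the resource ids, the generated `meta_attributes`/`nvpair` ids and the helper `set_unmanaged` are all invented for illustration. Note that the loop must visit every affected resource, which is the per-resource editing Ulrich objects to in his reply.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Hypothetical CIB fragment: app-a has no meta attributes, app-b already has is-managed
CIB_FRAGMENT = """
<resources>
  <primitive id="app-a" class="ocf" provider="pacemaker" type="Dummy"/>
  <primitive id="app-b" class="ocf" provider="pacemaker" type="Dummy">
    <meta_attributes id="app-b-meta">
      <nvpair id="app-b-meta-is-managed" name="is-managed" value="true"/>
    </meta_attributes>
  </primitive>
</resources>
"""


def set_unmanaged(resources: ET.Element, rsc_ids: Iterable[str]) -> None:
    """Set is-managed=false on each listed primitive, creating elements as needed."""
    for rsc_id in rsc_ids:
        rsc = resources.find(f"./primitive[@id='{rsc_id}']")
        if rsc is None:
            raise KeyError(rsc_id)
        meta = rsc.find("meta_attributes")
        if meta is None:
            meta = ET.SubElement(rsc, "meta_attributes", id=f"{rsc_id}-meta")
        nvpair = meta.find("nvpair[@name='is-managed']")
        if nvpair is None:
            # Hypothetical id scheme for the new nvpair
            nvpair = ET.SubElement(meta, "nvpair", id=f"{rsc_id}-meta-is-managed",
                                   name="is-managed")
        nvpair.set("value", "false")


root = ET.fromstring(CIB_FRAGMENT)
set_unmanaged(root, ["app-a", "app-b"])
print(ET.tostring(root, encoding="unicode"))
```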

Your suggestion would require modifying multiple operations in multiple
resources every time it's needed, while my idea was to "flag" the corresponding
operations once and let some rule decide what to do. Agreed, the rule would
eventually do the same from a higher perspective, but the "configuration" would
not change every time.

> 
> Determining _when_ this state should be enabled and disabled would be a
> different story.

For the moment let's assume I know it ;-) ping-node, maybe.

Regards,
Ulrich

> 
> --
> Ondrej Famera