Re: [ClusterLabs] Q: rule-based operation pause/freeze?
On 3/5/20 9:24 PM, Ulrich Windl wrote:
> Hi!
>
> I'm wondering whether it's possible to pause/freeze specific resource
> operations through rules. The idea is something like this: if your monitor
> operation needs (e.g.) some external NFS server, and that NFS server is
> known to be down, it seems better to delay the monitor operation until NFS
> is up again, rather than forcing a monitor timeout that will most likely be
> followed by a stop operation that will also time out, eventually killing
> the node (which has no problem itself).
>
> As I guess it's not possible right now, what would be needed to make this
> work? In case it is possible, what would an example scenario look like?
>
> Regards,
> Ulrich

Hi Ulrich,

For the 'monitor' operation, you can disable it with the approach described here:
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_disabling_a_monitor_operation.html

> "followed by a stop operation that will also time out, eventually killing
> the node (which has no problem itself)"

This sounds to me like a resource agent "feature", and I would expect different resource agents to behave differently when something is lost/not present.

To me the idea here looks like a "maintenance period" for some resources. Is your expectation that the cluster would not do anything with certain resources for some time? (In that case I would consider 'is-managed'=false plus disabling the monitor.)
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-options.html#_resource_meta_attributes

Determining _when_ this state should be enabled and disabled would be a different story.

--
Ondrej Famera

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
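[Editor's note: the combination suggested above (disabling the recurring monitor plus is-managed=false) might be sketched as a CIB fragment like the following. The resource name my-nfs-dependent and the Dummy agent are placeholders for illustration; the attribute names match the Pacemaker Explained documentation linked above.]

```xml
<primitive id="my-nfs-dependent" class="ocf" provider="heartbeat" type="Dummy">
  <meta_attributes id="my-nfs-dependent-meta">
    <!-- is-managed=false: the cluster will not start or stop the resource -->
    <nvpair id="my-nfs-dependent-meta-is-managed"
            name="is-managed" value="false"/>
  </meta_attributes>
  <operations>
    <!-- enabled=false: the recurring monitor is not scheduled -->
    <op id="my-nfs-dependent-monitor-30s" name="monitor"
        interval="30s" enabled="false"/>
  </operations>
</primitive>
```

The is-managed meta-attribute can also be toggled from the command line, e.g. `crm_resource --resource my-nfs-dependent --meta --set-parameter is-managed --parameter-value false`, and set back to true (and the monitor re-enabled) once the NFS server is up again.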
Re: [ClusterLabs] Q: rule-based operation pause/freeze?
Hi Ulrich,

For HA NFS, you should expect no more than 90s (after the failover is complete) for NFSv4 clients to recover. Because of that, I think all resources (in the same cluster or another one) that depend on it should have a longer monitoring interval, maybe something like 179s.

Of course, if your NFS will be down for a longer period, you can set all HA resources that depend on it with "on-fail=ignore", and remove that setting once the maintenance is over. After all, you want the cluster not to react during that specific time, but you should keep track of such changes, as it is easy to forget a setting like this.

Another approach is to leave the monitoring interval high enough that the cluster won't catch the downtime. But imagine that the NFS downtime has to be extended: do you believe you would be able to change all affected resources in time?

Best Regards,
Strahil Nikolov

On Thursday, 5 March 2020, 14:25:36 GMT+2, Ulrich Windl wrote:
> Hi!
>
> I'm wondering whether it's possible to pause/freeze specific resource
> operations through rules. The idea is something like this: if your monitor
> operation needs (e.g.) some external NFS server, and that NFS server is
> known to be down, it seems better to delay the monitor operation until NFS
> is up again, rather than forcing a monitor timeout that will most likely be
> followed by a stop operation that will also time out, eventually killing
> the node (which has no problem itself).
>
> As I guess it's not possible right now, what would be needed to make this
> work? In case it is possible, what would an example scenario look like?
>
> Regards,
> Ulrich
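[Editor's note: the on-fail=ignore idea above might look like the following operation definition. The resource and operation ids are hypothetical; with on-fail=ignore, a failed monitor is recorded in the status but triggers no recovery action.]

```xml
<operations>
  <!-- 179s interval as suggested; on-fail=ignore means a failed monitor
       is logged but does not cause a stop/restart or fencing -->
  <op id="my-app-monitor-179s" name="monitor"
      interval="179s" timeout="60s" on-fail="ignore"/>
</operations>
```

As noted above, this effectively blinds the cluster to real failures of the resource for as long as it is set, so it should be removed as soon as the planned NFS downtime ends.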
[ClusterLabs] Q: rule-based operation pause/freeze?
Hi!

I'm wondering whether it's possible to pause/freeze specific resource operations through rules. The idea is something like this: if your monitor operation needs (e.g.) some external NFS server, and that NFS server is known to be down, it seems better to delay the monitor operation until NFS is up again, rather than forcing a monitor timeout that will most likely be followed by a stop operation that will also time out, eventually killing the node (which has no problem itself).

As I guess it's not possible right now, what would be needed to make this work? In case it is possible, what would an example scenario look like?

Regards,
Ulrich