Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On 20/05/16 17:04 +0100, Adam Spiers wrote: > Klaus Wenninger wrote: >> On 05/20/2016 08:39 AM, Ulrich Windl wrote: >>> I think RAs should not rely on "stop" being called multiple times >>> for a resource to be stopped. > > Well, this would be a major architectural change. Currently if > stop fails once, the node gets fenced - period. So if we changed > this, there would presumably be quite a bit of scope for making the > new design address whatever concerns you have about relying on "stop" > *sometimes* needing to be called multiple times. For the sake of > backwards compatibility with existing RAs, I think we'd have to ensure > the current semantics still work. But maybe there could be a new > option where RAs are allowed to return OCF_RETRY_STOP to indicate that > they want to escalate, or something. However it's not clear how that > would be distinguished from an old RA returning the same value as > whatever we chose for OCF_RETRY_STOP. > >> I see a couple of positive points in having something inside pacemaker >> that helps the RAs escalating their stop strategy: >> >> - this way you have the same logging for all RAs - done within the >> RA it would look different with each of them >> - timeout-retry stuff is potentially prone to not being implemented >> properly - like this you have a proven >> implementation within pacemaker >> - keeps logic within RA simpler and guides implementation in >> a certain direction that makes them look more similar to each >> other making it easier to understand an RA you haven't seen >> before > > Yes, all good points which I agree with.
> >> Of course there are basically two approaches to achieve this: >> >> - give some global or per resource view of pacemaker to the RA and leave >> it to the RA to act in a responsible manner (like telling the RA >> that there are x stop-retries to come) >> - handle the escalation within pacemaker and already tell the RA >> what you expect it to do like requesting a graceful / hard / >> emergency or however you would call it stop > > I'd probably prefer the former, to avoid hardcoding any assumptions > about the different levels of escalation the RA might want to take. > That would almost certainly vary per RA. I'd like to point out the direction of just-released systemd 236 to solve "what if action needs more time to finish than permitted": > The sd_notify() protocol can now with EXTEND_TIMEOUT_USEC=microsecond > extend the effective start, runtime, and stop time. The service must > continue to send EXTEND_TIMEOUT_USEC within the period specified to > prevent the service manager from marking the service as timed out. It apparently does not solve "cannot wait forever otherwise degrading availability" off the bat, and is not well suited for the current agent-driven, synchronous+sequenced supervision model (which, from the beginning, was not planned to remain the final state of the art[1]), but it looks simple enough and is quite close to the OCF_RETRY_STOP idea proposed above. [1] https://github.com/ClusterLabs/OCF-spec/commit/2331bb8d3624a2697afaf3429cec1f47d19251f5#diff-316ade5241704833815c8fa2c2b71d4dR422 > However, we're slightly off-topic for this thread at this point ;-) (It's all one big Gordian knot, everything is related, and the fact that we are not starting with a clean drawing board but are already rolling some stones ahead of us is not helping.)
-- Poki ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
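[Editorial aside: the EXTEND_TIMEOUT_USEC mechanism quoted above is simple enough to sketch without libsystemd. A minimal notify-socket client, assuming only what the sd_notify(3) protocol documents (NOTIFY_SOCKET pointing at systemd's datagram socket), might look like:]

```python
import os
import socket

def sd_notify(msg: str) -> bool:
    """Send one sd_notify(3) datagram; returns False if no socket is set."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # Linux abstract-namespace socket: a leading '@' maps to a NUL byte.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(msg.encode(), addr)
    return True

# During a slow stop, a service could ask for 30 more seconds of grace,
# repeating the call before each extension expires:
# sd_notify("EXTEND_TIMEOUT_USEC=30000000")
```

This is a sketch, not a hardened client; a real agent would also handle send errors and clock its extension messages conservatively.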
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Ken Gaillot wrote: > On 06/24/2016 05:41 AM, Adam Spiers wrote: > > Andrew Beekhof wrote: > >> On Fri, Jun 24, 2016 at 1:01 AM, Adam Spiers wrote: > >>> Andrew Beekhof wrote: > > Earlier in this thread I proposed > > the idea of a tiny temporary file in /run which tracks the last known > > state and optimizes away the consecutive invocations, but IIRC you > > were against that. > > I'm generally not a fan, but sometimes state files are a necessity. > Just make sure you think through what a missing file might mean. > >>> > >>> Sure. A missing file would mean the RA's never called service-disable > >>> before, > >> > >> And that is why I generally don't like state files. > >> The default location for state files doesn't persist across reboots. > >> > >> t1. stop (ie. disable) > >> t2. reboot > >> t3. start with no state file > >> t4. WHY WONT NOVA USE THE NEW COMPUTE NODE STUPID CLUSTERS > > > > Well then we simply put the state file somewhere which does persist > > across reboots. > > There's also the possibility of using a node attribute. If you set a > normal node attribute, it will abort the transition and calculate a new > one, so that's something to take into account. You could set a private > node attribute, which never gets written to the CIB and thus doesn't > abort transitions, but it also does not survive a complete cluster stop. Interesting idea, although I wonder if there is a good solution to either of these challenges. Aborting the current transition sounds bad, and we would certainly want the state to survive a cluster stop, otherwise we risk the exact issue Andrew described above. Also, since the state is per-node, I'm not convinced there's a huge advantage to sharing it cluster-wide, which is why I proposed the local filesystem as the store for it.
But I'm open to suggestions of course :-)
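[Editorial aside: the state-file approach under discussion reduces to a few lines. The path and helper names below are illustrative, not from any shipped RA; note that /run is wiped on reboot, which is exactly the "start with no state file" failure mode Andrew describes above.]

```python
import os

# Illustrative path only; /run does not survive a reboot.
STATE_FILE = "/run/nova-compute-ra.service-disabled"

def record_service_disable(state_file: str = STATE_FILE) -> None:
    # Called by the RA's stop path after a successful "nova service-disable".
    with open(state_file, "w") as f:
        f.write("disabled\n")

def disabled_by_us(state_file: str = STATE_FILE) -> bool:
    # A missing file means this RA never called service-disable --
    # or that the file's location did not persist, per the caveat above.
    return os.path.exists(state_file)

def clear_service_disable(state_file: str = STATE_FILE) -> None:
    # Called by the RA's start path after "nova service-enable".
    try:
        os.unlink(state_file)
    except FileNotFoundError:
        pass
```

Ken's node-attribute alternative would replace the file with an attrd_updater call, trading the reboot problem for the transition-abort and cluster-stop caveats he describes.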
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Fri, Jun 24, 2016 at 1:01 AM, Adam Spiers wrote: > Andrew Beekhof wrote: >> > Well, if you're OK with bending the rules like this then that's good >> > enough for me to say we should at least try it :) >> >> I still say you shouldn't only do it on error. > > When else should it be done? I was thinking whenever a stop() happens. > IIUC, disabling/enabling the service is independent of the up/down > state which nova tracks automatically, and which based on slightly > more than a skim of the code, is dependent on the state of the RPC > layer. > >> > But how would you avoid repeated consecutive invocations of "nova >> > service-disable" when the monitor action fails, and ditto for "nova >> > service-enable" when it succeeds? >> >> I don't think you can. Not ideal but I'd not have thought a deal breaker. > > Sounds like a massive deal-breaker to me! With op monitor > interval="10s" and 100 compute nodes, that would mean 10 pointless > calls to nova-api every second. Am I missing something? I was thinking you would only call it for the "I detected a failure case" and service-enable would still be on start(). So the number of pointless calls per second would be capped at one tenth of the number of failed compute nodes. One would hope that all of them weren't dead. > > Also I don't see any benefit to moving the API calls from start/stop > actions to the monitor action. If there's a failure, Pacemaker will > invoke the stop action, so we can do service-disable there. I agree. Doing it unconditionally at stop() is my preferred option, I was only trying to provide a path that might be close to the behaviour you were looking for. > If the > start action is invoked and we successfully initiate startup of > nova-compute, the RA can undo any service-disable it previously did > (although it should not reverse a service-disable done elsewhere, > e.g. manually by the cloud operator).
Agree > >> > Earlier in this thread I proposed >> > the idea of a tiny temporary file in /run which tracks the last known >> > state and optimizes away the consecutive invocations, but IIRC you >> > were against that. >> >> I'm generally not a fan, but sometimes state files are a necessity. >> Just make sure you think through what a missing file might mean. > > Sure. A missing file would mean the RA's never called service-disable > before, And that is why I generally don't like state files. The default location for state files doesn't persist across reboots. t1. stop (ie. disable) t2. reboot t3. start with no state file t4. WHY WONT NOVA USE THE NEW COMPUTE NODE STUPID CLUSTERS > which means that it shouldn't call service-enable on startup. > >> Unless use the state file to store the date at which the last >> start operation occurred? >> >> If we're calling stop() and data - start_date > threshold, then, if >> you must, be optimistic, skip service-disable and assume we'll get >> started again soon. >> >> Otherwise if we're calling stop() and data - start_date <= threshold, >> always call service-disable because we're in a restart loop which is >> not worth optimising for. >> >> ( And always call service-enable at start() ) >> >> No Pacemaker feature or Beekhof approval required :-) > > Hmm ... it's possible I just don't understand this proposal fully, > but it sounds a bit woolly to me, e.g. how would you decide a suitable > threshold? roll a dice? > I think I preferred your other suggestion of just skipping the > optimization, i.e. calling service-disable on the first stop, and > service-enable on (almost) every start. good :) And the use of force-down from your subsequent email sounds excellent
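[Editorial aside: Andrew's threshold proposal above reduces to a few lines once the state file stores the last start time. The 120-second threshold below is an arbitrary placeholder, which is precisely the "how would you decide a suitable threshold?" objection.]

```python
import time
from typing import Optional

RESTART_LOOP_THRESHOLD = 120.0  # seconds; placeholder value only

def skip_service_disable(last_start: float, now: Optional[float] = None) -> bool:
    """Return True when stop() may optimistically skip "nova service-disable".

    Mirrors the proposal quoted above: if the service started recently
    (now - last_start <= threshold), we are probably in a restart loop,
    so always disable; otherwise be optimistic, skip the call, and
    assume Pacemaker will start us again soon.
    """
    if now is None:
        now = time.time()
    return (now - last_start) > RESTART_LOOP_THRESHOLD
```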
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Andrew Beekhof wrote: > On Wed, Jun 15, 2016 at 10:42 PM, Adam Spiers wrote: > > Andrew Beekhof wrote: > >> On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers wrote: > >> > Andrew Beekhof wrote: > >> >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers wrote: > >> >> > Andrew Beekhof wrote: > >> >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers > >> >> >> wrote: > >> >> >> > We would also need to ensure that service-enable is called on start > >> >> >> > when necessary. Perhaps we could track the enable/disable state > >> >> >> > in a > >> >> >> > local temporary file, and if the file indicates that we've > >> >> >> > previously > >> >> >> > done service-disable, we know to run service-enable on start. This > >> >> >> > would avoid calling service-enable on every single start. > >> >> >> > >> >> >> feels like an over-optimization > >> >> >> in fact, the whole thing feels like that if i'm honest. > >> >> > > >> >> > Huh ... You didn't seem to think that when we discussed automating > >> >> > service-disable at length in Austin. > >> >> > >> >> I didn't feel the need to push back because RH uses the systemd agent > >> >> instead so you're only hanging yourself, but more importantly because > >> >> the proposed implementation to facilitate it wasn't leading RA writers > >> >> down a hazardous path :-) > >> > > >> > I'm a bit confused by that statement, because the only proposed > >> > implementation we came up with in Austin was adding this new feature > >> > to Pacemaker. > >> > >> _A_ new feature, not _this_ new feature. > >> The one we discussed was far less prone to being abused but, as it > >> turns out, also far less useful for what you were trying to do. > > > > Was there really that much significant change since the original idea? > > IIRC the only thing which really changed was the type, from "number of > > retries remaining" to a boolean "there are still some retries left". > > The new implementation has nothing to do with retries.
Like the new > name, it is based on "is a start action expected". Oh yeah, I remember now. > Thats why I got an attack of the heebie-jeebies. I'm not sure why, but at least now I understand your change of position :-) > > I'm not sure why the integer approach would be far less open to abuse, > > or even why it would have been far less useful. I'm probably missing > > something. > > > > [snipped] > > > >> >> >> why are we trying to optimise the projected performance impact > >> >> > > >> >> > It's not really "projected"; we know exactly what the impact is. And > >> >> > it's not really a performance impact either. If nova-compute (or a > >> >> > dependency) is malfunctioning on a compute node, there will be a > >> >> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in > >> >> > which nova-scheduler could still schedule VMs onto that compute node, > >> >> > and then of course they'll fail to boot. > >> >> > >> >> Right, but that window exists regardless of whether the node is or is > >> >> not ever coming back. > >> > > >> > Sure, but the window's a *lot* bigger if we don't do service-disable. > >> > Although perhaps your question "why are we trying to optimise the > >> > projected performance impact" was actually "why are we trying to avoid > >> > extra calls to service-disable" rather than "why do we want to call > >> > service-disable" as I initially assumed. Is that right? > >> > >> Exactly. I assumed it was to limit the noise we'd be generating in doing > >> so. > > > > Sort of - not just the noise, but the extra delay introduced by > > calling service-disable, restarting nova-compute, and then calling > > service-enable again when it succeeds. > > Ok, but restarting nova-compute is not optional and the bits that are > optional are all but completely asynchronous* - so the overhead should > be negligible. > > * Like most API calls, they are Ack'd when the request has been > received, not processed. Yes, fair points. 
> >> >> > The masakari folks have a lot of operational experience in this space, > >> >> > and they found that this was enough of a problem to justify calling > >> >> > nova service-disable whenever the failure is detected. > >> >> > >> >> If you really want it whenever the failure is detected, call it from > >> >> the monitor operation that finds it broken. > >> > > >> > Hmm, that appears to violate what I assume would be a fundamental > >> > design principle of Pacemaker: that the "monitor" action never changes > >> > the system's state (assuming there are no Heisenberg-like side effects > >> > of monitoring, of course). > >> > >> That has traditionally been the considered a good idea, in the vast > >> majority of cases I still think it is a good idea, but its also a > >> guideline that has been broken because there is no other way for the > >> agent to work *cough*
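[Editorial aside: a stop action consuming the "is a start action expected" hint discussed above might look roughly like this. The environment variable name and the helper wiring are illustrative guesses, since the thread predates any settled interface.]

```python
import os
import socket
import subprocess

def nova_compute_stop(run=subprocess.call) -> int:
    # Hypothetical hint name -- the thread only agrees on the semantics
    # ("is a start action expected"), not on a concrete variable.
    start_expected = os.environ.get("OCF_START_EXPECTED", "true") == "true"

    rc = run(["systemctl", "stop", "openstack-nova-compute"])
    if rc != 0:
        return rc  # a failed stop leads to fencing, as noted earlier

    if not start_expected:
        # Final stop: keep nova-scheduler from placing VMs on this host.
        run(["nova", "service-disable", socket.gethostname(), "nova-compute"])
    return 0
```

The `run` parameter exists only to make the sketch testable; a shell RA would of course invoke the commands directly.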
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Wed, Jun 15, 2016 at 10:42 PM, Adam Spiers wrote: > Andrew Beekhof wrote: >> On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers wrote: >> > Andrew Beekhof wrote: >> >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers wrote: >> >> > Andrew Beekhof wrote: >> >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote: >> >> >> > We would also need to ensure that service-enable is called on start >> >> >> > when necessary. Perhaps we could track the enable/disable state in a >> >> >> > local temporary file, and if the file indicates that we've previously >> >> >> > done service-disable, we know to run service-enable on start. This >> >> >> > would avoid calling service-enable on every single start. >> >> >> >> >> >> feels like an over-optimization >> >> >> in fact, the whole thing feels like that if i'm honest. >> >> > >> >> > Huh ... You didn't seem to think that when we discussed automating >> >> > service-disable at length in Austin. >> >> >> >> I didn't feel the need to push back because RH uses the systemd agent >> >> instead so you're only hanging yourself, but more importantly because >> >> the proposed implementation to facilitate it wasn't leading RA writers >> >> down a hazardous path :-) >> > >> > I'm a bit confused by that statement, because the only proposed >> > implementation we came up with in Austin was adding this new feature >> > to Pacemaker. >> >> _A_ new feature, not _this_ new feature. >> The one we discussed was far less prone to being abused but, as it >> turns out, also far less useful for what you were trying to do. > > Was there really that much significant change since the original idea? > IIRC the only thing which really changed was the type, from "number of > retries remaining" to a boolean "there are still some retries left". The new implementation has nothing to do with retries. Like the new name, it is based on "is a start action expected". That's why I got an attack of the heebie-jeebies.
> I'm not sure why the integer approach would be far less open to abuse, > or even why it would have been far less useful. I'm probably missing > something. > > [snipped] > >> >> >> why are we trying to optimise the projected performance impact >> >> > >> >> > It's not really "projected"; we know exactly what the impact is. And >> >> > it's not really a performance impact either. If nova-compute (or a >> >> > dependency) is malfunctioning on a compute node, there will be a >> >> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in >> >> > which nova-scheduler could still schedule VMs onto that compute node, >> >> > and then of course they'll fail to boot. >> >> >> >> Right, but that window exists regardless of whether the node is or is >> >> not ever coming back. >> > >> > Sure, but the window's a *lot* bigger if we don't do service-disable. >> > Although perhaps your question "why are we trying to optimise the >> > projected performance impact" was actually "why are we trying to avoid >> > extra calls to service-disable" rather than "why do we want to call >> > service-disable" as I initially assumed. Is that right? >> >> Exactly. I assumed it was to limit the noise we'd be generating in doing so. > > Sort of - not just the noise, but the extra delay introduced by > calling service-disable, restarting nova-compute, and then calling > service-enable again when it succeeds. Ok, but restarting nova-compute is not optional and the bits that are optional are all but completely asynchronous* - so the overhead should be negligible. * Like most API calls, they are Ack'd when the request has been received, not processed. > > [snipped] > >> >> > The masakari folks have a lot of operational experience in this space, >> >> > and they found that this was enough of a problem to justify calling >> >> > nova service-disable whenever the failure is detected. 
>> >> >> >> If you really want it whenever the failure is detected, call it from >> >> the monitor operation that finds it broken. >> > >> > Hmm, that appears to violate what I assume would be a fundamental >> > design principle of Pacemaker: that the "monitor" action never changes >> > the system's state (assuming there are no Heisenberg-like side effects >> > of monitoring, of course). >> >> That has traditionally been the considered a good idea, in the vast >> majority of cases I still think it is a good idea, but its also a >> guideline that has been broken because there is no other way for the >> agent to work *cough* rabbit *cough*. >> >> In this specific case, I think it could be forgivable because you're >> not strictly altering the service but something that sits in front of >> it. start/stop/monitor would all continue to do TheRightThing(tm). >> >> > I guess you could argue that in this case, >> > the nova server's internal state could be considered outside the >> > system which
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Andrew Beekhof wrote: > On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers wrote: > > Andrew Beekhof wrote: > >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers wrote: > >> > Andrew Beekhof wrote: > >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote: > >> >> > We would also need to ensure that service-enable is called on start > >> >> > when necessary. Perhaps we could track the enable/disable state in a > >> >> > local temporary file, and if the file indicates that we've previously > >> >> > done service-disable, we know to run service-enable on start. This > >> >> > would avoid calling service-enable on every single start. > >> >> > >> >> feels like an over-optimization > >> >> in fact, the whole thing feels like that if i'm honest. > >> > > >> > Huh ... You didn't seem to think that when we discussed automating > >> > service-disable at length in Austin. > >> > >> I didn't feel the need to push back because RH uses the systemd agent > >> instead so you're only hanging yourself, but more importantly because > >> the proposed implementation to facilitate it wasn't leading RA writers > >> down a hazardous path :-) > > > > I'm a bit confused by that statement, because the only proposed > > implementation we came up with in Austin was adding this new feature > > to Pacemaker. > > _A_ new feature, not _this_ new feature. > The one we discussed was far less prone to being abused but, as it > turns out, also far less useful for what you were trying to do. Was there really that much significant change since the original idea? IIRC the only thing which really changed was the type, from "number of retries remaining" to a boolean "there are still some retries left". I'm not sure why the integer approach would be far less open to abuse, or even why it would have been far less useful. I'm probably missing something. [snipped] > >> >> why are we trying to optimise the projected performance impact > >> > > >> > It's not really "projected"; we know exactly what the impact is.
And > >> > it's not really a performance impact either. If nova-compute (or a > >> > dependency) is malfunctioning on a compute node, there will be a > >> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in > >> > which nova-scheduler could still schedule VMs onto that compute node, > >> > and then of course they'll fail to boot. > >> > >> Right, but that window exists regardless of whether the node is or is > >> not ever coming back. > > > > Sure, but the window's a *lot* bigger if we don't do service-disable. > > Although perhaps your question "why are we trying to optimise the > > projected performance impact" was actually "why are we trying to avoid > > extra calls to service-disable" rather than "why do we want to call > > service-disable" as I initially assumed. Is that right? > > Exactly. I assumed it was to limit the noise we'd be generating in doing so. Sort of - not just the noise, but the extra delay introduced by calling service-disable, restarting nova-compute, and then calling service-enable again when it succeeds. [snipped] > >> > The masakari folks have a lot of operational experience in this space, > >> > and they found that this was enough of a problem to justify calling > >> > nova service-disable whenever the failure is detected. > >> > >> If you really want it whenever the failure is detected, call it from > >> the monitor operation that finds it broken. > > > > Hmm, that appears to violate what I assume would be a fundamental > > design principle of Pacemaker: that the "monitor" action never changes > > the system's state (assuming there are no Heisenberg-like side effects > > of monitoring, of course). > > That has traditionally been the considered a good idea, in the vast > majority of cases I still think it is a good idea, but its also a > guideline that has been broken because there is no other way for the > agent to work *cough* rabbit *cough*. 
> > In this specific case, I think it could be forgivable because you're > not strictly altering the service but something that sits in front of > it. start/stop/monitor would all continue to do TheRightThing(tm). > > > I guess you could argue that in this case, > > the nova server's internal state could be considered outside the > > system which Pacemaker is managing. > > Right. Well, if you're OK with bending the rules like this then that's good enough for me to say we should at least try it :) But how would you avoid repeated consecutive invocations of "nova service-disable" when the monitor action fails, and ditto for "nova service-enable" when it succeeds? Earlier in this thread I proposed the idea of a tiny temporary file in /run which tracks the last known state and optimizes away the consecutive invocations, but IIRC you were against that. > >> I'm arguing that trying to do it only failure is an over optimization >
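[Editorial aside: the "repeated consecutive invocations" worry above boils down to debouncing: only act on health transitions. A sketch, in-memory here for brevity; an RA would persist the previous state, e.g. in the /run file discussed earlier:]

```python
from typing import Optional

def api_action_for(healthy: bool, previous: Optional[bool]) -> Optional[str]:
    """Map an observed health state to the nova API call to make, if any.

    Returns "service-enable"/"service-disable" only when the state changed
    since the last monitor run, so a 10-second monitor interval does not
    hammer nova-api with redundant calls.
    """
    if healthy == previous:
        return None
    return "service-enable" if healthy else "service-disable"
```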
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers wrote: > Andrew Beekhof wrote: >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers wrote: >> > Andrew Beekhof wrote: >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote: >> >> > Ken Gaillot wrote: >> >> >> On 06/06/2016 05:45 PM, Adam Spiers wrote: >> >> >> > Maybe your point was that if the expected start never happens (so >> >> >> > never even gets a chance to fail), we still want to do a nova >> >> >> > service-disable? >> >> >> >> >> >> That is a good question, which might mean it should be done on every >> >> >> stop -- or could that cause problems (besides delays)? >> >> > >> >> > No, the whole point of adding this feature is to avoid a >> >> > service-disable on every stop, and instead only do it on the final >> >> > stop. If there are corner cases where we never reach the final stop, >> >> > that's not a disaster because nova will eventually figure it out and >> >> > do the right thing when the server-agent connection times out. >> >> > >> >> >> Another aspect of this is that the proposed feature could only look at >> >> >> a >> >> >> single transition. What if stop is called with start_expected=false, >> >> >> but >> >> >> then Pacemaker is able to start the service on the same node in the >> >> >> next >> >> >> transition immediately afterward? Would having called service-disable >> >> >> cause problems for that start? >> >> > >> >> > We would also need to ensure that service-enable is called on start >> >> > when necessary. Perhaps we could track the enable/disable state in a >> >> > local temporary file, and if the file indicates that we've previously >> >> > done service-disable, we know to run service-enable on start. This >> >> > would avoid calling service-enable on every single start. >> >> >> >> feels like an over-optimization >> >> in fact, the whole thing feels like that if i'm honest. >> > >> > Huh ... You didn't seem to think that when we discussed automating >> > service-disable at length in Austin.
>> >> I didn't feel the need to push back because RH uses the systemd agent >> instead so you're only hanging yourself, but more importantly because >> the proposed implementation to facilitate it wasn't leading RA writers >> down a hazardous path :-) > > I'm a bit confused by that statement, because the only proposed > implementation we came up with in Austin was adding this new feature > to Pacemaker. _A_ new feature, not _this_ new feature. The one we discussed was far less prone to being abused but, as it turns out, also far less useful for what you were trying to do. Prior to that, AFAICR, you, Dawid, and I had a long > afternoon discussion in the sun where we tried to figure out a way to > implement it just by tweaking the OCF RAs, but every approach we > discussed turned out to have fundamental issues. That's why we > eventually turned to the idea of this new feature in Pacemaker. > > But anyway, it's water under the bridge now :-) > >> > What changed? Can you suggest a better approach? >> >> Either always or never disable the service would be my advice. >> "Always" specifically getting my vote. > > OK, thanks. We discussed that at the meeting this morning, and it > looks like we'll give it a try. > >> >> why are we trying to optimise the projected performance impact >> > >> > It's not really "projected"; we know exactly what the impact is. And >> > it's not really a performance impact either. If nova-compute (or a >> > dependency) is malfunctioning on a compute node, there will be a >> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in >> > which nova-scheduler could still schedule VMs onto that compute node, >> > and then of course they'll fail to boot. >> >> Right, but that window exists regardless of whether the node is or is >> not ever coming back. > > Sure, but the window's a *lot* bigger if we don't do service-disable. 
> Although perhaps your question "why are we trying to optimise the > projected performance impact" was actually "why are we trying to avoid > extra calls to service-disable" rather than "why do we want to call > service-disable" as I initially assumed. Is that right? Exactly. I assumed it was to limit the noise we'd be generating in doing so. > >> And as we already discussed, the proposed feature still leaves you >> open to this window because we can't know if the expected restart will >> ever happen. > > Yes, but as I already said, the perfect should not become the enemy of > the good. Just because an approach doesn't solve all cases, it > doesn't necessarily mean it's not suitable for solving some of them. > >> In this context, trying to avoid the disable call under certain >> circumstances, to avoid repeated and frequent flip-flopping of the >> state, seems ill-advised. At the point nova compute is bouncing up >> and down like that, you have a more
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Andrew Beekhof wrote: > On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers wrote: > > Andrew Beekhof wrote: > >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote: > >> > Ken Gaillot wrote: > >> >> On 06/06/2016 05:45 PM, Adam Spiers wrote: > >> >> > Maybe your point was that if the expected start never happens (so > >> >> > never even gets a chance to fail), we still want to do a nova > >> >> > service-disable? > >> >> > >> >> That is a good question, which might mean it should be done on every > >> >> stop -- or could that cause problems (besides delays)? > >> > > >> > No, the whole point of adding this feature is to avoid a > >> > service-disable on every stop, and instead only do it on the final > >> > stop. If there are corner cases where we never reach the final stop, > >> > that's not a disaster because nova will eventually figure it out and > >> > do the right thing when the server-agent connection times out. > >> > > >> >> Another aspect of this is that the proposed feature could only look at a > >> >> single transition. What if stop is called with start_expected=false, but > >> >> then Pacemaker is able to start the service on the same node in the next > >> >> transition immediately afterward? Would having called service-disable > >> >> cause problems for that start? > >> > > >> > We would also need to ensure that service-enable is called on start > >> > when necessary. Perhaps we could track the enable/disable state in a > >> > local temporary file, and if the file indicates that we've previously > >> > done service-disable, we know to run service-enable on start. This > >> > would avoid calling service-enable on every single start. > >> > >> feels like an over-optimization > >> in fact, the whole thing feels like that if i'm honest. > > > > Huh ... You didn't seem to think that when we discussed automating > > service-disable at length in Austin.
> > I didn't feel the need to push back because RH uses the systemd agent > instead so you're only hanging yourself, but more importantly because > the proposed implementation to facilitate it wasn't leading RA writers > down a hazardous path :-) I'm a bit confused by that statement, because the only proposed implementation we came up with in Austin was adding this new feature to Pacemaker. Prior to that, AFAICR, you, Dawid, and I had a long afternoon discussion in the sun where we tried to figure out a way to implement it just by tweaking the OCF RAs, but every approach we discussed turned out to have fundamental issues. That's why we eventually turned to the idea of this new feature in Pacemaker. But anyway, it's water under the bridge now :-) > > What changed? Can you suggest a better approach? > > Either always or never disable the service would be my advice. > "Always" specifically getting my vote. OK, thanks. We discussed that at the meeting this morning, and it looks like we'll give it a try. > >> why are we trying to optimise the projected performance impact > > > > It's not really "projected"; we know exactly what the impact is. And > > it's not really a performance impact either. If nova-compute (or a > > dependency) is malfunctioning on a compute node, there will be a > > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in > > which nova-scheduler could still schedule VMs onto that compute node, > > and then of course they'll fail to boot. > > Right, but that window exists regardless of whether the node is or is > not ever coming back. Sure, but the window's a *lot* bigger if we don't do service-disable. Although perhaps your question "why are we trying to optimise the projected performance impact" was actually "why are we trying to avoid extra calls to service-disable" rather than "why do we want to call service-disable" as I initially assumed. Is that right? 
> And as we already discussed, the proposed feature still leaves you > open to this window because we can't know if the expected restart will > ever happen. Yes, but as I already said, the perfect should not become the enemy of the good. Just because an approach doesn't solve all cases, it doesn't necessarily mean it's not suitable for solving some of them. > In this context, trying to avoid the disable call under certain > circumstances, to avoid repeated and frequent flip-flopping of the > state, seems ill-advised. At the point nova compute is bouncing up > and down like that, you have a more fundamental issue somewhere in > your stack and this is only one (and IMHO minor) symptom of it. That's a fair point. > > The masakari folks have a lot of operational experience in this space, > > and they found that this was enough of a problem to justify calling > > nova service-disable whenever the failure is detected. > > If you really want it whenever the failure is detected, call it from > the monitor operation that finds it broken. Hmm, that
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers wrote: > Andrew Beekhof wrote: >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote: >> > Ken Gaillot wrote: >> >> On 06/06/2016 05:45 PM, Adam Spiers wrote: >> >> > Adam Spiers wrote: >> >> >> Andrew Beekhof wrote: >> >> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: >> >> Ken Gaillot wrote: >> >> > My main question is how useful would it actually be in the proposed >> >> > use >> >> > cases. Considering the possibility that the expected start might >> >> > never >> >> > happen (or fail), can an RA really do anything different if >> >> > start_expected=true? >> >> >> >> That's the wrong question :-) >> >> >> >> > If the use case is there, I have no problem with >> >> > adding it, but I want to make sure it's worthwhile. >> >> >> >> The use case which started this whole thread is for >> >> start_expected=false, not start_expected=true. >> >>> >> >> >>> Isn't this just two sides of the same coin? >> >> >>> If you're not doing the same thing for both cases, then you're just >> >> >>> reversing the order of the clauses. >> >> >> >> >> >> No, because the stated concern about unreliable expectations >> >> >> ("Considering the possibility that the expected start might never >> >> >> happen (or fail)") was regarding start_expected=true, and that's the >> >> >> side of the coin we don't care about, so it doesn't matter if it's >> >> >> unreliable. >> >> > >> >> > BTW, if the expected start happens but fails, then Pacemaker will just >> >> > keep repeating until migration-threshold is hit, at which point it >> >> > will call the RA 'stop' action finally with start_expected=false. >> >> > So that's of no concern. >> >> >> >> To clarify, that's configurable, via start-failure-is-fatal and on-fail >> > >> > Sure. >> > >> >> > Maybe your point was that if the expected start never happens (so >> >> > never even gets a chance to fail), we still want to do a nova >> >> > service-disable? 
>> >> >> >> That is a good question, which might mean it should be done on every >> >> stop -- or could that cause problems (besides delays)? >> > >> > No, the whole point of adding this feature is to avoid a >> > service-disable on every stop, and instead only do it on the final >> > stop. If there are corner cases where we never reach the final stop, >> > that's not a disaster because nova will eventually figure it out and >> > do the right thing when the server-agent connection times out. >> > >> >> Another aspect of this is that the proposed feature could only look at a >> >> single transition. What if stop is called with start_expected=false, but >> >> then Pacemaker is able to start the service on the same node in the next >> >> transition immediately afterward? Would having called service-disable >> >> cause problems for that start? >> > >> > We would also need to ensure that service-enable is called on start >> > when necessary. Perhaps we could track the enable/disable state in a >> > local temporary file, and if the file indicates that we've previously >> > done service-disable, we know to run service-enable on start. This >> > would avoid calling service-enable on every single start. >> >> feels like an over-optimization >> in fact, the whole thing feels like that if i'm honest. > > Huh ... You didn't seem to think that when we discussed automating > service-disable at length in Austin. I didn't feel the need to push back because RH uses the systemd agent instead so you're only hanging yourself, but more importantly because the proposed implementation to facilitate it wasn't leading RA writers down a hazardous path :-) > What changed? Can you suggest a > better approach? Either always or never disable the service would be my advice. "Always" specifically getting my vote. > >> why are we trying to optimise the projected performance impact > > It's not really "projected"; we know exactly what the impact is. 
And > it's not really a performance impact either. If nova-compute (or a > dependency) is malfunctioning on a compute node, there will be a > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in > which nova-scheduler could still schedule VMs onto that compute node, > and then of course they'll fail to boot. Right, but that window exists regardless of whether the node is or is not ever coming back. And as we already discussed, the proposed feature still leaves you open to this window because we can't know if the expected restart will ever happen. In this context, trying to avoid the disable call under certain circumstances, to avoid repeated and frequent flip-flopping of the state, seems ill-advised. At the point nova compute is bouncing up and down like that, you have a more fundamental issue somewhere in your
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Wed, Jun 8, 2016 at 10:29 AM, Andrew Beekhof wrote: > On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote: >> Ken Gaillot wrote: >>> On 06/06/2016 05:45 PM, Adam Spiers wrote: >>> > Adam Spiers wrote: >>> >> Andrew Beekhof wrote: >>> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: >>> Ken Gaillot wrote: >>> > My main question is how useful would it actually be in the proposed >>> > use >>> > cases. Considering the possibility that the expected start might never >>> > happen (or fail), can an RA really do anything different if >>> > start_expected=true? >>> >>> That's the wrong question :-) >>> >>> > If the use case is there, I have no problem with >>> > adding it, but I want to make sure it's worthwhile. >>> >>> The use case which started this whole thread is for >>> start_expected=false, not start_expected=true. >>> >>> >>> Isn't this just two sides of the same coin? >>> >>> If you're not doing the same thing for both cases, then you're just >>> >>> reversing the order of the clauses. >>> >> >>> >> No, because the stated concern about unreliable expectations >>> >> ("Considering the possibility that the expected start might never >>> >> happen (or fail)") was regarding start_expected=true, and that's the >>> >> side of the coin we don't care about, so it doesn't matter if it's >>> >> unreliable. >>> > >>> > BTW, if the expected start happens but fails, then Pacemaker will just >>> > keep repeating until migration-threshold is hit, at which point it >>> > will call the RA 'stop' action finally with start_expected=false. >>> > So that's of no concern. >>> >>> To clarify, that's configurable, via start-failure-is-fatal and on-fail >> >> Sure. >> >>> > Maybe your point was that if the expected start never happens (so >>> > never even gets a chance to fail), we still want to do a nova >>> > service-disable? >>> >>> That is a good question, which might mean it should be done on every >>> stop -- or could that cause problems (besides delays)? 
>> >> No, the whole point of adding this feature is to avoid a >> service-disable on every stop, and instead only do it on the final >> stop. If there are corner cases where we never reach the final stop, >> that's not a disaster because nova will eventually figure it out and >> do the right thing when the server-agent connection times out. >> >>> Another aspect of this is that the proposed feature could only look at a >>> single transition. What if stop is called with start_expected=false, but >>> then Pacemaker is able to start the service on the same node in the next >>> transition immediately afterward? Would having called service-disable >>> cause problems for that start? >> >> We would also need to ensure that service-enable is called on start >> when necessary. Perhaps we could track the enable/disable state in a >> local temporary file, and if the file indicates that we've previously >> done service-disable, we know to run service-enable on start. This >> would avoid calling service-enable on every single start. > > feels like an over-optimization > in fact, the whole thing feels like that if i'm honest. Today the stars aligned :-) http://xkcd.com/1691/ > > why are we trying to optimise the projected performance impact when > the system is in terrible shape already? > >> >>> > Yes that would be nice, but this proposal was never intended to >>> > address that. I guess we'd need an entirely different mechanism in >>> > Pacemaker for that. But let's not allow perfection to become the >>> > enemy of the good ;-) >>> >>> The ultimate concern is that this will encourage people to write RAs >>> that leave services in a dangerous state after stop is called. >> >> I don't see why it would. > > Previous experience suggests it definitely will. > > People will do exactly what you're thinking but with something important. > They'll see it behaves as they expect in best-case testing and never > think about the corner cases. 
> Then they'll start thinking about optimising their start operations, > write some "optimistic" state recording code and break those too. > > Imagine a bug in your state recording code (maybe you forget to handle > a missing state file after reboot) that means the 'enable' doesn't get > run. The service is up, but nova will never use it. > >> The new feature will be obscure enough that >> no one would be able to use it without reading the corresponding >> documentation first anyway. > > I like your optimism. > >> >>> I think with naming and documenting it properly, I'm fine to provide the >>> option, but I'm on the fence. Beekhof needs a little more convincing :-) >> >> Can you provide an example of a potential real-world situation where >> an RA author would end up accidentally abusing the feature? > > You want a real-world
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers wrote: > Ken Gaillot wrote: >> On 06/06/2016 05:45 PM, Adam Spiers wrote: >> > Adam Spiers wrote: >> >> Andrew Beekhof wrote: >> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: >> Ken Gaillot wrote: >> > My main question is how useful would it actually be in the proposed use >> > cases. Considering the possibility that the expected start might never >> > happen (or fail), can an RA really do anything different if >> > start_expected=true? >> >> That's the wrong question :-) >> >> > If the use case is there, I have no problem with >> > adding it, but I want to make sure it's worthwhile. >> >> The use case which started this whole thread is for >> start_expected=false, not start_expected=true. >> >>> >> >>> Isn't this just two sides of the same coin? >> >>> If you're not doing the same thing for both cases, then you're just >> >>> reversing the order of the clauses. >> >> >> >> No, because the stated concern about unreliable expectations >> >> ("Considering the possibility that the expected start might never >> >> happen (or fail)") was regarding start_expected=true, and that's the >> >> side of the coin we don't care about, so it doesn't matter if it's >> >> unreliable. >> > >> > BTW, if the expected start happens but fails, then Pacemaker will just >> > keep repeating until migration-threshold is hit, at which point it >> > will call the RA 'stop' action finally with start_expected=false. >> > So that's of no concern. >> >> To clarify, that's configurable, via start-failure-is-fatal and on-fail > > Sure. > >> > Maybe your point was that if the expected start never happens (so >> > never even gets a chance to fail), we still want to do a nova >> > service-disable? >> >> That is a good question, which might mean it should be done on every >> stop -- or could that cause problems (besides delays)? 
> > No, the whole point of adding this feature is to avoid a > service-disable on every stop, and instead only do it on the final > stop. If there are corner cases where we never reach the final stop, > that's not a disaster because nova will eventually figure it out and > do the right thing when the server-agent connection times out. > >> Another aspect of this is that the proposed feature could only look at a >> single transition. What if stop is called with start_expected=false, but >> then Pacemaker is able to start the service on the same node in the next >> transition immediately afterward? Would having called service-disable >> cause problems for that start? > > We would also need to ensure that service-enable is called on start > when necessary. Perhaps we could track the enable/disable state in a > local temporary file, and if the file indicates that we've previously > done service-disable, we know to run service-enable on start. This > would avoid calling service-enable on every single start. feels like an over-optimization in fact, the whole thing feels like that if i'm honest. why are we trying to optimise the projected performance impact when the system is in terrible shape already? > >> > Yes that would be nice, but this proposal was never intended to >> > address that. I guess we'd need an entirely different mechanism in >> > Pacemaker for that. But let's not allow perfection to become the >> > enemy of the good ;-) >> >> The ultimate concern is that this will encourage people to write RAs >> that leave services in a dangerous state after stop is called. > > I don't see why it would. Previous experience suggests it definitely will. People will do exactly what you're thinking but with something important. They'll see it behaves as they expect in best-case testing and never think about the corner cases. Then they'll start thinking about optimising their start operations, write some "optimistic" state recording code and break those too. 
Imagine a bug in your state recording code (maybe you forget to handle a missing state file after reboot) that means the 'enable' doesn't get run. The service is up, but nova will never use it. > The new feature will be obscure enough that > no one would be able to use it without reading the corresponding > documentation first anyway. I like your optimism. > >> I think with naming and documenting it properly, I'm fine to provide the >> option, but I'm on the fence. Beekhof needs a little more convincing :-) > > Can you provide an example of a potential real-world situation where > an RA author would end up accidentally abusing the feature? You want a real-world example of how someone could accidentally misuse a feature that doesn't exist yet? Um... if we knew all the weird and wonderful ways people break our code we'd be able to build a better mouse trap. > > Thanks a lot for your continued attention on this! > > Adam > >
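[Editor's note: Andrew's reboot scenario can be made concrete with a hedged sketch. This is an illustration written for this archive, not code from the thread: instead of trusting a local state file that may vanish when the tmp directory is cleared on reboot, the agent asks nova for the authoritative disabled/enabled state before re-enabling. The `nova service-list` output parsing shown here is illustrative, not a tested incantation.]

```shell
# Defensive variant of the "enable on start" logic: consult the nova API
# for the authoritative state rather than a local file that may have
# been lost across a reboot.
service_is_disabled() {
    # Illustrative parsing: assumes the status column contains the
    # literal word "disabled" for a disabled service.
    nova service-list --host "$(hostname)" --binary nova-compute \
        | grep -q disabled
}

robust_start_enable() {
    # Re-enable based on what nova reports, not on a state file that a
    # reboot (or a bug) may have removed.
    if service_is_disabled; then
        nova service-enable "$(hostname)" nova-compute
    fi
}
```

The trade-off, as the thread notes, is an extra API round-trip on every start, which is exactly what the state-file idea was trying to avoid.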
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Ken Gaillot wrote: > On 06/06/2016 05:45 PM, Adam Spiers wrote: > > Adam Spiers wrote: > >> Andrew Beekhof wrote: > >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: > Ken Gaillot wrote: > > My main question is how useful would it actually be in the proposed use > > cases. Considering the possibility that the expected start might never > > happen (or fail), can an RA really do anything different if > > start_expected=true? > > That's the wrong question :-) > > > If the use case is there, I have no problem with > > adding it, but I want to make sure it's worthwhile. > > The use case which started this whole thread is for > start_expected=false, not start_expected=true. > >>> > >>> Isn't this just two sides of the same coin? > >>> If you're not doing the same thing for both cases, then you're just > >>> reversing the order of the clauses. > >> > >> No, because the stated concern about unreliable expectations > >> ("Considering the possibility that the expected start might never > >> happen (or fail)") was regarding start_expected=true, and that's the > >> side of the coin we don't care about, so it doesn't matter if it's > >> unreliable. > > > > BTW, if the expected start happens but fails, then Pacemaker will just > > keep repeating until migration-threshold is hit, at which point it > > will call the RA 'stop' action finally with start_expected=false. > > So that's of no concern. > > To clarify, that's configurable, via start-failure-is-fatal and on-fail Sure. > > Maybe your point was that if the expected start never happens (so > > never even gets a chance to fail), we still want to do a nova > > service-disable? > > That is a good question, which might mean it should be done on every > stop -- or could that cause problems (besides delays)? No, the whole point of adding this feature is to avoid a service-disable on every stop, and instead only do it on the final stop. 
If there are corner cases where we never reach the final stop, that's not a disaster because nova will eventually figure it out and do the right thing when the server-agent connection times out. > Another aspect of this is that the proposed feature could only look at a > single transition. What if stop is called with start_expected=false, but > then Pacemaker is able to start the service on the same node in the next > transition immediately afterward? Would having called service-disable > cause problems for that start? We would also need to ensure that service-enable is called on start when necessary. Perhaps we could track the enable/disable state in a local temporary file, and if the file indicates that we've previously done service-disable, we know to run service-enable on start. This would avoid calling service-enable on every single start. > > Yes that would be nice, but this proposal was never intended to > > address that. I guess we'd need an entirely different mechanism in > > Pacemaker for that. But let's not allow perfection to become the > > enemy of the good ;-) > > The ultimate concern is that this will encourage people to write RAs > that leave services in a dangerous state after stop is called. I don't see why it would. The new feature will be obscure enough that no one would be able to use it without reading the corresponding documentation first anyway. > I think with naming and documenting it properly, I'm fine to provide the > option, but I'm on the fence. Beekhof needs a little more convincing :-) Can you provide an example of a potential real-world situation where an RA author would end up accidentally abusing the feature? Thanks a lot for your continued attention on this! Adam ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
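[Editor's note: the state-file scheme Adam describes above can be sketched as follows. This is a minimal illustration, not the real NovaCompute agent: OCF_RESKEY_CRM_start_expected is the variable name proposed in this thread (not a released Pacemaker interface), and the state-file path and function names are invented here.]

```shell
# State file recording that we previously ran "nova service-disable".
# HA_RSCTMP is the usual per-RA temp dir in resource-agents; path is
# illustrative.
STATEFILE="${HA_RSCTMP:-/tmp}/nova-compute.service-disabled"

nova_compute_stop() {
    # ... actual service shutdown would happen here ...
    if [ "${OCF_RESKEY_CRM_start_expected:-true}" = "false" ]; then
        # Final stop: tell nova-scheduler to stop placing VMs on this
        # host, and remember that we did so.
        nova service-disable "$(hostname)" nova-compute && touch "$STATEFILE"
    fi
    return 0   # OCF_SUCCESS
}

nova_compute_start() {
    # Only undo a disable we recorded ourselves, avoiding an extra API
    # call on every start.
    if [ -f "$STATEFILE" ]; then
        nova service-enable "$(hostname)" nova-compute && rm -f "$STATEFILE"
    fi
    # ... actual service startup would happen here ...
    return 0   # OCF_SUCCESS
}
```

Note the failure mode raised elsewhere in the thread: if the state file is lost (for example after a reboot that clears the tmp directory), the enable silently never runs even though the service is disabled on the nova side.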
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
07.06.2016 02:20, Ken Gaillot wrote: On 06/06/2016 03:30 PM, Vladislav Bogdanov wrote: 06.06.2016 22:43, Ken Gaillot wrote: On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote: 06.06.2016 19:39, Ken Gaillot wrote: On 06/05/2016 07:27 PM, Andrew Beekhof wrote: On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot wrote: On 06/02/2016 08:01 PM, Andrew Beekhof wrote: On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: A recent thread discussed a proposed new feature, a new environment variable that would be passed to resource agents, indicating whether a stop action was part of a recovery. Since that thread was long and covered a lot of topics, I'm starting a new one to focus on the core issue remaining: The original idea was to pass the number of restarts remaining before the resource will no longer be tried to be started on the same node. This involves calculating (fail-count - migration-threshold), and that implies certain limitations: (1) it will only be set when the cluster checks migration-threshold; (2) it will only be set for the failed resource itself, not for other resources that may be recovered due to dependencies on it. Ulrich Windl proposed an alternative: setting a boolean value instead. I forgot to cc the list on my reply, so I'll summarize now: We would set a new variable like OCF_RESKEY_CRM_recovery=true This concept worries me, especially when what we've implemented is called OCF_RESKEY_CRM_restarting. Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. The name alone encourages people to "optimise" the agent to not actually stop the service "because it's just going to start again shortly". I know that's not what Adam would do, but not everyone understands how clusters work. There are any number of reasons why a cluster that intends to restart a service may not do so. In such a scenario, a badly written agent would cause the cluster to mistakenly believe that the service is stopped - allowing it to start elsewhere. 
It's true there are any number of ways to write bad agents, but I would argue that we shouldn't be nudging people in that direction :) I do have mixed feelings about that. I think if we name it start_expected, and document it carefully, we can avoid any casual mistakes. My main question is how useful would it actually be in the proposed use cases. Considering the possibility that the expected start might never happen (or fail), can an RA really do anything different if start_expected=true? I would have thought not. Correctness should trump optimal. But I'm prepared to be mistaken. If the use case is there, I have no problem with adding it, but I want to make sure it's worthwhile. Anyone have comments on this? A simple example: pacemaker calls an RA stop with start_expected=true, then before the start happens, someone disables the resource, so the start is never called. Or the node is fenced before the start happens, etc. Is there anything significant an RA can do differently based on start_expected=true/false without causing problems if an expected start never happens? Yep. It may request stop of other resources * on that node by removing some node attributes which participate in location constraints * or cluster-wide by revoking/putting to standby cluster ticket other resources depend on The latter case is why I asked about the possibility of passing the node name the resource is intended to be started on instead of a boolean value (in comments to PR #1026) - I would use it to request stop of lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary lustre component which does all "request routing") fails to start anywhere in cluster. That way, if RA does not receive any node name, Why would ordering constraints be insufficient? They are in place, but advisory ones to allow MGS fail/switch-over. What happens if the MDTs/OSTs continue running because a start of MGS was expected, but something prevents the start from actually happening? 
Nothing critical, lustre clients won't be able to contact them without MGS running and will hang. But it is safer to shut them down if it is known that MGS cannot be started right now. Especially if geo-cluster failover is expected in that case (as MGS can be local to a site, contrary to all other lustre parts which need to be replicated). Actually that is the only part of a puzzle remaining to "solve" that big project, and IMHO it is enough to have a node name of an intended start or nothing in that attribute (nothing means stop everything and initiate geo-failover if needed). If f.e. fencing happens for a node intended to start resource, then stop will be called again after the next start failure after failure-timeout lapses. That would be much better than no information at all. Total stop or geo-failover will happen just with some (configurable) delay instead of rendering the whole filesystem to an unusable state requiring manual
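[Editor's note: Vladislav's ticket-based escalation could look roughly like this in an RA's stop path. A hedged sketch, not code from the thread: the ticket name is invented, crm_ticket is Pacemaker's standard ticket management tool, and OCF_RESKEY_CRM_start_expected is the variable proposed (but not guaranteed to be implemented) in this discussion.]

```shell
# If the MGS resource is stopped with no further start expected,
# revoke the ticket that the MDT/OST resources' rsc_ticket constraints
# depend on, so the dependent resources are stopped too.
mgs_escalate_stop() {
    ticket="lustre-mgs-up"   # hypothetical ticket name
    if [ "${OCF_RESKEY_CRM_start_expected:-true}" = "false" ]; then
        # --force skips the interactive safety prompt; without it
        # crm_ticket asks for confirmation before revoking.
        crm_ticket --ticket "$ticket" --revoke --force
    fi
}
```

With rsc_ticket constraints in place, revoking the ticket stops (or demotes) every resource that depends on it cluster-wide, which is exactly the "stop everything and initiate geo-failover" behaviour described above.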
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Tue, Jun 7, 2016 at 9:07 AM, Ken Gaillot wrote: > On 06/06/2016 05:45 PM, Adam Spiers wrote: >> Adam Spiers wrote: >>> Andrew Beekhof wrote: On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: > Ken Gaillot wrote: >> My main question is how useful would it actually be in the proposed use >> cases. Considering the possibility that the expected start might never >> happen (or fail), can an RA really do anything different if >> start_expected=true? > > That's the wrong question :-) > >> If the use case is there, I have no problem with >> adding it, but I want to make sure it's worthwhile. > > The use case which started this whole thread is for > start_expected=false, not start_expected=true. Isn't this just two sides of the same coin? If you're not doing the same thing for both cases, then you're just reversing the order of the clauses. >>> >>> No, because the stated concern about unreliable expectations >>> ("Considering the possibility that the expected start might never >>> happen (or fail)") was regarding start_expected=true, and that's the >>> side of the coin we don't care about, so it doesn't matter if it's >>> unreliable. >> >> BTW, if the expected start happens but fails, then Pacemaker will just >> keep repeating until migration-threshold is hit, at which point it >> will call the RA 'stop' action finally with start_expected=false. >> So that's of no concern. > > To clarify, that's configurable, via start-failure-is-fatal and on-fail > >> Maybe your point was that if the expected start never happens (so >> never even gets a chance to fail), we still want to do a nova >> service-disable? > > That is a good question, which might mean it should be done on every > stop -- or could that cause problems (besides delays)? > > Another aspect of this is that the proposed feature could only look at a > single transition. 
What if stop is called with start_expected=false, but > then Pacemaker is able to start the service on the same node in the next > transition immediately afterward? Would having called service-disable > cause problems for that start? > >> Yes that would be nice, but this proposal was never intended to >> address that. I guess we'd need an entirely different mechanism in >> Pacemaker for that. But let's not allow perfection to become the >> enemy of the good ;-) > > The ultimate concern is that this will encourage people to write RAs > that leave services in a dangerous state after stop is called. > > I think with naming and documenting it properly, I'm fine to provide the > option, but I'm on the fence. Beekhof needs a little more convincing :-) I think the new name is a big step in the right direction
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Tue, Jun 7, 2016 at 8:45 AM, Adam Spiers wrote: > Adam Spiers wrote: >> Andrew Beekhof wrote: >> > On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: >> > > Ken Gaillot wrote: >> > >> My main question is how useful would it actually be in the proposed use >> > >> cases. Considering the possibility that the expected start might never >> > >> happen (or fail), can an RA really do anything different if >> > >> start_expected=true? >> > > >> > > That's the wrong question :-) >> > > >> > >> If the use case is there, I have no problem with >> > >> adding it, but I want to make sure it's worthwhile. >> > > >> > > The use case which started this whole thread is for >> > > start_expected=false, not start_expected=true. >> > >> > Isn't this just two sides of the same coin? >> > If you're not doing the same thing for both cases, then you're just >> > reversing the order of the clauses. >> >> No, because the stated concern about unreliable expectations >> ("Considering the possibility that the expected start might never >> happen (or fail)") was regarding start_expected=true, and that's the >> side of the coin we don't care about, so it doesn't matter if it's >> unreliable. > > BTW, if the expected start happens but fails, then Pacemaker will just > keep repeating until migration-threshold is hit, at which point it > will call the RA 'stop' action finally with start_expected=false. Maybe. Maybe not. People cannot rely on this and I'd put money on them trying :-) > So that's of no concern. > > Maybe your point was that if the expected start never happens (so > never even gets a chance to fail), we still want to do a nova > service-disable? Exactly :) > > Yes that would be nice, but this proposal was never intended to > address that. I guess we'd need an entirely different mechanism in > Pacemaker for that. 
But let's not allow perfection to become the > enemy of the good ;-)
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On 06/06/2016 03:30 PM, Vladislav Bogdanov wrote: > 06.06.2016 22:43, Ken Gaillot wrote: >> On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote: >>> 06.06.2016 19:39, Ken Gaillot wrote: On 06/05/2016 07:27 PM, Andrew Beekhof wrote: > On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot wrote: >> On 06/02/2016 08:01 PM, Andrew Beekhof wrote: >>> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot >>> wrote: A recent thread discussed a proposed new feature, a new environment variable that would be passed to resource agents, indicating whether a stop action was part of a recovery. Since that thread was long and covered a lot of topics, I'm starting a new one to focus on the core issue remaining: The original idea was to pass the number of restarts remaining before the resource will no longer be tried to be started on the same node. This involves calculating (fail-count - migration-threshold), and that implies certain limitations: (1) it will only be set when the cluster checks migration-threshold; (2) it will only be set for the failed resource itself, not for other resources that may be recovered due to dependencies on it. Ulrich Windl proposed an alternative: setting a boolean value instead. I forgot to cc the list on my reply, so I'll summarize now: We would set a new variable like OCF_RESKEY_CRM_recovery=true >>> >>> This concept worries me, especially when what we've implemented is >>> called OCF_RESKEY_CRM_restarting. >> >> Agreed; I plan to rename it yet again, to >> OCF_RESKEY_CRM_start_expected. >> >>> The name alone encourages people to "optimise" the agent to not >>> actually stop the service "because it's just going to start again >>> shortly". I know that's not what Adam would do, but not everyone >>> understands how clusters work. >>> >>> There are any number of reasons why a cluster that intends to >>> restart >>> a service may not do so. 
In such a scenario, a badly written agent >>> would cause the cluster to mistakenly believe that the service is >>> stopped - allowing it to start elsewhere. >>> >>> Its true there are any number of ways to write bad agents, but I >>> would >>> argue that we shouldn't be nudging people in that direction :) >> >> I do have mixed feelings about that. I think if we name it >> start_expected, and document it carefully, we can avoid any casual >> mistakes. >> >> My main question is how useful would it actually be in the >> proposed use >> cases. Considering the possibility that the expected start might >> never >> happen (or fail), can an RA really do anything different if >> start_expected=true? > > I would have thought not. Correctness should trump optimal. > But I'm prepared to be mistaken. > >> If the use case is there, I have no problem with >> adding it, but I want to make sure it's worthwhile. Anyone have comments on this? A simple example: pacemaker calls an RA stop with start_expected=true, then before the start happens, someone disables the resource, so the start is never called. Or the node is fenced before the start happens, etc. Is there anything significant an RA can do differently based on start_expected=true/false without causing problems if an expected start never happens? >>> >>> Yep. >>> >>> It may request stop of other resources >>> * on that node by removing some node attributes which participate in >>> location constraints >>> * or cluster-wide by revoking/putting to standby cluster ticket other >>> resources depend on >>> >>> Latter case is that's why I asked about the possibility of passing the >>> node name resource is intended to be started on instead of a boolean >>> value (in comments to PR #1026) - I would use it to request stop of >>> lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary >>> lustre component which does all "request routing") fails to start >>> anywhere in cluster. 
That way, if RA does not receive any node name, >> >> Why would ordering constraints be insufficient? > > They are in place, but advisory ones to allow MGS fail/switch-over. >> >> What happens if the MDTs/OSTs continue running because a start of MGS >> was expected, but something prevents the start from actually happening? > > Nothing critical, lustre clients won't be able to contact them without > MGS running and will hang. > But it is safer to shutdown them if it is known that MGS cannot be > started right now. Especially if geo-cluster failover is expected in > that case (as MGS can be local to a site, countrary to all other lustre > parts which need to
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On 06/06/2016 05:45 PM, Adam Spiers wrote: > Adam Spiers wrote: >> Andrew Beekhof wrote: >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: Ken Gaillot wrote: > My main question is how useful would it actually be in the proposed use > cases. Considering the possibility that the expected start might never > happen (or fail), can an RA really do anything different if > start_expected=true? That's the wrong question :-) > If the use case is there, I have no problem with > adding it, but I want to make sure it's worthwhile. The use case which started this whole thread is for start_expected=false, not start_expected=true. >>> >>> Isn't this just two sides of the same coin? >>> If you're not doing the same thing for both cases, then you're just >>> reversing the order of the clauses. >> >> No, because the stated concern about unreliable expectations >> ("Considering the possibility that the expected start might never >> happen (or fail)") was regarding start_expected=true, and that's the >> side of the coin we don't care about, so it doesn't matter if it's >> unreliable. > > BTW, if the expected start happens but fails, then Pacemaker will just > keep repeating until migration-threshold is hit, at which point it > will call the RA 'stop' action finally with start_expected=false. > So that's of no concern.

To clarify, that's configurable, via start-failure-is-fatal and on-fail.

> Maybe your point was that if the expected start never happens (so > never even gets a chance to fail), we still want to do a nova > service-disable?

That is a good question, which might mean it should be done on every stop -- or could that cause problems (besides delays)? Another aspect of this is that the proposed feature could only look at a single transition. What if stop is called with start_expected=false, but then Pacemaker is able to start the service on the same node in the next transition immediately afterward? Would having called service-disable cause problems for that start?
> Yes that would be nice, but this proposal was never intended to > address that. I guess we'd need an entirely different mechanism in > Pacemaker for that. But let's not allow perfection to become the > enemy of the good ;-)

The ultimate concern is that this will encourage people to write RAs that leave services in a dangerous state after stop is called. I think with naming and documenting it properly, I'm fine to provide the option, but I'm on the fence. Beekhof needs a little more convincing :-)
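[Editor's note: for reference, the knobs Ken mentions above (start-failure-is-fatal, on-fail, migration-threshold, failure-timeout) are ordinary cluster configuration. A hedged example in pcs syntax follows; the resource name is made up, and the exact syntax should be checked against your pcs version:]

```shell
# Illustrative only: retry a failed start in place instead of treating
# the first start failure as fatal for the node, move the resource away
# after 3 failures, and let the fail count expire after 10 minutes so a
# local start can be attempted again.
pcs property set start-failure-is-fatal=false
pcs resource update my-rs meta migration-threshold=3 failure-timeout=600
```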
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Adam Spiers wrote: > Andrew Beekhof wrote: > > On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: > > > Ken Gaillot wrote: > > >> My main question is how useful would it actually be in the proposed use > > >> cases. Considering the possibility that the expected start might never > > >> happen (or fail), can an RA really do anything different if > > >> start_expected=true? > > > > > > That's the wrong question :-) > > > > > >> If the use case is there, I have no problem with > > >> adding it, but I want to make sure it's worthwhile. > > > > > > The use case which started this whole thread is for > > > start_expected=false, not start_expected=true. > > > > Isn't this just two sides of the same coin? > > If you're not doing the same thing for both cases, then you're just > > reversing the order of the clauses. > > No, because the stated concern about unreliable expectations > ("Considering the possibility that the expected start might never > happen (or fail)") was regarding start_expected=true, and that's the > side of the coin we don't care about, so it doesn't matter if it's > unreliable.

BTW, if the expected start happens but fails, then Pacemaker will just keep repeating until migration-threshold is hit, at which point it will call the RA 'stop' action finally with start_expected=false. So that's of no concern. Maybe your point was that if the expected start never happens (so never even gets a chance to fail), we still want to do a nova service-disable? Yes that would be nice, but this proposal was never intended to address that. I guess we'd need an entirely different mechanism in Pacemaker for that. But let's not allow perfection to become the enemy of the good ;-)
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Andrew Beekhof wrote: > On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: > > Ken Gaillot wrote: > >> On 06/02/2016 08:01 PM, Andrew Beekhof wrote: > >> > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: > >> >> A recent thread discussed a proposed new feature, a new environment > >> >> variable that would be passed to resource agents, indicating whether a > >> >> stop action was part of a recovery. > >> >> > >> >> Since that thread was long and covered a lot of topics, I'm starting a > >> >> new one to focus on the core issue remaining: > >> >> > >> >> The original idea was to pass the number of restarts remaining before > >> >> the resource will no longer tried to be started on the same node. This > >> >> involves calculating (fail-count - migration-threshold), and that > >> >> implies certain limitations: (1) it will only be set when the cluster > >> >> checks migration-threshold; (2) it will only be set for the failed > >> >> resource itself, not for other resources that may be recovered due to > >> >> dependencies on it. > >> >> > >> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I > >> >> forgot to cc the list on my reply, so I'll summarize now: We would set a > >> >> new variable like OCF_RESKEY_CRM_recovery=true > >> > > >> > This concept worries me, especially when what we've implemented is > >> > called OCF_RESKEY_CRM_restarting. > >> > >> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. > > > > [snipped] > > > >> My main question is how useful would it actually be in the proposed use > >> cases. Considering the possibility that the expected start might never > >> happen (or fail), can an RA really do anything different if > >> start_expected=true? > > > > That's the wrong question :-) > > > >> If the use case is there, I have no problem with > >> adding it, but I want to make sure it's worthwhile. 
> > > > The use case which started this whole thread is for > > start_expected=false, not start_expected=true. > > Isn't this just two sides of the same coin? > If you're not doing the same thing for both cases, then you're just > reversing the order of the clauses.

No, because the stated concern about unreliable expectations ("Considering the possibility that the expected start might never happen (or fail)") was regarding start_expected=true, and that's the side of the coin we don't care about, so it doesn't matter if it's unreliable.
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers wrote: > Ken Gaillot wrote: >> On 06/02/2016 08:01 PM, Andrew Beekhof wrote: >> > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: >> >> A recent thread discussed a proposed new feature, a new environment >> >> variable that would be passed to resource agents, indicating whether a >> >> stop action was part of a recovery. >> >> >> >> Since that thread was long and covered a lot of topics, I'm starting a >> >> new one to focus on the core issue remaining: >> >> >> >> The original idea was to pass the number of restarts remaining before >> >> the resource will no longer tried to be started on the same node. This >> >> involves calculating (fail-count - migration-threshold), and that >> >> implies certain limitations: (1) it will only be set when the cluster >> >> checks migration-threshold; (2) it will only be set for the failed >> >> resource itself, not for other resources that may be recovered due to >> >> dependencies on it. >> >> >> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I >> >> forgot to cc the list on my reply, so I'll summarize now: We would set a >> >> new variable like OCF_RESKEY_CRM_recovery=true >> > >> > This concept worries me, especially when what we've implemented is >> > called OCF_RESKEY_CRM_restarting. >> >> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. > > [snipped] > >> My main question is how useful would it actually be in the proposed use >> cases. Considering the possibility that the expected start might never >> happen (or fail), can an RA really do anything different if >> start_expected=true? > > That's the wrong question :-) > >> If the use case is there, I have no problem with >> adding it, but I want to make sure it's worthwhile. > > The use case which started this whole thread is for > start_expected=false, not start_expected=true. Isn't this just two sides of the same coin? 
If you're not doing the same thing for both cases, then you're just reversing the order of the clauses. "A isn't different from B, B is different from A!" :-)

> When it's false for
> NovaCompute, we call nova service-disable to ensure that nova doesn't
> attempt to schedule any more VMs on that host.
>
> If start_expected=true, we don't *want* to do anything different. So
> it doesn't matter even if the expected start never happens.
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Ken Gaillot wrote: > On 06/02/2016 08:01 PM, Andrew Beekhof wrote: > > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: > >> A recent thread discussed a proposed new feature, a new environment > >> variable that would be passed to resource agents, indicating whether a > >> stop action was part of a recovery. > >> > >> Since that thread was long and covered a lot of topics, I'm starting a > >> new one to focus on the core issue remaining: > >> > >> The original idea was to pass the number of restarts remaining before > >> the resource will no longer tried to be started on the same node. This > >> involves calculating (fail-count - migration-threshold), and that > >> implies certain limitations: (1) it will only be set when the cluster > >> checks migration-threshold; (2) it will only be set for the failed > >> resource itself, not for other resources that may be recovered due to > >> dependencies on it. > >> > >> Ulrich Windl proposed an alternative: setting a boolean value instead. I > >> forgot to cc the list on my reply, so I'll summarize now: We would set a > >> new variable like OCF_RESKEY_CRM_recovery=true > > > > This concept worries me, especially when what we've implemented is > > called OCF_RESKEY_CRM_restarting. > > Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. [snipped] > My main question is how useful would it actually be in the proposed use > cases. Considering the possibility that the expected start might never > happen (or fail), can an RA really do anything different if > start_expected=true? That's the wrong question :-) > If the use case is there, I have no problem with > adding it, but I want to make sure it's worthwhile. The use case which started this whole thread is for start_expected=false, not start_expected=true. When it's false for NovaCompute, we call nova service-disable to ensure that nova doesn't attempt to schedule any more VMs on that host. If start_expected=true, we don't *want* to do anything different. 
So it doesn't matter even if the expected start never happens.
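[Editor's note: for concreteness, here is a minimal sketch of the stop path Adam describes, consulting the proposed OCF_RESKEY_CRM_start_expected variable. This is not the shipped NovaCompute agent; `stop_service` is a hypothetical stand-in for the agent's real stop logic, and the nova CLI invocation is illustrative. Note the stop itself is unconditional; only the side effect varies.]

```shell
# Defaults for the OCF return codes, in case the RA environment is absent.
: "${OCF_SUCCESS:=0}" "${OCF_ERR_GENERIC:=1}"

# Stand-in for the agent's real "terminate nova-compute" logic (hypothetical).
stop_service() { true; }

nova_compute_stop() {
    # Always really stop the service; never report success while it is up.
    stop_service || return "$OCF_ERR_GENERIC"

    # Side effect only: if no local restart is expected, tell nova to stop
    # scheduling VMs onto this host.  Default treats "unset" as "no restart
    # expected", the conservative pre-feature behaviour (an assumption).
    if [ "${OCF_RESKEY_CRM_start_expected:-false}" = "false" ]; then
        nova service-disable "$(uname -n)" nova-compute
    fi
    return "$OCF_SUCCESS"
}
```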
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
06.06.2016 22:43, Ken Gaillot wrote: On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote: 06.06.2016 19:39, Ken Gaillot wrote: On 06/05/2016 07:27 PM, Andrew Beekhof wrote: On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot wrote: On 06/02/2016 08:01 PM, Andrew Beekhof wrote: On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: A recent thread discussed a proposed new feature, a new environment variable that would be passed to resource agents, indicating whether a stop action was part of a recovery. Since that thread was long and covered a lot of topics, I'm starting a new one to focus on the core issue remaining: The original idea was to pass the number of restarts remaining before the resource will no longer tried to be started on the same node. This involves calculating (fail-count - migration-threshold), and that implies certain limitations: (1) it will only be set when the cluster checks migration-threshold; (2) it will only be set for the failed resource itself, not for other resources that may be recovered due to dependencies on it. Ulrich Windl proposed an alternative: setting a boolean value instead. I forgot to cc the list on my reply, so I'll summarize now: We would set a new variable like OCF_RESKEY_CRM_recovery=true This concept worries me, especially when what we've implemented is called OCF_RESKEY_CRM_restarting. Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. The name alone encourages people to "optimise" the agent to not actually stop the service "because its just going to start again shortly". I know thats not what Adam would do, but not everyone understands how clusters work. There are any number of reasons why a cluster that intends to restart a service may not do so. In such a scenario, a badly written agent would cause the cluster to mistakenly believe that the service is stopped - allowing it to start elsewhere. 
Its true there are any number of ways to write bad agents, but I would argue that we shouldn't be nudging people in that direction :) I do have mixed feelings about that. I think if we name it start_expected, and document it carefully, we can avoid any casual mistakes. My main question is how useful would it actually be in the proposed use cases. Considering the possibility that the expected start might never happen (or fail), can an RA really do anything different if start_expected=true? I would have thought not. Correctness should trump optimal. But I'm prepared to be mistaken. If the use case is there, I have no problem with adding it, but I want to make sure it's worthwhile. Anyone have comments on this? A simple example: pacemaker calls an RA stop with start_expected=true, then before the start happens, someone disables the resource, so the start is never called. Or the node is fenced before the start happens, etc. Is there anything significant an RA can do differently based on start_expected=true/false without causing problems if an expected start never happens? Yep. It may request stop of other resources * on that node by removing some node attributes which participate in location constraints * or cluster-wide by revoking/putting to standby cluster ticket other resources depend on Latter case is that's why I asked about the possibility of passing the node name resource is intended to be started on instead of a boolean value (in comments to PR #1026) - I would use it to request stop of lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary lustre component which does all "request routing") fails to start anywhere in cluster. That way, if RA does not receive any node name, Why would ordering constraints be insufficient? They are in place, but advisory ones to allow MGS fail/switch-over. What happens if the MDTs/OSTs continue running because a start of MGS was expected, but something prevents the start from actually happening? 
Nothing critical, lustre clients won't be able to contact them without MGS running and will hang. But it is safer to shut them down if it is known that MGS cannot be started right now. Especially if geo-cluster failover is expected in that case (as MGS can be local to a site, contrary to all other lustre parts which need to be replicated). Actually that is the only part of the puzzle remaining to "solve" that big project, and IMHO it is enough to have the node name of an intended start or nothing in that attribute (nothing means stop everything and initiate geo-failover if needed). If, e.g., fencing happens for the node intended to start the resource, then stop will be called again after the next start failure once failure-timeout lapses. That would be much better than no information at all. Total stop or geo-failover will happen just with some (configurable) delay instead of leaving the whole filesystem in an unusable state requiring manual intervention. then it can be "almost sure" pacemaker does not intend to restart
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote: > 06.06.2016 19:39, Ken Gaillot wrote: >> On 06/05/2016 07:27 PM, Andrew Beekhof wrote: >>> On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot>>> wrote: On 06/02/2016 08:01 PM, Andrew Beekhof wrote: > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot > wrote: >> A recent thread discussed a proposed new feature, a new environment >> variable that would be passed to resource agents, indicating >> whether a >> stop action was part of a recovery. >> >> Since that thread was long and covered a lot of topics, I'm >> starting a >> new one to focus on the core issue remaining: >> >> The original idea was to pass the number of restarts remaining before >> the resource will no longer tried to be started on the same node. >> This >> involves calculating (fail-count - migration-threshold), and that >> implies certain limitations: (1) it will only be set when the cluster >> checks migration-threshold; (2) it will only be set for the failed >> resource itself, not for other resources that may be recovered due to >> dependencies on it. >> >> Ulrich Windl proposed an alternative: setting a boolean value >> instead. I >> forgot to cc the list on my reply, so I'll summarize now: We would >> set a >> new variable like OCF_RESKEY_CRM_recovery=true > > This concept worries me, especially when what we've implemented is > called OCF_RESKEY_CRM_restarting. Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. > The name alone encourages people to "optimise" the agent to not > actually stop the service "because its just going to start again > shortly". I know thats not what Adam would do, but not everyone > understands how clusters work. > > There are any number of reasons why a cluster that intends to restart > a service may not do so. In such a scenario, a badly written agent > would cause the cluster to mistakenly believe that the service is > stopped - allowing it to start elsewhere. 
> > Its true there are any number of ways to write bad agents, but I would > argue that we shouldn't be nudging people in that direction :) I do have mixed feelings about that. I think if we name it start_expected, and document it carefully, we can avoid any casual mistakes. My main question is how useful would it actually be in the proposed use cases. Considering the possibility that the expected start might never happen (or fail), can an RA really do anything different if start_expected=true? >>> >>> I would have thought not. Correctness should trump optimal. >>> But I'm prepared to be mistaken. >>> If the use case is there, I have no problem with adding it, but I want to make sure it's worthwhile. >> >> Anyone have comments on this? >> >> A simple example: pacemaker calls an RA stop with start_expected=true, >> then before the start happens, someone disables the resource, so the >> start is never called. Or the node is fenced before the start happens, >> etc. >> >> Is there anything significant an RA can do differently based on >> start_expected=true/false without causing problems if an expected start >> never happens? > > Yep. > > It may request stop of other resources > * on that node by removing some node attributes which participate in > location constraints > * or cluster-wide by revoking/putting to standby cluster ticket other > resources depend on > > Latter case is that's why I asked about the possibility of passing the > node name resource is intended to be started on instead of a boolean > value (in comments to PR #1026) - I would use it to request stop of > lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary > lustre component which does all "request routing") fails to start > anywhere in cluster. That way, if RA does not receive any node name, Why would ordering constraints be insufficient? What happens if the MDTs/OSTs continue running because a start of MGS was expected, but something prevents the start from actually happening? 
> then it can be "almost sure" pacemaker does not intend to restart
> resource (yet) and can request it to stop everything else (because
> filesystem is not usable anyways). Later, if another start attempt
> (caused by failure-timeout expiration) succeeds, RA may grant the ticket
> back, and all other resources start again.
>
> Best,
> Vladislav
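[Editor's note: the ticket mechanism Vladislav describes maps onto standard Pacemaker pieces. A rough sketch follows; the resource, ticket, and constraint names are illustrative, and the crm/crm_ticket syntax should be verified against your Pacemaker version:]

```shell
# Hypothetical: make the MDT/OST resources depend on a ticket, so that
# losing the ticket stops them.
crm configure rsc_ticket lustre-dep lustre-ticket: mdt-rs ost-rs loss-policy=stop

# The MGS agent's stop path could then revoke the ticket when no restart
# is expected, and grant it back after a later successful start.
crm_ticket --ticket lustre-ticket --revoke --force
crm_ticket --ticket lustre-ticket --grant
```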
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
06.06.2016 19:39, Ken Gaillot wrote: On 06/05/2016 07:27 PM, Andrew Beekhof wrote: On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot wrote: On 06/02/2016 08:01 PM, Andrew Beekhof wrote: On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: A recent thread discussed a proposed new feature, a new environment variable that would be passed to resource agents, indicating whether a stop action was part of a recovery. Since that thread was long and covered a lot of topics, I'm starting a new one to focus on the core issue remaining: The original idea was to pass the number of restarts remaining before the resource will no longer tried to be started on the same node. This involves calculating (fail-count - migration-threshold), and that implies certain limitations: (1) it will only be set when the cluster checks migration-threshold; (2) it will only be set for the failed resource itself, not for other resources that may be recovered due to dependencies on it. Ulrich Windl proposed an alternative: setting a boolean value instead. I forgot to cc the list on my reply, so I'll summarize now: We would set a new variable like OCF_RESKEY_CRM_recovery=true This concept worries me, especially when what we've implemented is called OCF_RESKEY_CRM_restarting. Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. The name alone encourages people to "optimise" the agent to not actually stop the service "because its just going to start again shortly". I know thats not what Adam would do, but not everyone understands how clusters work. There are any number of reasons why a cluster that intends to restart a service may not do so. In such a scenario, a badly written agent would cause the cluster to mistakenly believe that the service is stopped - allowing it to start elsewhere. Its true there are any number of ways to write bad agents, but I would argue that we shouldn't be nudging people in that direction :) I do have mixed feelings about that. 
I think if we name it start_expected, and document it carefully, we can avoid any casual mistakes. My main question is how useful would it actually be in the proposed use cases. Considering the possibility that the expected start might never happen (or fail), can an RA really do anything different if start_expected=true? I would have thought not. Correctness should trump optimal. But I'm prepared to be mistaken. If the use case is there, I have no problem with adding it, but I want to make sure it's worthwhile. Anyone have comments on this? A simple example: pacemaker calls an RA stop with start_expected=true, then before the start happens, someone disables the resource, so the start is never called. Or the node is fenced before the start happens, etc. Is there anything significant an RA can do differently based on start_expected=true/false without causing problems if an expected start never happens? Yep. It may request stop of other resources * on that node by removing some node attributes which participate in location constraints * or cluster-wide by revoking/putting to standby cluster ticket other resources depend on Latter case is that's why I asked about the possibility of passing the node name resource is intended to be started on instead of a boolean value (in comments to PR #1026) - I would use it to request stop of lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary lustre component which does all "request routing") fails to start anywhere in cluster. That way, if RA does not receive any node name, then it can be "almost sure" pacemaker does not intend to restart resource (yet) and can request it to stop everything else (because filesystem is not usable anyways). Later, if another start attempt (caused by failure-timeout expiration) succeeds, RA may grant the ticket back, and all other resources start again. 
Best,
Vladislav
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On 06/05/2016 07:27 PM, Andrew Beekhof wrote: > On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot wrote: >> On 06/02/2016 08:01 PM, Andrew Beekhof wrote: >>> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: A recent thread discussed a proposed new feature, a new environment variable that would be passed to resource agents, indicating whether a stop action was part of a recovery. Since that thread was long and covered a lot of topics, I'm starting a new one to focus on the core issue remaining: The original idea was to pass the number of restarts remaining before the resource will no longer tried to be started on the same node. This involves calculating (fail-count - migration-threshold), and that implies certain limitations: (1) it will only be set when the cluster checks migration-threshold; (2) it will only be set for the failed resource itself, not for other resources that may be recovered due to dependencies on it. Ulrich Windl proposed an alternative: setting a boolean value instead. I forgot to cc the list on my reply, so I'll summarize now: We would set a new variable like OCF_RESKEY_CRM_recovery=true >>> >>> This concept worries me, especially when what we've implemented is >>> called OCF_RESKEY_CRM_restarting. >> >> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. >> >>> The name alone encourages people to "optimise" the agent to not >>> actually stop the service "because its just going to start again >>> shortly". I know thats not what Adam would do, but not everyone >>> understands how clusters work. >>> >>> There are any number of reasons why a cluster that intends to restart >>> a service may not do so. In such a scenario, a badly written agent >>> would cause the cluster to mistakenly believe that the service is >>> stopped - allowing it to start elsewhere. 
>>> >>> Its true there are any number of ways to write bad agents, but I would >>> argue that we shouldn't be nudging people in that direction :) >> >> I do have mixed feelings about that. I think if we name it >> start_expected, and document it carefully, we can avoid any casual mistakes. >> >> My main question is how useful would it actually be in the proposed use >> cases. Considering the possibility that the expected start might never >> happen (or fail), can an RA really do anything different if >> start_expected=true? > > I would have thought not. Correctness should trump optimal. > But I'm prepared to be mistaken. > >> If the use case is there, I have no problem with >> adding it, but I want to make sure it's worthwhile.

Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true, then before the start happens, someone disables the resource, so the start is never called. Or the node is fenced before the start happens, etc.

Is there anything significant an RA can do differently based on start_expected=true/false without causing problems if an expected start never happens?
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot wrote: > On 06/02/2016 08:01 PM, Andrew Beekhof wrote: >> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: >>> A recent thread discussed a proposed new feature, a new environment >>> variable that would be passed to resource agents, indicating whether a >>> stop action was part of a recovery. >>> >>> Since that thread was long and covered a lot of topics, I'm starting a >>> new one to focus on the core issue remaining: >>> >>> The original idea was to pass the number of restarts remaining before >>> the resource will no longer be restarted on the same node. This >>> involves calculating (fail-count - migration-threshold), and that >>> implies certain limitations: (1) it will only be set when the cluster >>> checks migration-threshold; (2) it will only be set for the failed >>> resource itself, not for other resources that may be recovered due to >>> dependencies on it. >>> >>> Ulrich Windl proposed an alternative: setting a boolean value instead. I >>> forgot to cc the list on my reply, so I'll summarize now: We would set a >>> new variable like OCF_RESKEY_CRM_recovery=true >> >> This concept worries me, especially when what we've implemented is >> called OCF_RESKEY_CRM_restarting. > > Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. > >> The name alone encourages people to "optimise" the agent to not >> actually stop the service "because it's just going to start again >> shortly". I know that's not what Adam would do, but not everyone >> understands how clusters work. >> >> There are any number of reasons why a cluster that intends to restart >> a service may not do so. In such a scenario, a badly written agent >> would cause the cluster to mistakenly believe that the service is >> stopped - allowing it to start elsewhere. 
>> >> It's true there are any number of ways to write bad agents, but I would >> argue that we shouldn't be nudging people in that direction :) > > I do have mixed feelings about that. I think if we name it > start_expected, and document it carefully, we can avoid any casual mistakes. > > My main question is how useful would it actually be in the proposed use > cases. Considering the possibility that the expected start might never > happen (or fail), can an RA really do anything different if > start_expected=true? I would have thought not. Correctness should trump optimality. But I'm prepared to be mistaken. > If the use case is there, I have no problem with > adding it, but I want to make sure it's worthwhile.
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On 06/02/2016 08:01 PM, Andrew Beekhof wrote: > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: >> A recent thread discussed a proposed new feature, a new environment >> variable that would be passed to resource agents, indicating whether a >> stop action was part of a recovery. >> >> Since that thread was long and covered a lot of topics, I'm starting a >> new one to focus on the core issue remaining: >> >> The original idea was to pass the number of restarts remaining before >> the resource will no longer be restarted on the same node. This >> involves calculating (fail-count - migration-threshold), and that >> implies certain limitations: (1) it will only be set when the cluster >> checks migration-threshold; (2) it will only be set for the failed >> resource itself, not for other resources that may be recovered due to >> dependencies on it. >> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I >> forgot to cc the list on my reply, so I'll summarize now: We would set a >> new variable like OCF_RESKEY_CRM_recovery=true > > This concept worries me, especially when what we've implemented is > called OCF_RESKEY_CRM_restarting. Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected. > The name alone encourages people to "optimise" the agent to not > actually stop the service "because it's just going to start again > shortly". I know that's not what Adam would do, but not everyone > understands how clusters work. > > There are any number of reasons why a cluster that intends to restart > a service may not do so. In such a scenario, a badly written agent > would cause the cluster to mistakenly believe that the service is > stopped - allowing it to start elsewhere. > > It's true there are any number of ways to write bad agents, but I would > argue that we shouldn't be nudging people in that direction :) I do have mixed feelings about that. 
I think if we name it start_expected, and document it carefully, we can avoid any casual mistakes. My main question is how useful would it actually be in the proposed use cases. Considering the possibility that the expected start might never happen (or fail), can an RA really do anything different if start_expected=true? If the use case is there, I have no problem with adding it, but I want to make sure it's worthwhile. >> whenever a start is >> scheduled after a stop on the same node in the same transition. This >> would avoid the corner cases of the previous approach; instead of being >> tied to migration-threshold, it would be set whenever a recovery was >> being attempted, for any reason. And with this approach, it should be >> easier to set the variable for all actions on the resource >> (demote/stop/start/promote), rather than just the stop. >> >> I think the boolean approach fits all the envisioned use cases that have >> been discussed. Any objections to going that route instead of the count? >> -- >> Ken Gaillot
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: > A recent thread discussed a proposed new feature, a new environment > variable that would be passed to resource agents, indicating whether a > stop action was part of a recovery. > > Since that thread was long and covered a lot of topics, I'm starting a > new one to focus on the core issue remaining: > > The original idea was to pass the number of restarts remaining before > the resource will no longer be restarted on the same node. This > involves calculating (fail-count - migration-threshold), and that > implies certain limitations: (1) it will only be set when the cluster > checks migration-threshold; (2) it will only be set for the failed > resource itself, not for other resources that may be recovered due to > dependencies on it. > > Ulrich Windl proposed an alternative: setting a boolean value instead. I > forgot to cc the list on my reply, so I'll summarize now: We would set a > new variable like OCF_RESKEY_CRM_recovery=true This concept worries me, especially when what we've implemented is called OCF_RESKEY_CRM_restarting. The name alone encourages people to "optimise" the agent to not actually stop the service "because it's just going to start again shortly". I know that's not what Adam would do, but not everyone understands how clusters work. There are any number of reasons why a cluster that intends to restart a service may not do so. In such a scenario, a badly written agent would cause the cluster to mistakenly believe that the service is stopped - allowing it to start elsewhere. It's true there are any number of ways to write bad agents, but I would argue that we shouldn't be nudging people in that direction :) > whenever a start is > scheduled after a stop on the same node in the same transition. This > would avoid the corner cases of the previous approach; instead of being > tied to migration-threshold, it would be set whenever a recovery was > being attempted, for any reason. 
And with this approach, it should be > easier to set the variable for all actions on the resource > (demote/stop/start/promote), rather than just the stop. > > I think the boolean approach fits all the envisioned use cases that have > been discussed. Any objections to going that route instead of the count? > -- > Ken Gaillot
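To make the failure mode warned about above concrete, the badly written agent might look like the following sketch. This is an anti-pattern shown only to illustrate the risk; the variable name is the one implemented at the time, and the service command is a placeholder:

```shell
#!/bin/sh
# ANTI-PATTERN - do not do this. The agent skips the real stop because
# a restart is expected. If the start never happens (resource disabled,
# node fenced, transition aborted), the cluster believes the service is
# stopped and may start it elsewhere: two active instances.

bad_stop() {
    if [ "${OCF_RESKEY_CRM_restarting:-false}" = "true" ]; then
        echo "skipping stop (restart expected)"   # service keeps running!
        return 0   # falsely reports OCF_SUCCESS to the cluster
    fi
    echo "stopping service"
    return 0
}
```

The point is that returning OCF_SUCCESS from stop is a promise the service is down, unconditionally; no hint variable can soften that contract.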
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Ken Gaillot wrote: > A recent thread discussed a proposed new feature, a new environment > variable that would be passed to resource agents, indicating whether a > stop action was part of a recovery. > > Since that thread was long and covered a lot of topics, I'm starting a > new one to focus on the core issue remaining: > > The original idea was to pass the number of restarts remaining before > the resource will no longer be restarted on the same node. This > involves calculating (fail-count - migration-threshold), and that > implies certain limitations: (1) it will only be set when the cluster > checks migration-threshold; (2) it will only be set for the failed > resource itself, not for other resources that may be recovered due to > dependencies on it. > > Ulrich Windl proposed an alternative: setting a boolean value instead. I > forgot to cc the list on my reply, so I'll summarize now: We would set a > new variable like OCF_RESKEY_CRM_recovery=true whenever a start is > scheduled after a stop on the same node in the same transition. This > would avoid the corner cases of the previous approach; instead of being > tied to migration-threshold, it would be set whenever a recovery was > being attempted, for any reason. And with this approach, it should be > easier to set the variable for all actions on the resource > (demote/stop/start/promote), rather than just the stop. > > I think the boolean approach fits all the envisioned use cases that have > been discussed. Any objections to going that route instead of the count? That sounds fine to me. Thanks!
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Thu, 19 May 2016 13:15:20 -0500, Ken Gaillot wrote: > On 05/19/2016 11:43 AM, Jehan-Guillaume de Rorthais wrote: >> On Thu, 19 May 2016 10:53:31 -0500, >> Ken Gaillot wrote: >> >>> A recent thread discussed a proposed new feature, a new environment >>> variable that would be passed to resource agents, indicating whether a >>> stop action was part of a recovery. >>> >>> Since that thread was long and covered a lot of topics, I'm starting a >>> new one to focus on the core issue remaining: >>> >>> The original idea was to pass the number of restarts remaining before >>> the resource will no longer be restarted on the same node. This >>> involves calculating (fail-count - migration-threshold), and that >>> implies certain limitations: (1) it will only be set when the cluster >>> checks migration-threshold; (2) it will only be set for the failed >>> resource itself, not for other resources that may be recovered due to >>> dependencies on it. >>> >>> Ulrich Windl proposed an alternative: setting a boolean value instead. I >>> forgot to cc the list on my reply, so I'll summarize now: We would set a >>> new variable like OCF_RESKEY_CRM_recovery=true whenever a start is >>> scheduled after a stop on the same node in the same transition. This >>> would avoid the corner cases of the previous approach; instead of being >>> tied to migration-threshold, it would be set whenever a recovery was >>> being attempted, for any reason. And with this approach, it should be >>> easier to set the variable for all actions on the resource >>> (demote/stop/start/promote), rather than just the stop. >> >> I can see the value of having such a variable during various actions. >> However, we can also deduce that the transition is a recovery during the >> notify actions with the notify variables (the only information we lack is >> the order of the actions). 
The most flexible approach would be to make sure >> the notify variables are always available during the whole transition for >> **all** actions, not just notify. It seems like this is already the case, but >> a recent discussion emphasized that this is just a side effect of the current >> implementation. I understand this as: they were sometimes available outside >> of notifications "by accident". > > It does seem that a recovery could be implied from the > notify_{start,stop}_uname variables, but notify variables are only set > for clones that support the notify action. I think the goal here is to > work with any resource type. Even for clones, if they don't otherwise > need notifications, they'd have to add the overhead of notify calls on > all instances that would do nothing. Exactly, notify variables are only available for clones, presently. What I was suggesting is that the notify variables be always available, whether the resource is a clone, a master/slave or a standard one. And I didn't mean the notify *action* should be activated all the time for all resources. The notify switch for clones/ms could be kept at false by default so the notify action itself is not called during transitions. > > Also, I can see the benefit of having the remaining attempts for the current > > action before hitting the migration-threshold. I might misunderstand > > something here, but it seems to me both pieces of information are different. > > I think the use cases that have been mentioned would all be happy with > just the boolean. Does anyone need the actual count, or just whether > this is a stop-start vs a full stop? I was thinking of a use case where a graceful demote or stop action failed multiple times, giving the RA a chance to choose another method to stop the resource before it requires a migration. For instance, PostgreSQL has 3 different kinds of stop, the last one not being graceful, but still better than a kill -9. 
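The escalation idea for PostgreSQL can be sketched as a loop over its real shutdown modes (smart, fast, immediate). The `pg_ctl` invocation is shown only as a comment; `try_stop_mode` below is a hypothetical helper stubbed out with an echo (and a `SUCCEEDS_AT` test knob) so the control flow is visible, while a real agent such as PAF handles far more state:

```shell
#!/bin/sh
# Sketch of an escalating stop for PostgreSQL. The three modes are the
# real pg_ctl shutdown modes, in order of decreasing grace. The helper
# is a stub; a real agent would run something like:
#   pg_ctl stop -D "$PGDATA" -m "$mode" -t "$per_mode_timeout"

try_stop_mode() {
    echo "pg_ctl stop -m $1"
    # Stub: pretend only the mode named in SUCCEEDS_AT works.
    [ "$1" = "${SUCCEEDS_AT:-immediate}" ]
}

pg_escalating_stop() {
    for mode in smart fast immediate; do
        try_stop_mode "$mode" && return 0
    done
    return 1   # every mode failed; fencing is the likely outcome
}
```

The open question in this thread is where the per-mode budget comes from: computed inside the RA from its stop timeout, or driven by pacemaker via something like the remaining-retries count.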
> The problem with the migration-threshold approach is that there are > recoveries that will be missed because they don't involve > migration-threshold. If the count is really needed, the > migration-threshold approach is necessary, but if recovery is the really > interesting information, then a boolean would be more accurate. I think I misunderstood the original use cases you are trying to achieve. It seems to me we are talking about a different feature. >> Basically, what we need is a better understanding of the transition itself >> from the RA actions. >> >> If you are still brainstorming on this, as an RA dev, what I would >> suggest is: >> >> * provide and enforce the notify variables in all actions >> * add the actions order during the current transition to these variables >> using eg. OCF_RESKEY_CRM_meta_notify_*_actionid > > The action ID would be different for each node being acted on, so it > would be more complicated (maybe *_actions="NODE1:ID1,NODE2:ID2,..."?). Following the principle adopted for other variables, each ID would apply to the corresponding resource and node in OCF_RESKEY_CRM_meta_notify_*_uname and
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On Thu, 19 May 2016 10:53:31 -0500, Ken Gaillot wrote: > A recent thread discussed a proposed new feature, a new environment > variable that would be passed to resource agents, indicating whether a > stop action was part of a recovery. > > Since that thread was long and covered a lot of topics, I'm starting a > new one to focus on the core issue remaining: > > The original idea was to pass the number of restarts remaining before > the resource will no longer be restarted on the same node. This > involves calculating (fail-count - migration-threshold), and that > implies certain limitations: (1) it will only be set when the cluster > checks migration-threshold; (2) it will only be set for the failed > resource itself, not for other resources that may be recovered due to > dependencies on it. > > Ulrich Windl proposed an alternative: setting a boolean value instead. I > forgot to cc the list on my reply, so I'll summarize now: We would set a > new variable like OCF_RESKEY_CRM_recovery=true whenever a start is > scheduled after a stop on the same node in the same transition. This > would avoid the corner cases of the previous approach; instead of being > tied to migration-threshold, it would be set whenever a recovery was > being attempted, for any reason. And with this approach, it should be > easier to set the variable for all actions on the resource > (demote/stop/start/promote), rather than just the stop. I can see the value of having such a variable during various actions. However, we can also deduce that the transition is a recovery during the notify actions with the notify variables (the only information we lack is the order of the actions). The most flexible approach would be to make sure the notify variables are always available during the whole transition for **all** actions, not just notify. It seems like this is already the case, but a recent discussion emphasized that this is just a side effect of the current implementation. 
I understand this as: they were sometimes available outside of notifications "by accident". Also, I can see the benefit of having the remaining attempts for the current action before hitting the migration-threshold. I might misunderstand something here, but it seems to me both pieces of information are different. Basically, what we need is a better understanding of the transition itself from the RA actions. If you are still brainstorming on this, as an RA dev, what I would suggest is: * provide and enforce the notify variables in all actions * add the actions order during the current transition to these variables using eg. OCF_RESKEY_CRM_meta_notify_*_actionid * add a new variable with the remaining action attempts before migration. This one has the advantage of surviving the transition breakage when a failure occurs. As a second step, we would be able to provide some helper functions in the ocf_shellfuncs (and in my Perl module equivalent) to compute whether the transition is a switchover, a failover, a recovery, etc., based on the notify variables. Presently, I am detecting such scenarios directly in my RA during the notify actions and tracking them as private attributes to be aware of the situation during the real actions (demote and stop). See: https://github.com/dalibo/PAF/blob/952cb3cf2f03aad18fbeafe3a91f997a56c3b606/script/pgsqlms#L95 Regards,
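The pattern described here, for clone/master-slave agents like PAF, can be sketched in shell: during a pre-stop notify, deduce from the notify variables whether this node is being recovered, and stash the answer so the later real action can read it back. Everything cluster-specific below is hedged: the attrd_updater calls appear only as comments, and a temp file stands in for the private node attribute store; the notify variable names are the real clone notification ones.

```shell
#!/bin/sh
# Sketch of the PAF-style pattern: a node is "being recovered" when it
# appears in both the stop and the start lists of the same transition.
# Record that during notify, read it back during the real stop. A real
# agent would use a private node attribute instead of a file, e.g.
#   attrd_updater -p -n recover -U "$recover"   (store)
#   attrd_updater -p -n recover -Q              (query)

ATTR_STORE="${ATTR_STORE:-/tmp/ra_recover.$$}"

notify_pre_stop() {
    node=$1
    recover=0
    # Space-padded matching so "node1" does not match "node10".
    case " $OCF_RESKEY_CRM_meta_notify_stop_uname " in
        *" $node "*)
            case " $OCF_RESKEY_CRM_meta_notify_start_uname " in
                *" $node "*) recover=1 ;;
            esac ;;
    esac
    echo "$recover" > "$ATTR_STORE"
}

do_stop() {
    if [ "$(cat "$ATTR_STORE" 2>/dev/null)" = "1" ]; then
        echo "stopping as part of a recovery"
    else
        echo "full stop"
    fi
}
```

The file (or attribute) is the bridge between the notify and stop invocations, which are separate processes; this is exactly why the suggestion above to make notify variables available in all actions would simplify such agents.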