Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2017-12-15 Thread Jan Pokorný
On 20/05/16 17:04 +0100, Adam Spiers wrote:
> Klaus Wenninger  wrote:
>> On 05/20/2016 08:39 AM, Ulrich Windl wrote:
>>> I think RAs should not rely on "stop" being called multiple times
>>> for a resource to be stopped.
> 
> Well, this would be a major architectural change.  Currently if
> stop fails once, the node gets fenced - period.  So if we changed
> this, there would presumably be quite a bit of scope for making the
> new design address whatever concerns you have about relying on "stop"
> *sometimes* needing to be called multiple times.  For the sake of
> backwards compatibility with existing RAs, I think we'd have to ensure
> the current semantics still work.  But maybe there could be a new
> option where RAs are allowed to return OCF_RETRY_STOP to indicate that
> they want to escalate, or something.  However it's not clear how that
> would be distinguished from an old RA returning the same value as
> whatever we chose for OCF_RETRY_STOP.
> 
>> I see a couple of positive points in having something inside pacemaker
>> that helps the RAs escalate their stop strategy:
>> 
>> - this way you have the same logging for all RAs - done within the
>>   RA it would look different with each of them
>> - timeout-retry logic is prone to not being implemented properly -
>>   this way you have a single, proven implementation within pacemaker
>> - keeps the logic within the RA simpler and guides implementations in
>>   a direction that makes them look more similar to each other,
>>   making it easier to understand an RA you haven't seen before
> 
> Yes, all good points which I agree with.
> 
>> Of course there are basically two approaches to achieve this:
>> 
>> - give the RA some global or per-resource view of pacemaker's state and
>>   leave it to the RA to act in a responsible manner (like telling the RA
>>   that there are x stop-retries to come)
>> - handle the escalation within pacemaker and tell the RA explicitly
>>   what you expect it to do, e.g. requesting a graceful / hard /
>>   emergency stop (or however you would call it)
> 
> I'd probably prefer the former, to avoid hardcoding any assumptions
> about the different levels of escalation the RA might want to take.
> That would almost certainly vary per RA.
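
For concreteness, an escalating stop of the kind discussed above might look
roughly like this inside a shell-based RA. This is only a sketch:
OCF_RESKEY_CRM_stop_retries_left is a hypothetical variable standing in for
the "x stop-retries to come" hint, not an existing Pacemaker feature, and the
pidfile path is made up.

    #!/bin/sh
    # Sketch only: escalate from SIGTERM to SIGKILL depending on how many
    # stop retries the cluster says are still to come (hypothetical hint).
    : "${OCF_SUCCESS:=0}" "${OCF_ERR_GENERIC:=1}"

    myservice_stop() {
        pid=$(cat /run/myservice.pid 2>/dev/null)
        [ -z "$pid" ] && return "$OCF_SUCCESS"            # already stopped

        if [ "${OCF_RESKEY_CRM_stop_retries_left:-0}" -gt 0 ]; then
            kill -TERM "$pid"                             # graceful attempt
            i=0
            while [ "$i" -lt 10 ]; do
                kill -0 "$pid" 2>/dev/null || return "$OCF_SUCCESS"
                sleep 1; i=$((i + 1))
            done
            return "$OCF_ERR_GENERIC"                     # let the cluster retry
        fi

        kill -KILL "$pid"                                 # last retry: be brutal
        sleep 1
        kill -0 "$pid" 2>/dev/null && return "$OCF_ERR_GENERIC"
        return "$OCF_SUCCESS"
    }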

I'd like to point out the direction taken by the just-released systemd 236
to solve "what if an action needs more time to finish than permitted":

> The sd_notify() protocol can now, with EXTEND_TIMEOUT_USEC=microsecond,
> extend the effective start, runtime, and stop time. The service must
> continue to send EXTEND_TIMEOUT_USEC within the period specified to
> prevent the service manager from marking the service as timed out.

It apparently does not solve the "cannot wait forever without degrading
availability" problem out of the box, and it is not well suited to the
current agent-driven, synchronous and sequenced supervision model
(which, from the beginning, was never planned to remain the final
state of the art[1]), but it looks simple enough and is quite close to
the OCF_RETRY_STOP idea proposed above.

[1] https://github.com/ClusterLabs/OCF-spec/commit/2331bb8d3624a2697afaf3429cec1f47d19251f5#diff-316ade5241704833815c8fa2c2b71d4dR422
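
As a rough illustration of how a service could use that protocol during a
slow stop, assuming systemd >= 236, a unit with Type=notify and NotifyAccess=
set so that the sending process is accepted, and a hypothetical
do_cleanup_step helper:

    #!/bin/sh
    # Sketch: keep extending the stop timeout by 30 seconds while working
    # through a slow, multi-step shutdown, so systemd does not mark the
    # service as timed out (EXTEND_TIMEOUT_USEC is in microseconds).
    systemd-notify STOPPING=1

    for step in flush_cache drain_connections unmount_data; do
        systemd-notify EXTEND_TIMEOUT_USEC=30000000   # ask for 30 more seconds
        do_cleanup_step "$step"                       # hypothetical helper
    done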

> However, we're slightly off-topic for this thread at this point ;-)

(It's all one big Gordian knot; everything is related, and the fact
that we are not starting with a clean drawing board but are already
rolling some stones ahead of us is not helping.)

-- 
Poki




Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-25 Thread Adam Spiers
Ken Gaillot  wrote:
> On 06/24/2016 05:41 AM, Adam Spiers wrote:
> > Andrew Beekhof  wrote:
> >> On Fri, Jun 24, 2016 at 1:01 AM, Adam Spiers  wrote:
> >>> Andrew Beekhof  wrote:
> > Earlier in this thread I proposed
> > the idea of a tiny temporary file in /run which tracks the last known
> > state and optimizes away the consecutive invocations, but IIRC you
> > were against that.
> 
>  I'm generally not a fan, but sometimes state files are a necessity.
>  Just make sure you think through what a missing file might mean.
> >>>
> >>> Sure.  A missing file would mean the RA's never called service-disable
> >>> before,
> >>
> >> And that is why I generally don't like state files.
> >> The default location for state files doesn't persist across reboots.
> >>
> >> t1. stop (ie. disable)
> >> t2. reboot
> >> t3. start with no state file
> >> t4. WHY WONT NOVA USE THE NEW COMPUTE NODE STUPID CLUSTERS
> > 
> > Well then we simply put the state file somewhere which does persist
> > across reboots.
> 
> There's also the possibility of using a node attribute. If you set a
> normal node attribute, it will abort the transition and calculate a new
> one, so that's something to take into account. You could set a private
> node attribute, which never gets written to the CIB and thus doesn't
> abort transitions, but it also does not survive a complete cluster stop.

Interesting idea, although I wonder if there is a good solution to
either of these challenges.  Aborting the current transition sounds
bad, and we would certainly want the state to survive a cluster stop,
otherwise we risk the exact issue Andrew described above.

Also, since the state is per-node, I'm not convinced there's a huge
advantage to sharing it cluster-wide, which is why I proposed the
local filesystem as the store for it.  But I'm open to suggestions of
course :-)
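
For what it's worth, either variant is only a few lines of shell. A sketch,
with invented file and attribute names, assuming a Pacemaker release whose
attrd_updater supports --private (older releases may not):

    #!/bin/sh
    # Variant 1: local state file in a location that survives reboots
    # (unlike /run, which is the pitfall Andrew described above).
    STATEFILE=/var/lib/myra/service-disabled      # illustrative path

    mark_disabled()  { mkdir -p "${STATEFILE%/*}" && touch "$STATEFILE"; }
    clear_disabled() { rm -f "$STATEFILE"; }
    was_disabled()   { [ -e "$STATEFILE" ]; }

    # Variant 2: private node attribute - never written to the CIB, so it
    # does not abort transitions, but it does not survive a full cluster stop.
    set_flag()   { attrd_updater --name myra_disabled --update true --private; }
    clear_flag() { attrd_updater --name myra_disabled --delete --private; }
    query_flag() { attrd_updater --name myra_disabled --query --private; }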



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-23 Thread Andrew Beekhof
On Fri, Jun 24, 2016 at 1:01 AM, Adam Spiers  wrote:
> Andrew Beekhof  wrote:

>> > Well, if you're OK with bending the rules like this then that's good
>> > enough for me to say we should at least try it :)
>>
>> I still say you shouldn't only do it on error.
>
> When else should it be done?

I was thinking whenever a stop() happens.

> IIUC, disabling/enabling the service is independent of the up/down
> state which nova tracks automatically, and which, based on slightly
> more than a skim of the code, is dependent on the state of the RPC
> layer.
>
>> > But how would you avoid repeated consecutive invocations of "nova
>> > service-disable" when the monitor action fails, and ditto for "nova
>> > service-enable" when it succeeds?
>>
>> I don't think you can. Not ideal but I'd not have thought a deal breaker.
>
> Sounds like a massive deal-breaker to me!  With op monitor
> interval="10s" and 100 compute nodes, that would mean 10 pointless
> calls to nova-api every second.  Am I missing something?

I was thinking you would only call it for the "I detected a failure
case" and service-enable would still be on start().
So the number of pointless calls per second would be capped at one
tenth of the number of failed compute nodes.

One would hope that all of them weren't dead.

>
> Also I don't see any benefit to moving the API calls from start/stop
> actions to the monitor action.  If there's a failure, Pacemaker will
> invoke the stop action, so we can do service-disable there.

I agree. Doing it unconditionally at stop() is my preferred option, I
was only trying to provide a path that might be close to the behaviour
you were looking for.

> If the
> start action is invoked and we successfully initiate startup of
> nova-compute, the RA can undo any service-disable it previously did
> (although it should not reverse a service-disable done elsewhere,
> e.g. manually by the cloud operator).

Agree
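
In code terms, the shape being agreed on here might look roughly like the
following. This is a sketch only: the helper functions and the marker path
are invented, the legacy nova CLI syntax is assumed, and credentials, error
handling, and OCF boilerplate are omitted.

    # Sketch: disable unconditionally in stop(), and have start() undo only
    # a disable this agent performed itself (never an operator's disable).
    NOVA_HOST=$(hostname)                     # assumes nova knows us by hostname
    MARKER=/var/lib/myra/we-disabled-nova     # illustrative path

    compute_stop() {
        stop_nova_compute_process || return 1     # hypothetical helper
        # Stop the scheduler from placing new VMs on this host.
        if nova service-disable "$NOVA_HOST" nova-compute; then
            mkdir -p "${MARKER%/*}" && touch "$MARKER"
        fi
        return 0
    }

    compute_start() {
        start_nova_compute_process || return 1    # hypothetical helper
        # Re-enable only if we were the ones who disabled it.
        if [ -e "$MARKER" ]; then
            nova service-enable "$NOVA_HOST" nova-compute && rm -f "$MARKER"
        fi
        return 0
    }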

>
>> > Earlier in this thread I proposed
>> > the idea of a tiny temporary file in /run which tracks the last known
>> > state and optimizes away the consecutive invocations, but IIRC you
>> > were against that.
>>
>> I'm generally not a fan, but sometimes state files are a necessity.
>> Just make sure you think through what a missing file might mean.
>
> Sure.  A missing file would mean the RA's never called service-disable
> before,

And that is why I generally don't like state files.
The default location for state files doesn't persist across reboots.

t1. stop (ie. disable)
t2. reboot
t3. start with no state file
t4. WHY WONT NOVA USE THE NEW COMPUTE NODE STUPID CLUSTERS

> which means that it shouldn't call service-enable on startup.
>
>> Unless you use the state file to store the date at which the last
>> start operation occurred?
>>
>> If we're calling stop() and date - start_date > threshold, then, if
>> you must, be optimistic, skip service-disable and assume we'll get
>> started again soon.
>>
>> Otherwise if we're calling stop() and date - start_date <= threshold,
>> always call service-disable because we're in a restart loop which is
>> not worth optimising for.
>>
>> ( And always call service-enable at start() )
>>
>> No Pacemaker feature or Beekhof approval required :-)
>
> Hmm ...  it's possible I just don't understand this proposal fully,
> but it sounds a bit woolly to me, e.g. how would you decide a suitable
> threshold?

roll a dice?

> I think I preferred your other suggestion of just skipping the
> optimization, i.e. calling service-disable on the first stop, and
> service-enable on (almost) every start.

good :)


And the use of force-down from your subsequent email sounds excellent
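
For reference, the timestamp heuristic sketched earlier in this message is
also only a few lines; the threshold is necessarily arbitrary, which is
Adam's objection (the paths and the 300-second value are invented):

    # Sketch: record when start() last ran; at stop(), skip service-disable
    # only if the service has been up longer than some threshold (i.e. we
    # are probably not in a tight restart loop).
    STAMP=/var/lib/myra/last-start-epoch
    THRESHOLD=300                              # seconds, picked by "rolling a dice"

    record_start_time() { mkdir -p "${STAMP%/*}" && date +%s > "$STAMP"; }

    should_skip_disable() {
        [ -r "$STAMP" ] || return 1            # no record: do not skip
        up=$(( $(date +%s) - $(cat "$STAMP") ))
        [ "$up" -gt "$THRESHOLD" ]             # up long enough: be optimistic
    }

The force-down API mentioned just above would slot into the same place in
the agent as service-disable does (in newer novaclient releases it is
exposed as "nova service-force-down", if memory serves).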



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-23 Thread Adam Spiers
Andrew Beekhof  wrote:
> On Wed, Jun 15, 2016 at 10:42 PM, Adam Spiers  wrote:
> > Andrew Beekhof  wrote:
> >> On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers  wrote:
> >> > Andrew Beekhof  wrote:
> >> >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers  wrote:
> >> >> > Andrew Beekhof  wrote:
> >> >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers  
> >> >> >> wrote:
> >> >> >> > We would also need to ensure that service-enable is called on start
> >> >> >> > when necessary.  Perhaps we could track the enable/disable state 
> >> >> >> > in a
> >> >> >> > local temporary file, and if the file indicates that we've 
> >> >> >> > previously
> >> >> >> > done service-disable, we know to run service-enable on start.  This
> >> >> >> > would avoid calling service-enable on every single start.
> >> >> >>
> >> >> >> feels like an over-optimization
> >> >> >> in fact, the whole thing feels like that if i'm honest.
> >> >> >
> >> >> > Huh ... You didn't seem to think that when we discussed automating
> >> >> > service-disable at length in Austin.
> >> >>
> >> >> I didn't feel the need to push back because RH uses the systemd agent
> >> >> instead so you're only hanging yourself, but more importantly because
> >> >> the proposed implementation to facilitate it wasn't leading RA writers
> >> >> down a hazardous path :-)
> >> >
> >> > I'm a bit confused by that statement, because the only proposed
> >> > implementation we came up with in Austin was adding this new feature
> >> > to Pacemaker.
> >>
> >> _A_ new feature, not _this_ new feature.
> >> The one we discussed was far less prone to being abused but, as it
> >> turns out, also far less useful for what you were trying to do.
> >
> > Was there really that much significant change since the original idea?
> > IIRC the only thing which really changed was the type, from "number of
> > retries remaining" to a boolean "there are still some retries left".
> 
> The new implementation has nothing to do with retries. Like the new
> name, it is based on "is a start action expected".

Oh yeah, I remember now.

> That's why I got an attack of the heebie-jeebies.

I'm not sure why, but at least now I understand your change of
position :-)

> > I'm not sure why the integer approach would be far less open to abuse,
> > or even why it would have been far less useful.  I'm probably missing
> > something.
> >
> > [snipped]
> >
> >> >> >> why are we trying to optimise the projected performance impact
> >> >> >
> >> >> > It's not really "projected"; we know exactly what the impact is.  And
> >> >> > it's not really a performance impact either.  If nova-compute (or a
> >> >> > dependency) is malfunctioning on a compute node, there will be a
> >> >> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
> >> >> > which nova-scheduler could still schedule VMs onto that compute node,
> >> >> > and then of course they'll fail to boot.
> >> >>
> >> >> Right, but that window exists regardless of whether the node is or is
> >> >> not ever coming back.
> >> >
> >> > Sure, but the window's a *lot* bigger if we don't do service-disable.
> >> > Although perhaps your question "why are we trying to optimise the
> >> > projected performance impact" was actually "why are we trying to avoid
> >> > extra calls to service-disable" rather than "why do we want to call
> >> > service-disable" as I initially assumed.  Is that right?
> >>
> >> Exactly.  I assumed it was to limit the noise we'd be generating in doing 
> >> so.
> >
> > Sort of - not just the noise, but the extra delay introduced by
> > calling service-disable, restarting nova-compute, and then calling
> > service-enable again when it succeeds.
> 
> Ok, but restarting nova-compute is not optional and the bits that are
> optional are all but completely asynchronous* - so the overhead should
> be negligible.
> 
> * Like most API calls, they are Ack'd when the request has been
> received, not processed.

Yes, fair points.

> >> >> > The masakari folks have a lot of operational experience in this space,
> >> >> > and they found that this was enough of a problem to justify calling
> >> >> > nova service-disable whenever the failure is detected.
> >> >>
> >> >> If you really want it whenever the failure is detected, call it from
> >> >> the monitor operation that finds it broken.
> >> >
> >> > Hmm, that appears to violate what I assume would be a fundamental
> >> > design principle of Pacemaker: that the "monitor" action never changes
> >> > the system's state (assuming there are no Heisenberg-like side effects
> >> > of monitoring, of course).
> >>
> >> That has traditionally been considered a good idea, and in the vast
> >> majority of cases I still think it is a good idea, but it's also a
> >> guideline that has been broken because there is no other way for the
> >> agent to work *cough* 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-15 Thread Andrew Beekhof
On Wed, Jun 15, 2016 at 10:42 PM, Adam Spiers  wrote:
> Andrew Beekhof  wrote:
>> On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers  wrote:
>> > Andrew Beekhof  wrote:
>> >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers  wrote:
>> >> > Andrew Beekhof  wrote:
>> >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers  wrote:
>> >> >> > We would also need to ensure that service-enable is called on start
>> >> >> > when necessary.  Perhaps we could track the enable/disable state in a
>> >> >> > local temporary file, and if the file indicates that we've previously
>> >> >> > done service-disable, we know to run service-enable on start.  This
>> >> >> > would avoid calling service-enable on every single start.
>> >> >>
>> >> >> feels like an over-optimization
>> >> >> in fact, the whole thing feels like that if i'm honest.
>> >> >
>> >> > Huh ... You didn't seem to think that when we discussed automating
>> >> > service-disable at length in Austin.
>> >>
>> >> I didn't feel the need to push back because RH uses the systemd agent
>> >> instead so you're only hanging yourself, but more importantly because
>> >> the proposed implementation to facilitate it wasn't leading RA writers
>> >> down a hazardous path :-)
>> >
>> > I'm a bit confused by that statement, because the only proposed
>> > implementation we came up with in Austin was adding this new feature
>> > to Pacemaker.
>>
>> _A_ new feature, not _this_ new feature.
>> The one we discussed was far less prone to being abused but, as it
>> turns out, also far less useful for what you were trying to do.
>
> Was there really that much significant change since the original idea?
> IIRC the only thing which really changed was the type, from "number of
> retries remaining" to a boolean "there are still some retries left".

The new implementation has nothing to do with retries. Like the new
name, it is based on "is a start action expected".
That's why I got an attack of the heebie-jeebies.

> I'm not sure why the integer approach would be far less open to abuse,
> or even why it would have been far less useful.  I'm probably missing
> something.
>
> [snipped]
>
>> >> >> why are we trying to optimise the projected performance impact
>> >> >
>> >> > It's not really "projected"; we know exactly what the impact is.  And
>> >> > it's not really a performance impact either.  If nova-compute (or a
>> >> > dependency) is malfunctioning on a compute node, there will be a
>> >> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
>> >> > which nova-scheduler could still schedule VMs onto that compute node,
>> >> > and then of course they'll fail to boot.
>> >>
>> >> Right, but that window exists regardless of whether the node is or is
>> >> not ever coming back.
>> >
>> > Sure, but the window's a *lot* bigger if we don't do service-disable.
>> > Although perhaps your question "why are we trying to optimise the
>> > projected performance impact" was actually "why are we trying to avoid
>> > extra calls to service-disable" rather than "why do we want to call
>> > service-disable" as I initially assumed.  Is that right?
>>
>> Exactly.  I assumed it was to limit the noise we'd be generating in doing so.
>
> Sort of - not just the noise, but the extra delay introduced by
> calling service-disable, restarting nova-compute, and then calling
> service-enable again when it succeeds.

Ok, but restarting nova-compute is not optional and the bits that are
optional are all but completely asynchronous* - so the overhead should
be negligible.

* Like most API calls, they are Ack'd when the request has been
received, not processed.

>
> [snipped]
>
>> >> > The masakari folks have a lot of operational experience in this space,
>> >> > and they found that this was enough of a problem to justify calling
>> >> > nova service-disable whenever the failure is detected.
>> >>
>> >> If you really want it whenever the failure is detected, call it from
>> >> the monitor operation that finds it broken.
>> >
>> > Hmm, that appears to violate what I assume would be a fundamental
>> > design principle of Pacemaker: that the "monitor" action never changes
>> > the system's state (assuming there are no Heisenberg-like side effects
>> > of monitoring, of course).
>>
>> That has traditionally been considered a good idea, and in the vast
>> majority of cases I still think it is a good idea, but it's also a
>> guideline that has been broken because there is no other way for the
>> agent to work *cough* rabbit *cough*.
>>
>> In this specific case, I think it could be forgivable because you're
>> not strictly altering the service but something that sits in front of
>> it.  start/stop/monitor would all continue to do TheRightThing(tm).
>>
>> > I guess you could argue that in this case,
>> > the nova server's internal state could be considered outside the
>> > system which 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-15 Thread Adam Spiers
Andrew Beekhof  wrote:
> On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers  wrote:
> > Andrew Beekhof  wrote:
> >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers  wrote:
> >> > Andrew Beekhof  wrote:
> >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers  wrote:
> >> >> > We would also need to ensure that service-enable is called on start
> >> >> > when necessary.  Perhaps we could track the enable/disable state in a
> >> >> > local temporary file, and if the file indicates that we've previously
> >> >> > done service-disable, we know to run service-enable on start.  This
> >> >> > would avoid calling service-enable on every single start.
> >> >>
> >> >> feels like an over-optimization
> >> >> in fact, the whole thing feels like that if i'm honest.
> >> >
> >> > Huh ... You didn't seem to think that when we discussed automating
> >> > service-disable at length in Austin.
> >>
> >> I didn't feel the need to push back because RH uses the systemd agent
> >> instead so you're only hanging yourself, but more importantly because
> >> the proposed implementation to facilitate it wasn't leading RA writers
> >> down a hazardous path :-)
> >
> > I'm a bit confused by that statement, because the only proposed
> > implementation we came up with in Austin was adding this new feature
> > to Pacemaker.
> 
> _A_ new feature, not _this_ new feature.
> The one we discussed was far less prone to being abused but, as it
> turns out, also far less useful for what you were trying to do.

Was there really that much significant change since the original idea?
IIRC the only thing which really changed was the type, from "number of
retries remaining" to a boolean "there are still some retries left".
I'm not sure why the integer approach would be far less open to abuse,
or even why it would have been far less useful.  I'm probably missing
something.

[snipped]

> >> >> why are we trying to optimise the projected performance impact
> >> >
> >> > It's not really "projected"; we know exactly what the impact is.  And
> >> > it's not really a performance impact either.  If nova-compute (or a
> >> > dependency) is malfunctioning on a compute node, there will be a
> >> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
> >> > which nova-scheduler could still schedule VMs onto that compute node,
> >> > and then of course they'll fail to boot.
> >>
> >> Right, but that window exists regardless of whether the node is or is
> >> not ever coming back.
> >
> > Sure, but the window's a *lot* bigger if we don't do service-disable.
> > Although perhaps your question "why are we trying to optimise the
> > projected performance impact" was actually "why are we trying to avoid
> > extra calls to service-disable" rather than "why do we want to call
> > service-disable" as I initially assumed.  Is that right?
> 
> Exactly.  I assumed it was to limit the noise we'd be generating in doing so.

Sort of - not just the noise, but the extra delay introduced by
calling service-disable, restarting nova-compute, and then calling
service-enable again when it succeeds.

[snipped]

> >> > The masakari folks have a lot of operational experience in this space,
> >> > and they found that this was enough of a problem to justify calling
> >> > nova service-disable whenever the failure is detected.
> >>
> >> If you really want it whenever the failure is detected, call it from
> >> the monitor operation that finds it broken.
> >
> > Hmm, that appears to violate what I assume would be a fundamental
> > design principle of Pacemaker: that the "monitor" action never changes
> > the system's state (assuming there are no Heisenberg-like side effects
> > of monitoring, of course).
> 
> That has traditionally been considered a good idea, and in the vast
> majority of cases I still think it is a good idea, but it's also a
> guideline that has been broken because there is no other way for the
> agent to work *cough* rabbit *cough*.
> 
> In this specific case, I think it could be forgivable because you're
> not strictly altering the service but something that sits in front of
> it.  start/stop/monitor would all continue to do TheRightThing(tm).
> 
> > I guess you could argue that in this case,
> > the nova server's internal state could be considered outside the
> > system which Pacemaker is managing.
> 
> Right.

Well, if you're OK with bending the rules like this then that's good
enough for me to say we should at least try it :)

But how would you avoid repeated consecutive invocations of "nova
service-disable" when the monitor action fails, and ditto for "nova
service-enable" when it succeeds?  Earlier in this thread I proposed
the idea of a tiny temporary file in /run which tracks the last known
state and optimizes away the consecutive invocations, but IIRC you
were against that.

> >> I'm arguing that trying to do it only on failure is an over-optimization
> 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-14 Thread Andrew Beekhof
On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers  wrote:
> Andrew Beekhof  wrote:
>> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers  wrote:
>> > Andrew Beekhof  wrote:
>> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers  wrote:
>> >> > Ken Gaillot  wrote:
>> >> >> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> >> >> > Maybe your point was that if the expected start never happens (so
>> >> >> > never even gets a chance to fail), we still want to do a nova
>> >> >> > service-disable?
>> >> >>
>> >> >> That is a good question, which might mean it should be done on every
>> >> >> stop -- or could that cause problems (besides delays)?
>> >> >
>> >> > No, the whole point of adding this feature is to avoid a
>> >> > service-disable on every stop, and instead only do it on the final
>> >> > stop.  If there are corner cases where we never reach the final stop,
>> >> > that's not a disaster because nova will eventually figure it out and
>> >> > do the right thing when the server-agent connection times out.
>> >> >
>> >> >> Another aspect of this is that the proposed feature could only look at 
>> >> >> a
>> >> >> single transition. What if stop is called with start_expected=false, 
>> >> >> but
>> >> >> then Pacemaker is able to start the service on the same node in the 
>> >> >> next
>> >> >> transition immediately afterward? Would having called service-disable
>> >> >> cause problems for that start?
>> >> >
>> >> > We would also need to ensure that service-enable is called on start
>> >> > when necessary.  Perhaps we could track the enable/disable state in a
>> >> > local temporary file, and if the file indicates that we've previously
>> >> > done service-disable, we know to run service-enable on start.  This
>> >> > would avoid calling service-enable on every single start.
>> >>
>> >> feels like an over-optimization
>> >> in fact, the whole thing feels like that if i'm honest.
>> >
>> > Huh ... You didn't seem to think that when we discussed automating
>> > service-disable at length in Austin.
>>
>> I didn't feel the need to push back because RH uses the systemd agent
>> instead so you're only hanging yourself, but more importantly because
>> the proposed implementation to facilitate it wasn't leading RA writers
>> down a hazardous path :-)
>
> I'm a bit confused by that statement, because the only proposed
> implementation we came up with in Austin was adding this new feature
> to Pacemaker.

_A_ new feature, not _this_ new feature.
The one we discussed was far less prone to being abused but, as it
turns out, also far less useful for what you were trying to do.

> Prior to that, AFAICR, you, Dawid, and I had a long
> afternoon discussion in the sun where we tried to figure out a way to
> implement it just by tweaking the OCF RAs, but every approach we
> discussed turned out to have fundamental issues.  That's why we
> eventually turned to the idea of this new feature in Pacemaker.
>
> But anyway, it's water under the bridge now :-)
>
>> > What changed?  Can you suggest a better approach?
>>
>> Either always or never disable the service would be my advice.
>> "Always" specifically getting my vote.
>
> OK, thanks.  We discussed that at the meeting this morning, and it
> looks like we'll give it a try.
>
>> >> why are we trying to optimise the projected performance impact
>> >
>> > It's not really "projected"; we know exactly what the impact is.  And
>> > it's not really a performance impact either.  If nova-compute (or a
>> > dependency) is malfunctioning on a compute node, there will be a
>> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
>> > which nova-scheduler could still schedule VMs onto that compute node,
>> > and then of course they'll fail to boot.
>>
>> Right, but that window exists regardless of whether the node is or is
>> not ever coming back.
>
> Sure, but the window's a *lot* bigger if we don't do service-disable.
> Although perhaps your question "why are we trying to optimise the
> projected performance impact" was actually "why are we trying to avoid
> extra calls to service-disable" rather than "why do we want to call
> service-disable" as I initially assumed.  Is that right?

Exactly.  I assumed it was to limit the noise we'd be generating in doing so.

>
>> And as we already discussed, the proposed feature still leaves you
>> open to this window because we can't know if the expected restart will
>> ever happen.
>
> Yes, but as I already said, the perfect should not become the enemy of
> the good.  Just because an approach doesn't solve all cases, it
> doesn't necessarily mean it's not suitable for solving some of them.
>
>> In this context, trying to avoid the disable call under certain
>> circumstances, to avoid repeated and frequent flip-flopping of the
>> state, seems ill-advised.  At the point nova compute is bouncing up
>> and down like that, you have a more 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-13 Thread Adam Spiers
Andrew Beekhof  wrote:
> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers  wrote:
> > Andrew Beekhof  wrote:
> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers  wrote:
> >> > Ken Gaillot  wrote:
> >> >> On 06/06/2016 05:45 PM, Adam Spiers wrote:
> >> >> > Maybe your point was that if the expected start never happens (so
> >> >> > never even gets a chance to fail), we still want to do a nova
> >> >> > service-disable?
> >> >>
> >> >> That is a good question, which might mean it should be done on every
> >> >> stop -- or could that cause problems (besides delays)?
> >> >
> >> > No, the whole point of adding this feature is to avoid a
> >> > service-disable on every stop, and instead only do it on the final
> >> > stop.  If there are corner cases where we never reach the final stop,
> >> > that's not a disaster because nova will eventually figure it out and
> >> > do the right thing when the server-agent connection times out.
> >> >
> >> >> Another aspect of this is that the proposed feature could only look at a
> >> >> single transition. What if stop is called with start_expected=false, but
> >> >> then Pacemaker is able to start the service on the same node in the next
> >> >> transition immediately afterward? Would having called service-disable
> >> >> cause problems for that start?
> >> >
> >> > We would also need to ensure that service-enable is called on start
> >> > when necessary.  Perhaps we could track the enable/disable state in a
> >> > local temporary file, and if the file indicates that we've previously
> >> > done service-disable, we know to run service-enable on start.  This
> >> > would avoid calling service-enable on every single start.
> >>
> >> feels like an over-optimization
> >> in fact, the whole thing feels like that if i'm honest.
> >
> > Huh ... You didn't seem to think that when we discussed automating
> > service-disable at length in Austin.
> 
> I didn't feel the need to push back because RH uses the systemd agent
> instead so you're only hanging yourself, but more importantly because
> the proposed implementation to facilitate it wasn't leading RA writers
> down a hazardous path :-)

I'm a bit confused by that statement, because the only proposed
implementation we came up with in Austin was adding this new feature
to Pacemaker.  Prior to that, AFAICR, you, Dawid, and I had a long
afternoon discussion in the sun where we tried to figure out a way to
implement it just by tweaking the OCF RAs, but every approach we
discussed turned out to have fundamental issues.  That's why we
eventually turned to the idea of this new feature in Pacemaker.

But anyway, it's water under the bridge now :-)

> > What changed?  Can you suggest a better approach?
> 
> Either always or never disable the service would be my advice.
> "Always" specifically getting my vote.

OK, thanks.  We discussed that at the meeting this morning, and it
looks like we'll give it a try.

> >> why are we trying to optimise the projected performance impact
> >
> > It's not really "projected"; we know exactly what the impact is.  And
> > it's not really a performance impact either.  If nova-compute (or a
> > dependency) is malfunctioning on a compute node, there will be a
> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
> > which nova-scheduler could still schedule VMs onto that compute node,
> > and then of course they'll fail to boot.
> 
> Right, but that window exists regardless of whether the node is or is
> not ever coming back.

Sure, but the window's a *lot* bigger if we don't do service-disable.
Although perhaps your question "why are we trying to optimise the
projected performance impact" was actually "why are we trying to avoid
extra calls to service-disable" rather than "why do we want to call
service-disable" as I initially assumed.  Is that right?

> And as we already discussed, the proposed feature still leaves you
> open to this window because we can't know if the expected restart will
> ever happen.

Yes, but as I already said, the perfect should not become the enemy of
the good.  Just because an approach doesn't solve all cases, it
doesn't necessarily mean it's not suitable for solving some of them.

> In this context, trying to avoid the disable call under certain
> circumstances, to avoid repeated and frequent flip-flopping of the
> state, seems ill-advised.  At the point nova compute is bouncing up
> and down like that, you have a more fundamental issue somewhere in
> your stack and this is only one (and IMHO minor) symptom of it.

That's a fair point.

> > The masakari folks have a lot of operational experience in this space,
> > and they found that this was enough of a problem to justify calling
> > nova service-disable whenever the failure is detected.
> 
> If you really want it whenever the failure is detected, call it from
> the monitor operation that finds it broken.

Hmm, that 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-09 Thread Andrew Beekhof
On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers  wrote:
> Andrew Beekhof  wrote:
>> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers  wrote:
>> > Ken Gaillot  wrote:
>> >> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> >> > Adam Spiers  wrote:
>> >> >> Andrew Beekhof  wrote:
>> >> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
>> >>  Ken Gaillot  wrote:
>> >> > My main question is how useful would it actually be in the proposed 
>> >> > use
>> >> > cases. Considering the possibility that the expected start might 
>> >> > never
>> >> > happen (or fail), can an RA really do anything different if
>> >> > start_expected=true?
>> >> 
>> >>  That's the wrong question :-)
>> >> 
>> >> > If the use case is there, I have no problem with
>> >> > adding it, but I want to make sure it's worthwhile.
>> >> 
>> >>  The use case which started this whole thread is for
>> >>  start_expected=false, not start_expected=true.
>> >> >>>
>> >> >>> Isn't this just two sides of the same coin?
>> >> >>> If you're not doing the same thing for both cases, then you're just
>> >> >>> reversing the order of the clauses.
>> >> >>
>> >> >> No, because the stated concern about unreliable expectations
>> >> >> ("Considering the possibility that the expected start might never
>> >> >> happen (or fail)") was regarding start_expected=true, and that's the
>> >> >> side of the coin we don't care about, so it doesn't matter if it's
>> >> >> unreliable.
>> >> >
>> >> > BTW, if the expected start happens but fails, then Pacemaker will just
>> >> > keep repeating until migration-threshold is hit, at which point it
>> >> > will call the RA 'stop' action finally with start_expected=false.
>> >> > So that's of no concern.
>> >>
>> >> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>> >
>> > Sure.
>> >
>> >> > Maybe your point was that if the expected start never happens (so
>> >> > never even gets a chance to fail), we still want to do a nova
>> >> > service-disable?
>> >>
>> >> That is a good question, which might mean it should be done on every
>> >> stop -- or could that cause problems (besides delays)?
>> >
>> > No, the whole point of adding this feature is to avoid a
>> > service-disable on every stop, and instead only do it on the final
>> > stop.  If there are corner cases where we never reach the final stop,
>> > that's not a disaster because nova will eventually figure it out and
>> > do the right thing when the server-agent connection times out.
>> >
>> >> Another aspect of this is that the proposed feature could only look at a
>> >> single transition. What if stop is called with start_expected=false, but
>> >> then Pacemaker is able to start the service on the same node in the next
>> >> transition immediately afterward? Would having called service-disable
>> >> cause problems for that start?
>> >
>> > We would also need to ensure that service-enable is called on start
>> > when necessary.  Perhaps we could track the enable/disable state in a
>> > local temporary file, and if the file indicates that we've previously
>> > done service-disable, we know to run service-enable on start.  This
>> > would avoid calling service-enable on every single start.
>>
>> feels like an over-optimization
>> in fact, the whole thing feels like that if i'm honest.
>
> Huh ... You didn't seem to think that when we discussed automating
> service-disable at length in Austin.

I didn't feel the need to push back because RH uses the systemd agent
instead so you're only hanging yourself, but more importantly because
the proposed implementation to facilitate it wasn't leading RA writers
down a hazardous path :-)

>  What changed?  Can you suggest a
> better approach?

Either always or never disable the service would be my advice.
"Always" specifically getting my vote.

>
>> why are we trying to optimise the projected performance impact
>
> It's not really "projected"; we know exactly what the impact is.  And
> it's not really a performance impact either.  If nova-compute (or a
> dependency) is malfunctioning on a compute node, there will be a
> window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
> which nova-scheduler could still schedule VMs onto that compute node,
> and then of course they'll fail to boot.

Right, but that window exists regardless of whether the node is or is
not ever coming back.
And as we already discussed, the proposed feature still leaves you
open to this window because we can't know if the expected restart will
ever happen.

In this context, trying to avoid the disable call under certain
circumstances, to avoid repeated and frequent flip-flopping of the
state, seems ill-advised.  At the point nova compute is bouncing up
and down like that, you have a more fundamental issue somewhere in
your 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-07 Thread Andrew Beekhof
On Wed, Jun 8, 2016 at 10:29 AM, Andrew Beekhof  wrote:
> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers  wrote:
>> Ken Gaillot  wrote:
>>> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>>> > Adam Spiers  wrote:
>>> >> Andrew Beekhof  wrote:
>>> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
>>>  Ken Gaillot  wrote:
>>> > My main question is how useful would it actually be in the proposed 
>>> > use
>>> > cases. Considering the possibility that the expected start might never
>>> > happen (or fail), can an RA really do anything different if
>>> > start_expected=true?
>>> 
>>>  That's the wrong question :-)
>>> 
>>> > If the use case is there, I have no problem with
>>> > adding it, but I want to make sure it's worthwhile.
>>> 
>>>  The use case which started this whole thread is for
>>>  start_expected=false, not start_expected=true.
>>> >>>
>>> >>> Isn't this just two sides of the same coin?
>>> >>> If you're not doing the same thing for both cases, then you're just
>>> >>> reversing the order of the clauses.
>>> >>
>>> >> No, because the stated concern about unreliable expectations
>>> >> ("Considering the possibility that the expected start might never
>>> >> happen (or fail)") was regarding start_expected=true, and that's the
>>> >> side of the coin we don't care about, so it doesn't matter if it's
>>> >> unreliable.
>>> >
>>> > BTW, if the expected start happens but fails, then Pacemaker will just
>>> > keep repeating until migration-threshold is hit, at which point it
>>> > will call the RA 'stop' action finally with start_expected=false.
>>> > So that's of no concern.
>>>
>>> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>>
>> Sure.
>>
>>> > Maybe your point was that if the expected start never happens (so
>>> > never even gets a chance to fail), we still want to do a nova
>>> > service-disable?
>>>
>>> That is a good question, which might mean it should be done on every
>>> stop -- or could that cause problems (besides delays)?
>>
>> No, the whole point of adding this feature is to avoid a
>> service-disable on every stop, and instead only do it on the final
>> stop.  If there are corner cases where we never reach the final stop,
>> that's not a disaster because nova will eventually figure it out and
>> do the right thing when the server-agent connection times out.
>>
>>> Another aspect of this is that the proposed feature could only look at a
>>> single transition. What if stop is called with start_expected=false, but
>>> then Pacemaker is able to start the service on the same node in the next
>>> transition immediately afterward? Would having called service-disable
>>> cause problems for that start?
>>
>> We would also need to ensure that service-enable is called on start
>> when necessary.  Perhaps we could track the enable/disable state in a
>> local temporary file, and if the file indicates that we've previously
>> done service-disable, we know to run service-enable on start.  This
>> would avoid calling service-enable on every single start.
>
> feels like an over-optimization
> in fact, the whole thing feels like that if i'm honest.

Today the stars aligned :-)

   http://xkcd.com/1691/

>
> why are we trying to optimise the projected performance impact when
> the system is in terrible shape already?
>
>>
>>> > Yes that would be nice, but this proposal was never intended to
>>> > address that.  I guess we'd need an entirely different mechanism in
>>> > Pacemaker for that.  But let's not allow perfection to become the
>>> > enemy of the good ;-)
>>>
>>> The ultimate concern is that this will encourage people to write RAs
>>> that leave services in a dangerous state after stop is called.
>>
>> I don't see why it would.
>
> Previous experience suggests it definitely will.
>
> People will do exactly what you're thinking but with something important.
> They'll see it behaves as they expect in best-case testing and never
> think about the corner cases.
> Then they'll start thinking about optimising their start operations,
> write some "optimistic" state recording code and break those too.
>
> Imagine a bug in your state recording code (maybe you forget to handle
> a missing state file after reboot) that means the 'enable' doesn't get
> run.  The service is up, but nova will never use it.
>
>> The new feature will be obscure enough that
>> no one would be able to use it without reading the corresponding
>> documentation first anyway.
>
> I like your optimism.
>
>>
>>> I think with naming and documenting it properly, I'm fine to provide the
>>> option, but I'm on the fence. Beekhof needs a little more convincing :-)
>>
>> Can you provide an example of a potential real-world situation where
>> an RA author would end up accidentally abusing the feature?
>
> You want a real-world 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-07 Thread Andrew Beekhof
On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers  wrote:
> Ken Gaillot  wrote:
>> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> > Adam Spiers  wrote:
>> >> Andrew Beekhof  wrote:
>> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
>>  Ken Gaillot  wrote:
>> > My main question is how useful would it actually be in the proposed use
>> > cases. Considering the possibility that the expected start might never
>> > happen (or fail), can an RA really do anything different if
>> > start_expected=true?
>> 
>>  That's the wrong question :-)
>> 
>> > If the use case is there, I have no problem with
>> > adding it, but I want to make sure it's worthwhile.
>> 
>>  The use case which started this whole thread is for
>>  start_expected=false, not start_expected=true.
>> >>>
>> >>> Isn't this just two sides of the same coin?
>> >>> If you're not doing the same thing for both cases, then you're just
>> >>> reversing the order of the clauses.
>> >>
>> >> No, because the stated concern about unreliable expectations
>> >> ("Considering the possibility that the expected start might never
>> >> happen (or fail)") was regarding start_expected=true, and that's the
>> >> side of the coin we don't care about, so it doesn't matter if it's
>> >> unreliable.
>> >
>> > BTW, if the expected start happens but fails, then Pacemaker will just
>> > keep repeating until migration-threshold is hit, at which point it
>> > will call the RA 'stop' action finally with start_expected=false.
>> > So that's of no concern.
>>
>> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>
> Sure.
>
>> > Maybe your point was that if the expected start never happens (so
>> > never even gets a chance to fail), we still want to do a nova
>> > service-disable?
>>
>> That is a good question, which might mean it should be done on every
>> stop -- or could that cause problems (besides delays)?
>
> No, the whole point of adding this feature is to avoid a
> service-disable on every stop, and instead only do it on the final
> stop.  If there are corner cases where we never reach the final stop,
> that's not a disaster because nova will eventually figure it out and
> do the right thing when the server-agent connection times out.
>
>> Another aspect of this is that the proposed feature could only look at a
>> single transition. What if stop is called with start_expected=false, but
>> then Pacemaker is able to start the service on the same node in the next
>> transition immediately afterward? Would having called service-disable
>> cause problems for that start?
>
> We would also need to ensure that service-enable is called on start
> when necessary.  Perhaps we could track the enable/disable state in a
> local temporary file, and if the file indicates that we've previously
> done service-disable, we know to run service-enable on start.  This
> would avoid calling service-enable on every single start.

feels like an over-optimization
in fact, the whole thing feels like that if i'm honest.

why are we trying to optimise the projected performance impact when
the system is in terrible shape already?

>
>> > Yes that would be nice, but this proposal was never intended to
>> > address that.  I guess we'd need an entirely different mechanism in
>> > Pacemaker for that.  But let's not allow perfection to become the
>> > enemy of the good ;-)
>>
>> The ultimate concern is that this will encourage people to write RAs
>> that leave services in a dangerous state after stop is called.
>
> I don't see why it would.

Previous experience suggests it definitely will.

People will do exactly what you're thinking but with something important.
They'll see it behaves as they expect in best-case testing and never
think about the corner cases.
Then they'll start thinking about optimising their start operations,
write some "optimistic" state recording code and break those too.

Imagine a bug in your state recording code (maybe you forget to handle
a missing state file after reboot) that means the 'enable' doesn't get
run.  The service is up, but nova will never use it.

> The new feature will be obscure enough that
> no one would be able to use it without reading the corresponding
> documentation first anyway.

I like your optimism.

>
>> I think with naming and documenting it properly, I'm fine to provide the
>> option, but I'm on the fence. Beekhof needs a little more convincing :-)
>
> Can you provide an example of a potential real-world situation where
> an RA author would end up accidentally abusing the feature?

You want a real-world example of how someone could accidentally
misuse a feature that doesn't exist yet?

Um... if we knew all the weird and wonderful ways people break our
code we'd be able to build a better mouse trap.

>
> Thanks a lot for your continued attention on this!
>
> Adam
>
> 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-07 Thread Adam Spiers
Ken Gaillot  wrote:
> On 06/06/2016 05:45 PM, Adam Spiers wrote:
> > Adam Spiers  wrote:
> >> Andrew Beekhof  wrote:
> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
>  Ken Gaillot  wrote:
> > My main question is how useful would it actually be in the proposed use
> > cases. Considering the possibility that the expected start might never
> > happen (or fail), can an RA really do anything different if
> > start_expected=true?
> 
>  That's the wrong question :-)
> 
> > If the use case is there, I have no problem with
> > adding it, but I want to make sure it's worthwhile.
> 
>  The use case which started this whole thread is for
>  start_expected=false, not start_expected=true.
> >>>
> >>> Isn't this just two sides of the same coin?
> >>> If you're not doing the same thing for both cases, then you're just
> >>> reversing the order of the clauses.
> >>
> >> No, because the stated concern about unreliable expectations
> >> ("Considering the possibility that the expected start might never
> >> happen (or fail)") was regarding start_expected=true, and that's the
> >> side of the coin we don't care about, so it doesn't matter if it's
> >> unreliable.
> > 
> > BTW, if the expected start happens but fails, then Pacemaker will just
> > keep repeating until migration-threshold is hit, at which point it
> > will call the RA 'stop' action finally with start_expected=false.
> > So that's of no concern.
> 
> To clarify, that's configurable, via start-failure-is-fatal and on-fail

Sure.
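
For readers less familiar with those knobs, the behaviour being discussed
here is controlled by configuration roughly like the following (pcs syntax
shown as one possible interface; the values are arbitrary examples):

    # Whether a single start failure immediately bans the resource from the node.
    pcs property set start-failure-is-fatal=false

    # How many failures are tolerated before the resource moves elsewhere,
    # and how long before the fail count is forgotten.
    pcs resource update my-rsc meta migration-threshold=3 failure-timeout=120s

    # Per-operation failure handling.
    pcs resource op add my-rsc monitor interval=10s on-fail=restart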

> > Maybe your point was that if the expected start never happens (so
> > never even gets a chance to fail), we still want to do a nova
> > service-disable?
> 
> That is a good question, which might mean it should be done on every
> stop -- or could that cause problems (besides delays)?

No, the whole point of adding this feature is to avoid a
service-disable on every stop, and instead only do it on the final
stop.  If there are corner cases where we never reach the final stop,
that's not a disaster because nova will eventually figure it out and
do the right thing when the server-agent connection times out.

> Another aspect of this is that the proposed feature could only look at a
> single transition. What if stop is called with start_expected=false, but
> then Pacemaker is able to start the service on the same node in the next
> transition immediately afterward? Would having called service-disable
> cause problems for that start?

We would also need to ensure that service-enable is called on start
when necessary.  Perhaps we could track the enable/disable state in a
local temporary file, and if the file indicates that we've previously
done service-disable, we know to run service-enable on start.  This
would avoid calling service-enable on every single start.

> > Yes that would be nice, but this proposal was never intended to
> > address that.  I guess we'd need an entirely different mechanism in
> > Pacemaker for that.  But let's not allow perfection to become the
> > enemy of the good ;-)
> 
> The ultimate concern is that this will encourage people to write RAs
> that leave services in a dangerous state after stop is called.

I don't see why it would.  The new feature will be obscure enough that
no one would be able to use it without reading the corresponding
documentation first anyway.

> I think with naming and documenting it properly, I'm fine to provide the
> option, but I'm on the fence. Beekhof needs a little more convincing :-)

Can you provide an example of a potential real-world situation where
an RA author would end up accidentally abusing the feature?

Thanks a lot for your continued attention on this!

Adam
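
To make the intended use concrete, an RA's stop action might consume the
proposed variable roughly as follows. A sketch only: the final variable name
(elsewhere in the thread it is referred to as OCF_RESKEY_CRM_start_expected)
and its default were still being settled, and the helper shown is hypothetical.

    compute_stop() {
        stop_nova_compute_process || return 1            # hypothetical helper

        # Only take the host out of the schedulable pool on the *final* stop,
        # i.e. when the cluster does not expect to start us again on this node.
        if [ "${OCF_RESKEY_CRM_start_expected:-true}" = "false" ]; then
            nova service-disable "$(hostname)" nova-compute
        fi
        return 0
    }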



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Vladislav Bogdanov

07.06.2016 02:20, Ken Gaillot wrote:

On 06/06/2016 03:30 PM, Vladislav Bogdanov wrote:

06.06.2016 22:43, Ken Gaillot wrote:

On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote:

06.06.2016 19:39, Ken Gaillot wrote:

On 06/05/2016 07:27 PM, Andrew Beekhof wrote:

On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot  wrote:

On 06/02/2016 08:01 PM, Andrew Beekhof wrote:

On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:

A recent thread discussed a proposed new feature: a new environment
variable that would be passed to resource agents, indicating whether a
stop action was part of a recovery.

Since that thread was long and covered a lot of topics, I'm starting a
new one to focus on the core issue remaining:

The original idea was to pass the number of restarts remaining before
the cluster will no longer try to start the resource on the same node.
This involves calculating (migration-threshold - fail-count), and that
implies certain limitations: (1) it will only be set when the cluster
checks migration-threshold; (2) it will only be set for the failed
resource itself, not for other resources that may be recovered due to
dependencies on it.

Ulrich Windl proposed an alternative: setting a boolean value instead.
I forgot to cc the list on my reply, so I'll summarize now: We would
set a new variable like OCF_RESKEY_CRM_recovery=true


This concept worries me, especially when what we've implemented is
called OCF_RESKEY_CRM_restarting.


Agreed; I plan to rename it yet again, to
OCF_RESKEY_CRM_start_expected.


The name alone encourages people to "optimise" the agent to not
actually stop the service "because it's just going to start again
shortly".  I know that's not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

It's true there are any number of ways to write bad agents, but I would
argue that we shouldn't be nudging people in that direction :)


I do have mixed feelings about that. I think if we name it
start_expected, and document it carefully, we can avoid any casual
mistakes.

My main question is how useful would it actually be in the proposed
use cases. Considering the possibility that the expected start might
never happen (or fail), can an RA really do anything different if
start_expected=true?


I would have thought not.  Correctness should trump optimality.
But I'm prepared to be mistaken.


If the use case is there, I have no problem with
adding it, but I want to make sure it's worthwhile.


Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true,
then before the start happens, someone disables the resource, so the
start is never called. Or the node is fenced before the start happens,
etc.

Is there anything significant an RA can do differently based on
start_expected=true/false without causing problems if an expected start
never happens?


Yep.

It may request stop of other resources
* on that node by removing some node attributes which participate in
location constraints
* or cluster-wide by revoking/putting to standby cluster ticket other
resources depend on

Latter case is that's why I asked about the possibility of passing the
node name resource is intended to be started on instead of a boolean
value (in comments to PR #1026) - I would use it to request stop of
lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary
lustre component which does all "request routing") fails to start
anywhere in cluster. That way, if RA does not receive any node name,


Why would ordering constraints be insufficient?


They are in place, but advisory ones to allow MGS fail/switch-over.


What happens if the MDTs/OSTs continue running because a start of MGS
was expected, but something prevents the start from actually happening?


Nothing critical, lustre clients won't be able to contact them without
MGS running and will hang.
But it is safer to shutdown them if it is known that MGS cannot be
started right now. Especially if geo-cluster failover is expected in
that case (as MGS can be local to a site, countrary to all other lustre
parts which need to be replicated). Actually that is the only part of a
puzzle remaining to "solve" that big project, and IMHO it is enough to
have a node name of a intended start or nothing in that attribute
(nothing means stop everything and initiate geo-failover if needed). If
f.e. fencing happens for a node intended to start resource, then stop
will be called again after the next start failure after failure-timeout
lapses. That would be much better than no information at all. Total stop
or geo-failover will happen just with some (configurable) delay instead
of rendering the whole filesystem to an unusable state requiring manual

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Andrew Beekhof
On Tue, Jun 7, 2016 at 9:07 AM, Ken Gaillot  wrote:
> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> Adam Spiers  wrote:
>>> Andrew Beekhof  wrote:
 On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
> Ken Gaillot  wrote:
>> My main question is how useful would it actually be in the proposed use
>> cases. Considering the possibility that the expected start might never
>> happen (or fail), can an RA really do anything different if
>> start_expected=true?
>
> That's the wrong question :-)
>
>> If the use case is there, I have no problem with
>> adding it, but I want to make sure it's worthwhile.
>
> The use case which started this whole thread is for
> start_expected=false, not start_expected=true.

 Isn't this just two sides of the same coin?
 If you're not doing the same thing for both cases, then you're just
 reversing the order of the clauses.
>>>
>>> No, because the stated concern about unreliable expectations
>>> ("Considering the possibility that the expected start might never
>>> happen (or fail)") was regarding start_expected=true, and that's the
>>> side of the coin we don't care about, so it doesn't matter if it's
>>> unreliable.
>>
>> BTW, if the expected start happens but fails, then Pacemaker will just
>> keep repeating until migration-threshold is hit, at which point it
>> will call the RA 'stop' action finally with start_expected=false.
>> So that's of no concern.
>
> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>
>> Maybe your point was that if the expected start never happens (so
>> never even gets a chance to fail), we still want to do a nova
>> service-disable?
>
> That is a good question, which might mean it should be done on every
> stop -- or could that cause problems (besides delays)?
>
> Another aspect of this is that the proposed feature could only look at a
> single transition. What if stop is called with start_expected=false, but
> then Pacemaker is able to start the service on the same node in the next
> transition immediately afterward? Would having called service-disable
> cause problems for that start?
>
>> Yes that would be nice, but this proposal was never intended to
>> address that.  I guess we'd need an entirely different mechanism in
>> Pacemaker for that.  But let's not allow perfection to become the
>> enemy of the good ;-)
>
> The ultimate concern is that this will encourage people to write RAs
> that leave services in a dangerous state after stop is called.
>
> I think with naming and documenting it properly, I'm fine to provide the
> option, but I'm on the fence. Beekhof needs a little more convincing :-)

I think the new name is a big step in the right direction



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Andrew Beekhof
On Tue, Jun 7, 2016 at 8:45 AM, Adam Spiers  wrote:
> Adam Spiers  wrote:
>> Andrew Beekhof  wrote:
>> > On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
>> > > Ken Gaillot  wrote:
>> > >> My main question is how useful would it actually be in the proposed use
>> > >> cases. Considering the possibility that the expected start might never
>> > >> happen (or fail), can an RA really do anything different if
>> > >> start_expected=true?
>> > >
>> > > That's the wrong question :-)
>> > >
>> > >> If the use case is there, I have no problem with
>> > >> adding it, but I want to make sure it's worthwhile.
>> > >
>> > > The use case which started this whole thread is for
>> > > start_expected=false, not start_expected=true.
>> >
>> > Isn't this just two sides of the same coin?
>> > If you're not doing the same thing for both cases, then you're just
>> > reversing the order of the clauses.
>>
>> No, because the stated concern about unreliable expectations
>> ("Considering the possibility that the expected start might never
>> happen (or fail)") was regarding start_expected=true, and that's the
>> side of the coin we don't care about, so it doesn't matter if it's
>> unreliable.
>
> BTW, if the expected start happens but fails, then Pacemaker will just
> keep repeating until migration-threshold is hit, at which point it
> will call the RA 'stop' action finally with start_expected=false.

Maybe. Maybe not. People cannot rely on this and I'd put money on them
trying :-)

> So that's of no concern.
>
> Maybe your point was that if the expected start never happens (so
> never even gets a chance to fail), we still want to do a nova
> service-disable?

Exactly :)

>
> Yes that would be nice, but this proposal was never intended to
> address that.  I guess we'd need an entirely different mechanism in
> Pacemaker for that.  But let's not allow perfection to become the
> enemy of the good ;-)
>


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Ken Gaillot
On 06/06/2016 03:30 PM, Vladislav Bogdanov wrote:
> 06.06.2016 22:43, Ken Gaillot wrote:
>> On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote:
>>> 06.06.2016 19:39, Ken Gaillot wrote:
 On 06/05/2016 07:27 PM, Andrew Beekhof wrote:
> On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot 
> wrote:
>> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
>>> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot 
>>> wrote:
 A recent thread discussed a proposed new feature, a new environment
 variable that would be passed to resource agents, indicating
 whether a
 stop action was part of a recovery.

 Since that thread was long and covered a lot of topics, I'm
 starting a
 new one to focus on the core issue remaining:

 The original idea was to pass the number of restarts remaining
 before
 the resource will no longer tried to be started on the same node.
 This
 involves calculating (fail-count - migration-threshold), and that
 implies certain limitations: (1) it will only be set when the
 cluster
 checks migration-threshold; (2) it will only be set for the failed
 resource itself, not for other resources that may be recovered
 due to
 dependencies on it.

 Ulrich Windl proposed an alternative: setting a boolean value
 instead. I
 forgot to cc the list on my reply, so I'll summarize now: We would
 set a
 new variable like OCF_RESKEY_CRM_recovery=true
>>>
>>> This concept worries me, especially when what we've implemented is
>>> called OCF_RESKEY_CRM_restarting.
>>
>> Agreed; I plan to rename it yet again, to
>> OCF_RESKEY_CRM_start_expected.
>>
>>> The name alone encourages people to "optimise" the agent to not
>>> actually stop the service "because its just going to start again
>>> shortly".  I know thats not what Adam would do, but not everyone
>>> understands how clusters work.
>>>
>>> There are any number of reasons why a cluster that intends to
>>> restart
>>> a service may not do so.  In such a scenario, a badly written agent
>>> would cause the cluster to mistakenly believe that the service is
>>> stopped - allowing it to start elsewhere.
>>>
>>> Its true there are any number of ways to write bad agents, but I
>>> would
>>> argue that we shouldn't be nudging people in that direction :)
>>
>> I do have mixed feelings about that. I think if we name it
>> start_expected, and document it carefully, we can avoid any casual
>> mistakes.
>>
>> My main question is how useful would it actually be in the
>> proposed use
>> cases. Considering the possibility that the expected start might
>> never
>> happen (or fail), can an RA really do anything different if
>> start_expected=true?
>
> I would have thought not.  Correctness should trump optimal.
> But I'm prepared to be mistaken.
>
>> If the use case is there, I have no problem with
>> adding it, but I want to make sure it's worthwhile.

 Anyone have comments on this?

 A simple example: pacemaker calls an RA stop with start_expected=true,
 then before the start happens, someone disables the resource, so the
 start is never called. Or the node is fenced before the start happens,
 etc.

 Is there anything significant an RA can do differently based on
 start_expected=true/false without causing problems if an expected start
 never happens?
>>>
>>> Yep.
>>>
>>> It may request stop of other resources
>>> * on that node by removing some node attributes which participate in
>>> location constraints
>>> * or cluster-wide by revoking/putting to standby cluster ticket other
>>> resources depend on
>>>
>>> Latter case is that's why I asked about the possibility of passing the
>>> node name resource is intended to be started on instead of a boolean
>>> value (in comments to PR #1026) - I would use it to request stop of
>>> lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary
>>> lustre component which does all "request routing") fails to start
>>> anywhere in cluster. That way, if RA does not receive any node name,
>>
>> Why would ordering constraints be insufficient?
> 
> They are in place, but advisory ones to allow MGS fail/switch-over.
>>
>> What happens if the MDTs/OSTs continue running because a start of MGS
>> was expected, but something prevents the start from actually happening?
> 
> Nothing critical, lustre clients won't be able to contact them without
> MGS running and will hang.
> But it is safer to shutdown them if it is known that MGS cannot be
> started right now. Especially if geo-cluster failover is expected in
> that case (as MGS can be local to a site, countrary to all other lustre
> parts which need to 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Ken Gaillot
On 06/06/2016 05:45 PM, Adam Spiers wrote:
> Adam Spiers  wrote:
>> Andrew Beekhof  wrote:
>>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
 Ken Gaillot  wrote:
> My main question is how useful would it actually be in the proposed use
> cases. Considering the possibility that the expected start might never
> happen (or fail), can an RA really do anything different if
> start_expected=true?

 That's the wrong question :-)

> If the use case is there, I have no problem with
> adding it, but I want to make sure it's worthwhile.

 The use case which started this whole thread is for
 start_expected=false, not start_expected=true.
>>>
>>> Isn't this just two sides of the same coin?
>>> If you're not doing the same thing for both cases, then you're just
>>> reversing the order of the clauses.
>>
>> No, because the stated concern about unreliable expectations
>> ("Considering the possibility that the expected start might never
>> happen (or fail)") was regarding start_expected=true, and that's the
>> side of the coin we don't care about, so it doesn't matter if it's
>> unreliable.
> 
> BTW, if the expected start happens but fails, then Pacemaker will just
> keep repeating until migration-threshold is hit, at which point it
> will call the RA 'stop' action finally with start_expected=false.
> So that's of no concern.

To clarify, that's configurable via start-failure-is-fatal and on-fail.
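
(As an illustration only - a minimal crmsh sketch of those two knobs;
the resource name p_foo is invented, and the exact syntax may differ
between crmsh/pcs versions:)

    # Cluster-wide: a failed start no longer bans the node outright;
    # Pacemaker may retry the start there, up to migration-threshold.
    crm configure property start-failure-is-fatal=false

    # Per-operation: choose the recovery policy for a failed start.
    crm configure primitive p_foo ocf:heartbeat:Dummy \
        op start interval=0 timeout=60 on-fail=restart \
        op monitor interval=10s timeout=20s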

> Maybe your point was that if the expected start never happens (so
> never even gets a chance to fail), we still want to do a nova
> service-disable?

That is a good question, which might mean it should be done on every
stop -- or could that cause problems (besides delays)?

Another aspect of this is that the proposed feature could only look at a
single transition. What if stop is called with start_expected=false, but
then Pacemaker is able to start the service on the same node in the next
transition immediately afterward? Would having called service-disable
cause problems for that start?

> Yes that would be nice, but this proposal was never intended to
> address that.  I guess we'd need an entirely different mechanism in
> Pacemaker for that.  But let's not allow perfection to become the
> enemy of the good ;-)

The ultimate concern is that this will encourage people to write RAs
that leave services in a dangerous state after stop is called.

I think with naming and documenting it properly, I'm fine to provide the
option, but I'm on the fence. Beekhof needs a little more convincing :-)



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Adam Spiers
Adam Spiers  wrote:
> Andrew Beekhof  wrote:
> > On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
> > > Ken Gaillot  wrote:
> > >> My main question is how useful would it actually be in the proposed use
> > >> cases. Considering the possibility that the expected start might never
> > >> happen (or fail), can an RA really do anything different if
> > >> start_expected=true?
> > >
> > > That's the wrong question :-)
> > >
> > >> If the use case is there, I have no problem with
> > >> adding it, but I want to make sure it's worthwhile.
> > >
> > > The use case which started this whole thread is for
> > > start_expected=false, not start_expected=true.
> > 
> > Isn't this just two sides of the same coin?
> > If you're not doing the same thing for both cases, then you're just
> > reversing the order of the clauses.
> 
> No, because the stated concern about unreliable expectations
> ("Considering the possibility that the expected start might never
> happen (or fail)") was regarding start_expected=true, and that's the
> side of the coin we don't care about, so it doesn't matter if it's
> unreliable.

BTW, if the expected start happens but fails, then Pacemaker will just
keep repeating until migration-threshold is hit, at which point it
will call the RA 'stop' action finally with start_expected=false.
So that's of no concern.

Maybe your point was that if the expected start never happens (so
never even gets a chance to fail), we still want to do a nova
service-disable?

Yes that would be nice, but this proposal was never intended to
address that.  I guess we'd need an entirely different mechanism in
Pacemaker for that.  But let's not allow perfection to become the
enemy of the good ;-)



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Adam Spiers
Andrew Beekhof  wrote:
> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
> > Ken Gaillot  wrote:
> >> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
> >> > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:
> >> >> A recent thread discussed a proposed new feature, a new environment
> >> >> variable that would be passed to resource agents, indicating whether a
> >> >> stop action was part of a recovery.
> >> >>
> >> >> Since that thread was long and covered a lot of topics, I'm starting a
> >> >> new one to focus on the core issue remaining:
> >> >>
> >> >> The original idea was to pass the number of restarts remaining before
> >> >> the resource will no longer tried to be started on the same node. This
> >> >> involves calculating (fail-count - migration-threshold), and that
> >> >> implies certain limitations: (1) it will only be set when the cluster
> >> >> checks migration-threshold; (2) it will only be set for the failed
> >> >> resource itself, not for other resources that may be recovered due to
> >> >> dependencies on it.
> >> >>
> >> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> >> >> forgot to cc the list on my reply, so I'll summarize now: We would set a
> >> >> new variable like OCF_RESKEY_CRM_recovery=true
> >> >
> >> > This concept worries me, especially when what we've implemented is
> >> > called OCF_RESKEY_CRM_restarting.
> >>
> >> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.
> >
> > [snipped]
> >
> >> My main question is how useful would it actually be in the proposed use
> >> cases. Considering the possibility that the expected start might never
> >> happen (or fail), can an RA really do anything different if
> >> start_expected=true?
> >
> > That's the wrong question :-)
> >
> >> If the use case is there, I have no problem with
> >> adding it, but I want to make sure it's worthwhile.
> >
> > The use case which started this whole thread is for
> > start_expected=false, not start_expected=true.
> 
> Isn't this just two sides of the same coin?
> If you're not doing the same thing for both cases, then you're just
> reversing the order of the clauses.

No, because the stated concern about unreliable expectations
("Considering the possibility that the expected start might never
happen (or fail)") was regarding start_expected=true, and that's the
side of the coin we don't care about, so it doesn't matter if it's
unreliable.



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Andrew Beekhof
On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers  wrote:
> Ken Gaillot  wrote:
>> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
>> > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:
>> >> A recent thread discussed a proposed new feature, a new environment
>> >> variable that would be passed to resource agents, indicating whether a
>> >> stop action was part of a recovery.
>> >>
>> >> Since that thread was long and covered a lot of topics, I'm starting a
>> >> new one to focus on the core issue remaining:
>> >>
>> >> The original idea was to pass the number of restarts remaining before
>> >> the resource will no longer tried to be started on the same node. This
>> >> involves calculating (fail-count - migration-threshold), and that
>> >> implies certain limitations: (1) it will only be set when the cluster
>> >> checks migration-threshold; (2) it will only be set for the failed
>> >> resource itself, not for other resources that may be recovered due to
>> >> dependencies on it.
>> >>
>> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I
>> >> forgot to cc the list on my reply, so I'll summarize now: We would set a
>> >> new variable like OCF_RESKEY_CRM_recovery=true
>> >
>> > This concept worries me, especially when what we've implemented is
>> > called OCF_RESKEY_CRM_restarting.
>>
>> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.
>
> [snipped]
>
>> My main question is how useful would it actually be in the proposed use
>> cases. Considering the possibility that the expected start might never
>> happen (or fail), can an RA really do anything different if
>> start_expected=true?
>
> That's the wrong question :-)
>
>> If the use case is there, I have no problem with
>> adding it, but I want to make sure it's worthwhile.
>
> The use case which started this whole thread is for
> start_expected=false, not start_expected=true.

Isn't this just two sides of the same coin?
If you're not doing the same thing for both cases, then you're just
reversing the order of the clauses.

"A isn't different from B, B is different from A!" :-)

> When it's false for
> NovaCompute, we call nova service-disable to ensure that nova doesn't
> attempt to schedule any more VMs on that host.
>
> If start_expected=true, we don't *want* to do anything different.  So
> it doesn't matter even if the expected start never happens.



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Adam Spiers
Ken Gaillot  wrote:
> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
> > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:
> >> A recent thread discussed a proposed new feature, a new environment
> >> variable that would be passed to resource agents, indicating whether a
> >> stop action was part of a recovery.
> >>
> >> Since that thread was long and covered a lot of topics, I'm starting a
> >> new one to focus on the core issue remaining:
> >>
> >> The original idea was to pass the number of restarts remaining before
> >> the resource will no longer tried to be started on the same node. This
> >> involves calculating (fail-count - migration-threshold), and that
> >> implies certain limitations: (1) it will only be set when the cluster
> >> checks migration-threshold; (2) it will only be set for the failed
> >> resource itself, not for other resources that may be recovered due to
> >> dependencies on it.
> >>
> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> >> forgot to cc the list on my reply, so I'll summarize now: We would set a
> >> new variable like OCF_RESKEY_CRM_recovery=true
> > 
> > This concept worries me, especially when what we've implemented is
> > called OCF_RESKEY_CRM_restarting.
> 
> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.

[snipped]

> My main question is how useful would it actually be in the proposed use
> cases. Considering the possibility that the expected start might never
> happen (or fail), can an RA really do anything different if
> start_expected=true?

That's the wrong question :-)

> If the use case is there, I have no problem with
> adding it, but I want to make sure it's worthwhile.

The use case which started this whole thread is for
start_expected=false, not start_expected=true.  When it's false for
NovaCompute, we call nova service-disable to ensure that nova doesn't
attempt to schedule any more VMs on that host.

If start_expected=true, we don't *want* to do anything different.  So
it doesn't matter even if the expected start never happens.
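
(Purely as an illustration of that use case, a hedged sketch of what
such a stop action might look like with the proposed variable; the
helpers stop_local_nova_compute and nova_service_disable are invented
names, not code from the actual NovaCompute agent:)

    compute_stop() {
        # Always really stop the local service - correctness first.
        stop_local_nova_compute || return $OCF_ERR_GENERIC

        # Only when Pacemaker does *not* expect to start us again on this
        # node do we tell nova to stop scheduling new VMs here.
        if [ "${OCF_RESKEY_CRM_start_expected:-true}" != "true" ]; then
            # invented helper wrapping "nova service-disable <host> nova-compute"
            nova_service_disable "$(hostname)" ||
                ocf_log warn "nova service-disable failed; host may still get VMs"
        fi
        return $OCF_SUCCESS
    }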



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Vladislav Bogdanov

06.06.2016 22:43, Ken Gaillot wrote:

On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote:

06.06.2016 19:39, Ken Gaillot wrote:

On 06/05/2016 07:27 PM, Andrew Beekhof wrote:

On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot 
wrote:

On 06/02/2016 08:01 PM, Andrew Beekhof wrote:

On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot 
wrote:

A recent thread discussed a proposed new feature, a new environment
variable that would be passed to resource agents, indicating
whether a
stop action was part of a recovery.

Since that thread was long and covered a lot of topics, I'm
starting a
new one to focus on the core issue remaining:

The original idea was to pass the number of restarts remaining before
the resource will no longer tried to be started on the same node.
This
involves calculating (fail-count - migration-threshold), and that
implies certain limitations: (1) it will only be set when the cluster
checks migration-threshold; (2) it will only be set for the failed
resource itself, not for other resources that may be recovered due to
dependencies on it.

Ulrich Windl proposed an alternative: setting a boolean value
instead. I
forgot to cc the list on my reply, so I'll summarize now: We would
set a
new variable like OCF_RESKEY_CRM_recovery=true


This concept worries me, especially when what we've implemented is
called OCF_RESKEY_CRM_restarting.


Agreed; I plan to rename it yet again, to
OCF_RESKEY_CRM_start_expected.


The name alone encourages people to "optimise" the agent to not
actually stop the service "because its just going to start again
shortly".  I know thats not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

Its true there are any number of ways to write bad agents, but I would
argue that we shouldn't be nudging people in that direction :)


I do have mixed feelings about that. I think if we name it
start_expected, and document it carefully, we can avoid any casual
mistakes.

My main question is how useful would it actually be in the proposed use
cases. Considering the possibility that the expected start might never
happen (or fail), can an RA really do anything different if
start_expected=true?


I would have thought not.  Correctness should trump optimal.
But I'm prepared to be mistaken.


If the use case is there, I have no problem with
adding it, but I want to make sure it's worthwhile.


Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true,
then before the start happens, someone disables the resource, so the
start is never called. Or the node is fenced before the start happens,
etc.

Is there anything significant an RA can do differently based on
start_expected=true/false without causing problems if an expected start
never happens?


Yep.

It may request stop of other resources
* on that node by removing some node attributes which participate in
location constraints
* or cluster-wide by revoking/putting to standby cluster ticket other
resources depend on

Latter case is that's why I asked about the possibility of passing the
node name resource is intended to be started on instead of a boolean
value (in comments to PR #1026) - I would use it to request stop of
lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary
lustre component which does all "request routing") fails to start
anywhere in cluster. That way, if RA does not receive any node name,


Why would ordering constraints be insufficient?


They are in place, but advisory ones to allow MGS fail/switch-over.


What happens if the MDTs/OSTs continue running because a start of MGS
was expected, but something prevents the start from actually happening?


Nothing critical, lustre clients won't be able to contact them without
the MGS running and will hang.
But it is safer to shut them down if it is known that the MGS cannot be
started right now. Especially if geo-cluster failover is expected in
that case (as the MGS can be local to a site, contrary to all other
lustre parts, which need to be replicated). Actually, that is the only
part of the puzzle remaining to "solve" that big project, and IMHO it
is enough to have either the node name of an intended start or nothing
in that attribute (nothing means stop everything and initiate
geo-failover if needed). If, for example, fencing happens for the node
intended to start the resource, then stop will be called again after
the next start failure, once failure-timeout lapses. That would be much
better than no information at all. A total stop or geo-failover will
happen with just some (configurable) delay instead of leaving the whole
filesystem in an unusable state requiring manual intervention.





then it can be "almost sure" pacemaker does not intend to restart

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Ken Gaillot
On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote:
> 06.06.2016 19:39, Ken Gaillot wrote:
>> On 06/05/2016 07:27 PM, Andrew Beekhof wrote:
>>> On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot 
>>> wrote:
 On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot 
> wrote:
>> A recent thread discussed a proposed new feature, a new environment
>> variable that would be passed to resource agents, indicating
>> whether a
>> stop action was part of a recovery.
>>
>> Since that thread was long and covered a lot of topics, I'm
>> starting a
>> new one to focus on the core issue remaining:
>>
>> The original idea was to pass the number of restarts remaining before
>> the resource will no longer tried to be started on the same node.
>> This
>> involves calculating (fail-count - migration-threshold), and that
>> implies certain limitations: (1) it will only be set when the cluster
>> checks migration-threshold; (2) it will only be set for the failed
>> resource itself, not for other resources that may be recovered due to
>> dependencies on it.
>>
>> Ulrich Windl proposed an alternative: setting a boolean value
>> instead. I
>> forgot to cc the list on my reply, so I'll summarize now: We would
>> set a
>> new variable like OCF_RESKEY_CRM_recovery=true
>
> This concept worries me, especially when what we've implemented is
> called OCF_RESKEY_CRM_restarting.

 Agreed; I plan to rename it yet again, to
 OCF_RESKEY_CRM_start_expected.

> The name alone encourages people to "optimise" the agent to not
> actually stop the service "because its just going to start again
> shortly".  I know thats not what Adam would do, but not everyone
> understands how clusters work.
>
> There are any number of reasons why a cluster that intends to restart
> a service may not do so.  In such a scenario, a badly written agent
> would cause the cluster to mistakenly believe that the service is
> stopped - allowing it to start elsewhere.
>
> Its true there are any number of ways to write bad agents, but I would
> argue that we shouldn't be nudging people in that direction :)

 I do have mixed feelings about that. I think if we name it
 start_expected, and document it carefully, we can avoid any casual
 mistakes.

 My main question is how useful would it actually be in the proposed use
 cases. Considering the possibility that the expected start might never
 happen (or fail), can an RA really do anything different if
 start_expected=true?
>>>
>>> I would have thought not.  Correctness should trump optimal.
>>> But I'm prepared to be mistaken.
>>>
 If the use case is there, I have no problem with
 adding it, but I want to make sure it's worthwhile.
>>
>> Anyone have comments on this?
>>
>> A simple example: pacemaker calls an RA stop with start_expected=true,
>> then before the start happens, someone disables the resource, so the
>> start is never called. Or the node is fenced before the start happens,
>> etc.
>>
>> Is there anything significant an RA can do differently based on
>> start_expected=true/false without causing problems if an expected start
>> never happens?
> 
> Yep.
> 
> It may request stop of other resources
> * on that node by removing some node attributes which participate in
> location constraints
> * or cluster-wide by revoking/putting to standby cluster ticket other
> resources depend on
> 
> Latter case is that's why I asked about the possibility of passing the
> node name resource is intended to be started on instead of a boolean
> value (in comments to PR #1026) - I would use it to request stop of
> lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary
> lustre component which does all "request routing") fails to start
> anywhere in cluster. That way, if RA does not receive any node name,

Why would ordering constraints be insufficient?

What happens if the MDTs/OSTs continue running because a start of MGS
was expected, but something prevents the start from actually happening?

> then it can be "almost sure" pacemaker does not intend to restart
> resource (yet) and can request it to stop everything else (because
> filesystem is not usable anyways). Later, if another start attempt
> (caused by failure-timeout expiration) succeeds, RA may grant the ticket
> back, and all other resources start again.
> 
> Best,
> Vladislav



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Vladislav Bogdanov

06.06.2016 19:39, Ken Gaillot wrote:

On 06/05/2016 07:27 PM, Andrew Beekhof wrote:

On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot  wrote:

On 06/02/2016 08:01 PM, Andrew Beekhof wrote:

On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:

A recent thread discussed a proposed new feature, a new environment
variable that would be passed to resource agents, indicating whether a
stop action was part of a recovery.

Since that thread was long and covered a lot of topics, I'm starting a
new one to focus on the core issue remaining:

The original idea was to pass the number of restarts remaining before
the resource will no longer tried to be started on the same node. This
involves calculating (fail-count - migration-threshold), and that
implies certain limitations: (1) it will only be set when the cluster
checks migration-threshold; (2) it will only be set for the failed
resource itself, not for other resources that may be recovered due to
dependencies on it.

Ulrich Windl proposed an alternative: setting a boolean value instead. I
forgot to cc the list on my reply, so I'll summarize now: We would set a
new variable like OCF_RESKEY_CRM_recovery=true


This concept worries me, especially when what we've implemented is
called OCF_RESKEY_CRM_restarting.


Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.


The name alone encourages people to "optimise" the agent to not
actually stop the service "because its just going to start again
shortly".  I know thats not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

Its true there are any number of ways to write bad agents, but I would
argue that we shouldn't be nudging people in that direction :)


I do have mixed feelings about that. I think if we name it
start_expected, and document it carefully, we can avoid any casual mistakes.

My main question is how useful would it actually be in the proposed use
cases. Considering the possibility that the expected start might never
happen (or fail), can an RA really do anything different if
start_expected=true?


I would have thought not.  Correctness should trump optimal.
But I'm prepared to be mistaken.


If the use case is there, I have no problem with
adding it, but I want to make sure it's worthwhile.


Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true,
then before the start happens, someone disables the resource, so the
start is never called. Or the node is fenced before the start happens, etc.

Is there anything significant an RA can do differently based on
start_expected=true/false without causing problems if an expected start
never happens?


Yep.

It may request a stop of other resources
* on that node, by removing some node attributes which participate in
location constraints
* or cluster-wide, by revoking (or putting into standby) a cluster
ticket that other resources depend on

The latter case is why I asked about the possibility of passing the
node name the resource is intended to be started on, instead of a
boolean value (in comments to PR #1026) - I would use it to request a
stop of the lustre MDTs and OSTs by revoking the ticket they depend on
if the MGS (the primary lustre component, which does all the "request
routing") fails to start anywhere in the cluster. That way, if the RA
does not receive any node name, it can be "almost sure" that pacemaker
does not intend to restart the resource (yet) and can request that
everything else be stopped (because the filesystem is not usable
anyway). Later, if another start attempt (caused by failure-timeout
expiration) succeeds, the RA may grant the ticket back, and all other
resources start again.


Best,
Vladislav
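
(A hedged sketch of the two mechanisms described above, as an RA's stop
path might invoke them; the attribute name mgs-ready and the ticket
name lustre-mgs are invented, and the exact crm_attribute/crm_ticket
options should be checked against your Pacemaker version:)

    # 1) Node-scoped: delete a node attribute that location constraints
    #    key on, so resources depending on it stop on / move off this node.
    crm_attribute --node "$(crm_node -n)" --name mgs-ready --delete

    # 2) Cluster-wide: revoke a ticket the dependent resources require,
    #    stopping them everywhere...
    crm_ticket --ticket lustre-mgs --revoke

    # ...and grant it back from a later, successful start action.
    crm_ticket --ticket lustre-mgs --grant

In the scenario above, both would be guarded by the proposed node-name
(or start_expected) information, so that they only fire when no local
restart is expected.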





Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Ken Gaillot
On 06/05/2016 07:27 PM, Andrew Beekhof wrote:
> On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot  wrote:
>> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
>>> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:
 A recent thread discussed a proposed new feature, a new environment
 variable that would be passed to resource agents, indicating whether a
 stop action was part of a recovery.

 Since that thread was long and covered a lot of topics, I'm starting a
 new one to focus on the core issue remaining:

 The original idea was to pass the number of restarts remaining before
 the resource will no longer tried to be started on the same node. This
 involves calculating (fail-count - migration-threshold), and that
 implies certain limitations: (1) it will only be set when the cluster
 checks migration-threshold; (2) it will only be set for the failed
 resource itself, not for other resources that may be recovered due to
 dependencies on it.

 Ulrich Windl proposed an alternative: setting a boolean value instead. I
 forgot to cc the list on my reply, so I'll summarize now: We would set a
 new variable like OCF_RESKEY_CRM_recovery=true
>>>
>>> This concept worries me, especially when what we've implemented is
>>> called OCF_RESKEY_CRM_restarting.
>>
>> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.
>>
>>> The name alone encourages people to "optimise" the agent to not
>>> actually stop the service "because its just going to start again
>>> shortly".  I know thats not what Adam would do, but not everyone
>>> understands how clusters work.
>>>
>>> There are any number of reasons why a cluster that intends to restart
>>> a service may not do so.  In such a scenario, a badly written agent
>>> would cause the cluster to mistakenly believe that the service is
>>> stopped - allowing it to start elsewhere.
>>>
>>> Its true there are any number of ways to write bad agents, but I would
>>> argue that we shouldn't be nudging people in that direction :)
>>
>> I do have mixed feelings about that. I think if we name it
>> start_expected, and document it carefully, we can avoid any casual mistakes.
>>
>> My main question is how useful would it actually be in the proposed use
>> cases. Considering the possibility that the expected start might never
>> happen (or fail), can an RA really do anything different if
>> start_expected=true?
> 
> I would have thought not.  Correctness should trump optimal.
> But I'm prepared to be mistaken.
> 
>> If the use case is there, I have no problem with
>> adding it, but I want to make sure it's worthwhile.

Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true,
then before the start happens, someone disables the resource, so the
start is never called. Or the node is fenced before the start happens, etc.

Is there anything significant an RA can do differently based on
start_expected=true/false without causing problems if an expected start
never happens?



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-05 Thread Andrew Beekhof
On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot  wrote:
> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
>> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:
>>> A recent thread discussed a proposed new feature, a new environment
>>> variable that would be passed to resource agents, indicating whether a
>>> stop action was part of a recovery.
>>>
>>> Since that thread was long and covered a lot of topics, I'm starting a
>>> new one to focus on the core issue remaining:
>>>
>>> The original idea was to pass the number of restarts remaining before
>>> the resource will no longer tried to be started on the same node. This
>>> involves calculating (fail-count - migration-threshold), and that
>>> implies certain limitations: (1) it will only be set when the cluster
>>> checks migration-threshold; (2) it will only be set for the failed
>>> resource itself, not for other resources that may be recovered due to
>>> dependencies on it.
>>>
>>> Ulrich Windl proposed an alternative: setting a boolean value instead. I
>>> forgot to cc the list on my reply, so I'll summarize now: We would set a
>>> new variable like OCF_RESKEY_CRM_recovery=true
>>
>> This concept worries me, especially when what we've implemented is
>> called OCF_RESKEY_CRM_restarting.
>
> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.
>
>> The name alone encourages people to "optimise" the agent to not
>> actually stop the service "because its just going to start again
>> shortly".  I know thats not what Adam would do, but not everyone
>> understands how clusters work.
>>
>> There are any number of reasons why a cluster that intends to restart
>> a service may not do so.  In such a scenario, a badly written agent
>> would cause the cluster to mistakenly believe that the service is
>> stopped - allowing it to start elsewhere.
>>
>> Its true there are any number of ways to write bad agents, but I would
>> argue that we shouldn't be nudging people in that direction :)
>
> I do have mixed feelings about that. I think if we name it
> start_expected, and document it carefully, we can avoid any casual mistakes.
>
> My main question is how useful would it actually be in the proposed use
> cases. Considering the possibility that the expected start might never
> happen (or fail), can an RA really do anything different if
> start_expected=true?

I would have thought not.  Correctness should trump optimal.
But I'm prepared to be mistaken.

> If the use case is there, I have no problem with
> adding it, but I want to make sure it's worthwhile.



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-03 Thread Ken Gaillot
On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:
>> A recent thread discussed a proposed new feature, a new environment
>> variable that would be passed to resource agents, indicating whether a
>> stop action was part of a recovery.
>>
>> Since that thread was long and covered a lot of topics, I'm starting a
>> new one to focus on the core issue remaining:
>>
>> The original idea was to pass the number of restarts remaining before
>> the resource will no longer tried to be started on the same node. This
>> involves calculating (fail-count - migration-threshold), and that
>> implies certain limitations: (1) it will only be set when the cluster
>> checks migration-threshold; (2) it will only be set for the failed
>> resource itself, not for other resources that may be recovered due to
>> dependencies on it.
>>
>> Ulrich Windl proposed an alternative: setting a boolean value instead. I
>> forgot to cc the list on my reply, so I'll summarize now: We would set a
>> new variable like OCF_RESKEY_CRM_recovery=true
> 
> This concept worries me, especially when what we've implemented is
> called OCF_RESKEY_CRM_restarting.

Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.

> The name alone encourages people to "optimise" the agent to not
> actually stop the service "because its just going to start again
> shortly".  I know thats not what Adam would do, but not everyone
> understands how clusters work.
> 
> There are any number of reasons why a cluster that intends to restart
> a service may not do so.  In such a scenario, a badly written agent
> would cause the cluster to mistakenly believe that the service is
> stopped - allowing it to start elsewhere.
> 
> Its true there are any number of ways to write bad agents, but I would
> argue that we shouldn't be nudging people in that direction :)

I do have mixed feelings about that. I think if we name it
start_expected, and document it carefully, we can avoid any casual mistakes.

My main question is how useful would it actually be in the proposed use
cases. Considering the possibility that the expected start might never
happen (or fail), can an RA really do anything different if
start_expected=true? If the use case is there, I have no problem with
adding it, but I want to make sure it's worthwhile.

>> whenever a start is
>> scheduled after a stop on the same node in the same transition. This
>> would avoid the corner cases of the previous approach; instead of being
>> tied to migration-threshold, it would be set whenever a recovery was
>> being attempted, for any reason. And with this approach, it should be
>> easier to set the variable for all actions on the resource
>> (demote/stop/start/promote), rather than just the stop.
>>
>> I think the boolean approach fits all the envisioned use cases that have
>> been discussed. Any objections to going that route instead of the count?
>> --
>> Ken Gaillot 



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-02 Thread Andrew Beekhof
On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:
> A recent thread discussed a proposed new feature, a new environment
> variable that would be passed to resource agents, indicating whether a
> stop action was part of a recovery.
>
> Since that thread was long and covered a lot of topics, I'm starting a
> new one to focus on the core issue remaining:
>
> The original idea was to pass the number of restarts remaining before
> the resource will no longer tried to be started on the same node. This
> involves calculating (fail-count - migration-threshold), and that
> implies certain limitations: (1) it will only be set when the cluster
> checks migration-threshold; (2) it will only be set for the failed
> resource itself, not for other resources that may be recovered due to
> dependencies on it.
>
> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> forgot to cc the list on my reply, so I'll summarize now: We would set a
> new variable like OCF_RESKEY_CRM_recovery=true

This concept worries me, especially when what we've implemented is
called OCF_RESKEY_CRM_restarting.

The name alone encourages people to "optimise" the agent to not
actually stop the service "because it's just going to start again
shortly".  I know that's not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

It's true there are any number of ways to write bad agents, but I would
argue that we shouldn't be nudging people in that direction :)
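
(To make the concern concrete, this is roughly the anti-pattern being
warned against - a hypothetical sketch using the start_expected naming
from this thread, not code from any real agent:)

    foo_stop() {
        # BAD: skipping the real stop because a restart "should" follow.
        # If the expected start never happens, the service keeps running
        # while the cluster believes it is stopped - and may start it
        # elsewhere.
        if [ "${OCF_RESKEY_CRM_start_expected:-false}" = "true" ]; then
            return $OCF_SUCCESS
        fi
        actually_stop_the_service   # placeholder for the real stop logic
    }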

> whenever a start is
> scheduled after a stop on the same node in the same transition. This
> would avoid the corner cases of the previous approach; instead of being
> tied to migration-threshold, it would be set whenever a recovery was
> being attempted, for any reason. And with this approach, it should be
> easier to set the variable for all actions on the resource
> (demote/stop/start/promote), rather than just the stop.
>
> I think the boolean approach fits all the envisioned use cases that have
> been discussed. Any objections to going that route instead of the count?
> --
> Ken Gaillot 
>


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-05-20 Thread Adam Spiers
Ken Gaillot  wrote:
> A recent thread discussed a proposed new feature, a new environment
> variable that would be passed to resource agents, indicating whether a
> stop action was part of a recovery.
> 
> Since that thread was long and covered a lot of topics, I'm starting a
> new one to focus on the core issue remaining:
> 
> The original idea was to pass the number of restarts remaining before
> the resource will no longer tried to be started on the same node. This
> involves calculating (fail-count - migration-threshold), and that
> implies certain limitations: (1) it will only be set when the cluster
> checks migration-threshold; (2) it will only be set for the failed
> resource itself, not for other resources that may be recovered due to
> dependencies on it.
> 
> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> forgot to cc the list on my reply, so I'll summarize now: We would set a
> new variable like OCF_RESKEY_CRM_recovery=true whenever a start is
> scheduled after a stop on the same node in the same transition. This
> would avoid the corner cases of the previous approach; instead of being
> tied to migration-threshold, it would be set whenever a recovery was
> being attempted, for any reason. And with this approach, it should be
> easier to set the variable for all actions on the resource
> (demote/stop/start/promote), rather than just the stop.
> 
> I think the boolean approach fits all the envisioned use cases that have
> been discussed. Any objections to going that route instead of the count?

I think that sounds fine to me.  Thanks!



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-05-19 Thread Jehan-Guillaume de Rorthais
Le Thu, 19 May 2016 13:15:20 -0500,
Ken Gaillot  a écrit :

> On 05/19/2016 11:43 AM, Jehan-Guillaume de Rorthais wrote:
>> Le Thu, 19 May 2016 10:53:31 -0500,
>> Ken Gaillot  a écrit :
>> 
>>> A recent thread discussed a proposed new feature, a new environment
>>> variable that would be passed to resource agents, indicating whether a
>>> stop action was part of a recovery.
>>>
>>> Since that thread was long and covered a lot of topics, I'm starting a
>>> new one to focus on the core issue remaining:
>>>
>>> The original idea was to pass the number of restarts remaining before
>>> the resource will no longer tried to be started on the same node. This
>>> involves calculating (fail-count - migration-threshold), and that
>>> implies certain limitations: (1) it will only be set when the cluster
>>> checks migration-threshold; (2) it will only be set for the failed
>>> resource itself, not for other resources that may be recovered due to
>>> dependencies on it.
>>>
>>> Ulrich Windl proposed an alternative: setting a boolean value instead. I
>>> forgot to cc the list on my reply, so I'll summarize now: We would set a
>>> new variable like OCF_RESKEY_CRM_recovery=true whenever a start is
>>> scheduled after a stop on the same node in the same transition. This
>>> would avoid the corner cases of the previous approach; instead of being
>>> tied to migration-threshold, it would be set whenever a recovery was
>>> being attempted, for any reason. And with this approach, it should be
>>> easier to set the variable for all actions on the resource
>>> (demote/stop/start/promote), rather than just the stop.
>> 
>> I can see the value of having such variable during various actions.
>> However, we can also deduce the transition is a recovering during the
>> notify actions with the notify variables (the only information we lack is
>> the order of the actions). A most flexible approach would be to make sure
>> the notify variables are always available during the whole transaction for
>> **all** actions, not just notify. It seems like it's already the case, but
>> a recent discussion emphase this is just a side effect of the current
>> implementation. I understand this as they were sometime available outside
>> of notification "by accident".
> 
> It does seem that a recovery could be implied from the
> notify_{start,stop}_uname variables, but notify variables are only set
> for clones that support the notify action. I think the goal here is to
> work with any resource type. Even for clones, if they don't otherwise
> need notifications, they'd have to add the overhead of notify calls on
> all instances, which would do nothing.

Exactly: notify variables are only available for clones at present. What I was
suggesting is that the notify variables be always available, whether the
resource is a clone, a multi-state resource (ms) or a standard one.

And I didn't mean the notify *action* should be activated all the time for all
resources. The notify switch for clones/ms could be kept at false by default,
so the notify action itself is not called during transitions.
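To make this concrete, here is a rough sketch of how an RA could deduce an
in-place recovery from those variables, assuming they really were populated
for every action and not only for clone notifications (the variable names are
the existing notify meta variables; their availability outside notify is the
suggestion, not current behaviour):

    # Sketch only: relies on the notify meta variables being set for all
    # actions; today they are only set for clones with notify=true.
    is_recovery() {
        local me
        me="$(crm_node -n 2>/dev/null || uname -n)"
        # Recovered in place = this node appears in both the stop list and
        # the start list of the same transition.
        case " $OCF_RESKEY_CRM_meta_notify_stop_uname " in
            *" $me "*) : ;;
            *) return 1 ;;
        esac
        case " $OCF_RESKEY_CRM_meta_notify_start_uname " in
            *" $me "*) return 0 ;;
        esac
        return 1
    }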

>> Also, I can see the benefit of having the number of remaining attempts for
>> the current action before hitting migration-threshold. I might be
>> misunderstanding something here, but it seems to me these two pieces of
>> information are different.
> 
> I think the use cases that have been mentioned would all be happy with
> just the boolean. Does anyone need the actual count, or just whether
> this is a stop-start vs a full stop?

I was thinking of a use case where a graceful demote or stop action has failed
multiple times, and we want to give the RA a chance to choose another method to
stop the resource before it requires a migration. For instance, PostgreSQL has
three different kinds of stop, the last one not being graceful, but still
better than a kill -9.
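For example, with a hypothetical count of remaining attempts exposed to the
agent (the variable name below is purely illustrative and exists nowhere
today), the stop action could escalate through PostgreSQL's shutdown modes:

    # Hypothetical: OCF_RESKEY_CRM_restarts_remaining stands in for the
    # "remaining attempts" count discussed in this thread.
    remaining="${OCF_RESKEY_CRM_restarts_remaining:-}"
    case "$remaining" in
        ""|[3-9]|[1-9][0-9]*) mode="smart" ;;      # plenty of attempts left
        2)                    mode="fast" ;;       # disconnect clients now
        *)                    mode="immediate" ;;  # last chance before giving up
    esac
    pg_ctl -D "$PGDATA" stop -m "$mode" -w    # PGDATA assumed to point to the instance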

> The problem with the migration-threshold approach is that there are
> recoveries that will be missed because they don't involve
> migration-threshold. If the count is really needed, the
> migration-threshold approach is necessary, but if recovery is the really
> interesting information, then a boolean would be more accurate.

I think I misunderstood the original use cases you are trying to address. It
seems to me we are talking about a different feature.

>> Basically, what we need is a better understanding of the transition itself
>> from the RA actions.
>> 
>> If you are still brainstorming on this, as an RA dev, what I would
>> suggest is:
>> 
>>   * provide and enforce the notify variables in all actions
>>   * add the order of the actions during the current transition to these
>> variables, using e.g. OCF_RESKEY_CRM_meta_notify_*_actionid
> 
> The action ID would be different for each node being acted on, so it
> would be more complicated (maybe *_actions="NODE1:ID1,NODE2:ID2,..."?).

Following the principle adopted for other variables, each ID would apply to the
corresponding resource and node in OCF_RESKEY_CRM_meta_notify_*_uname and

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-05-19 Thread Jehan-Guillaume de Rorthais
Le Thu, 19 May 2016 10:53:31 -0500,
Ken Gaillot  a écrit :

> A recent thread discussed a proposed new feature, a new environment
> variable that would be passed to resource agents, indicating whether a
> stop action was part of a recovery.
> 
> Since that thread was long and covered a lot of topics, I'm starting a
> new one to focus on the core issue remaining:
> 
> The original idea was to pass the number of restarts remaining before
> the cluster will no longer try to start the resource on the same node. This
> involves calculating (fail-count - migration-threshold), and that
> implies certain limitations: (1) it will only be set when the cluster
> checks migration-threshold; (2) it will only be set for the failed
> resource itself, not for other resources that may be recovered due to
> dependencies on it.
> 
> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> forgot to cc the list on my reply, so I'll summarize now: We would set a
> new variable like OCF_RESKEY_CRM_recovery=true whenever a start is
> scheduled after a stop on the same node in the same transition. This
> would avoid the corner cases of the previous approach; instead of being
> tied to migration-threshold, it would be set whenever a recovery was
> being attempted, for any reason. And with this approach, it should be
> easier to set the variable for all actions on the resource
> (demote/stop/start/promote), rather than just the stop.

I can see the value of having such a variable during various actions. However,
we can also deduce that the transition is a recovery during the notify actions,
using the notify variables (the only information we lack is the order of the
actions). A more flexible approach would be to make sure the notify variables
are always available during the whole transition, for **all** actions, not just
notify. It seems this is already the case, but a recent discussion emphasized
that it is just a side effect of the current implementation. I understand this
to mean they were sometimes available outside of notifications "by accident".

Also, I can see the benefit of having the number of remaining attempts for the
current action before hitting migration-threshold. I might be misunderstanding
something here, but it seems to me these two pieces of information are
different.

Basically, what we need is a better understanding of the transition itself
from the RA actions.

If you are still brainstorming on this, as an RA dev, what I would
suggest is:

  * provide and enforce the notify variables in all actions
  * add the order of the actions during the current transition to these
variables, using e.g. OCF_RESKEY_CRM_meta_notify_*_actionid
  * add a new variable with the remaining action attempts before migration.
This one has the advantage of surviving the transition breakage when a failure
occurs (see the illustrative snippet after this list).
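To illustrate, the environment an RA sees during, say, a stop action could then
contain something like the following (names and values are illustrative only;
the *_actionid and remaining-attempts variables are part of this suggestion,
not of any Pacemaker release):

    # Existing notify meta variables, set for every action under this proposal:
    OCF_RESKEY_CRM_meta_notify_stop_uname="node1"
    OCF_RESKEY_CRM_meta_notify_start_uname="node1"
    # Suggested additions (hypothetical names and values):
    OCF_RESKEY_CRM_meta_notify_stop_actionid="12"
    OCF_RESKEY_CRM_meta_notify_start_actionid="13"
    OCF_RESKEY_CRM_restarts_remaining="2"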

As a second step, we would be able to provide some helper functions in
ocf_shellfuncs (and in my Perl module equivalent) to compute whether the
transition is a switchover, a failover, a recovery, etc., based on the notify
variables.
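A rough sketch of what such a helper could look like, under the same assumption
that the notify variables are available in every action (neither the function
nor its exact semantics exist in ocf_shellfuncs today; the classification rules
are deliberately simplified):

    # Hypothetical helper: classify the current transition for the local node
    # from the notify meta variables.
    ocf_transition_kind() {
        local me
        me="$(crm_node -n 2>/dev/null || uname -n)"

        in_list() { case " $2 " in *" $1 "*) return 0 ;; esac; return 1; }

        if in_list "$me" "$OCF_RESKEY_CRM_meta_notify_stop_uname" &&
           in_list "$me" "$OCF_RESKEY_CRM_meta_notify_start_uname"; then
            echo "recovery"      # stop and start on this node in one transition
        elif [ -n "$OCF_RESKEY_CRM_meta_notify_promote_uname" ] &&
             [ -n "$OCF_RESKEY_CRM_meta_notify_demote_uname" ]; then
            echo "switchover"    # the old master is demoted, a new one promoted
        elif [ -n "$OCF_RESKEY_CRM_meta_notify_promote_uname" ]; then
            echo "failover"      # a promotion without a graceful demote
        else
            echo "other"
        fi
    }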

Presently, I detect such scenarios directly in my RA during the notify actions
and track them as private attributes, so that the real actions (demote and
stop) are aware of the situation. See:

https://github.com/dalibo/PAF/blob/952cb3cf2f03aad18fbeafe3a91f997a56c3b606/script/pgsqlms#L95

Regards,
