On 2006-06-15T09:58:37, Alan Robertson <[EMAIL PROTECTED]> wrote:

> >1. The administrator of course may not change the parameters so much
> >that the RA can no longer identify the already running resource
> >instance. (Do we need to provide hints in the meta data to identify
> >which parameters are safe to change and which are not?)
> 
> Good question...
> 
> One possible answer...
> If the RA can't identify it after they're changed, then it shouldn't
> claim to support it ;-)

Ah, but, in theory, the instance parameters should be the minimal set
needed to identify and then manage the resource instance. So any change
there does seem critical; in theory, the instance parameters which can
safely be changed are the exception...

(Maybe excluding some monitor options. But, if these were passed
exclusively as attributes to the monitor op, and not to the instance as
a whole, they'd only cause the monitor op to be restarted. In theory.)

I'm not saying such options don't exist; for example, one could even
remount a Filesystem with different options (-o remount) without
affecting anything else. So yes, it would be useful to have, but we need
more smarts than we have right now.

> Another possible answer:
> IIRC, some of the RA parameters are marked <unique> in the metadata 
> indicating that together they uniquely identify the resource.  If you 
> changed any of those, then it would need to be stopped then restarted 
> instead.  [I'm aware that this would cause us (heartbeat/CRM) some more 
> difficulty in implementing it].

Well, the truth is that the CRM doesn't know about these at all.

(A) This might end up being a new attribute "on_change=(reload|restart)"
on the instance parameter nvpair in the CIB, which defaults to "restart"
(and could be set by the GUI).  Then we could compute two op-digests:
one over the parameters which require a "restart" (stop with the _old_
options, start with the new ones), and a "reload" digest (should we
provide both the old and new options here?), and we could easily spot
what is required from us.
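
Roughly, and purely as a sketch in Python (the parameter layout and the
digest function below are placeholders, not the CRM's actual code), the
split into two digests could look like this:

import hashlib

def op_digest(params):
    """Hash a sorted name=value list; stand-in for the real op-digest."""
    data = ";".join("%s=%s" % (k, params[k]) for k in sorted(params))
    return hashlib.md5(data.encode()).hexdigest()

def split_digests(params, on_change):
    """params: name -> value; on_change: name -> "reload" or "restart",
    defaulting to "restart" as proposed above."""
    restart = {k: v for k, v in params.items()
               if on_change.get(k, "restart") == "restart"}
    reload_ = {k: v for k, v in params.items()
               if on_change.get(k) == "reload"}
    return op_digest(restart), op_digest(reload_)

If only the reload-digest changed, a "reload" op would be sufficient; if
the restart-digest changed, we stop with the old options and start with
the new ones.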

(Note to Andrew: if a "reload" op in the lrm status section superseded
the "start" op present there too, that should also work and be quite
simple to implement.)
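
In pseudo-Python (the status-section layout is simplified here to a list
of (op name, call id) pairs, just for illustration), picking the
recorded op to compare against could then be as simple as:

def op_to_compare(lrm_ops):
    """lrm_ops: [(op_name, call_id), ...] for one resource, as recorded
    in its lrm status section (layout assumed for illustration).
    A later "reload" supersedes the "start" entry when deciding which
    recorded parameters to compare against the current CIB."""
    candidates = [(call_id, name) for name, call_id in lrm_ops
                  if name in ("start", "reload")]
    return max(candidates)[1] if candidates else None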

Sorry, I headed down implementation detail lane for a bit, but that
might just work, don't you think?

> >2. If reload fails, should the resource be treated as failed, or merely
> >the reload? (I'm inclined towards the first one, it's easier to code I
> >think ;-)
> 
> I agree with your assessment.

OK.

Maybe we should allow a "sorry, I'd have to restart the instance for
this" exit code too though.

(To stick with the fs example, one can remount a filesystem with new
options most of the time, so attempting this is fine - but it couldn't
remount read-only, for example, if something still had files open
read-write. It's not failed, it just didn't make the change and is still
running with the old options... The GUI could then display this
accordingly, allowing the admin to instigate a restart if he/she so
chooses.)
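
Here is a sketch of how our side might interpret such a result; the
exit-code value and constant names are made up and are not part of the
current OCF return codes:

# Hypothetical handling of a "reload not possible, restart required"
# result; the code 190 and the constant names are invented for this
# sketch only.
OCF_SUCCESS = 0
OCF_RELOAD_NEEDS_RESTART = 190

def handle_reload_result(rc):
    if rc == OCF_SUCCESS:
        return "running with new options"
    if rc == OCF_RELOAD_NEEDS_RESTART:
        # not a failure: still running with the old options; the GUI
        # could flag this and offer the admin a restart
        return "running with old options, restart needed to apply change"
    # anything else is treated as a resource failure (see point 2 above)
    return "failed"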

> >3.  You say "without this being visible to the resources depending on
> >it", but, then, if it is not visible at _all_ (ie, _absolutely_ no
> >observable change _anywhere_), there wouldn't be a point in reloading
> >it, would there?
> 
> What I meant was without it requiring the resources depending on it to 
> be restarted.  That's a better phrase I think...  Of course, it has some 
> effect, but it shouldn't affect the dependent resources negatively.

Ok.

> >I think in the interest of purity & simplicity, this question could be
> >phrased as: "If it doesn't have an impact on our view of the cluster,
> >why do we need to care - can't this be done outside our control then?"
> 
> No. If it involves an RA parameter, those have to be updated - through - 
> the system.  That's the case for the customer I have in mind.  What 
> they're doing now would curl your hair - even as short as you keep your 
> hair ;-)

I imagine so, and I guess we'll have to acknowledge that RA parameters
are used beyond what would be pure from a design PoV. ;-)

> >I don't want to nit-pick here, but I'd like to understand whether your
> >thinking has changed in general here, in which case I'd like to re-raise
> >the point of the yellow "warning" monitor status too ;-) Or in what case
> >this feature is different?
> 
> I don't see the connection between these two -- at all.

Well, the connection I was (and actually, still am) seeing is that they
both go beyond what is strictly necessary if everything were pure - i.e.,
if the instance parameters _were_ the minimal set, this would likely not
be required. But, because people are using them to actually _configure_
the resource instance, you win. (So, they are using it beyond mere
fail-over capabilities.)

Vice versa, if people only wanted to use us for strict "ok / failed"
monitoring, we wouldn't need a "yellow" state; but they are actually
using it to manage and monitor the cluster beyond that.

> One is 'I know I need to notify the RA that something has changed, but I 
> want to avoid a half-hour needed to restart the resource and everything 
> which depends on it'.   This is a real issue.  With real impacts, and a 
> clear real usefulness.  And, it's done at an administrator's request.

Ok.

> This proposal should be considered on its own merits, not by false 
> association with some other proposal.  That's a form of guilt by (false) 
> association.

I'm sorry if it came across as such. The link is a bit abstract, though.

And yes, I think it would be useful to have for these scenarios. I'm
just trying to figure out the best approach, how it relates to the
spirit of what we already have, and whether we can actually implement
it. ;-)

But I think the answers to all of these seem affirmative, so why don't
we plan that for 2.0.7 or .8? (I'd like to see Xen migrations first,
though. Oh, and of course Andrew might have thoughts _and_ opinions
about this too ;-)


Sincerely,
    Lars Marowsky-Brée

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business     -- Charles Darwin
"Ignorance more frequently begets confidence than does knowledge"
