Re: Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-08-16 Thread Andrew Beekhof

On 8/16/06, Alan Robertson [EMAIL PROTECTED] wrote:

Andrew Beekhof wrote:
 On 8/15/06, Alan Robertson [EMAIL PROTECTED] wrote:
 Andrew Beekhof wrote:
  On 6/28/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
  On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
 
  Lack of disagreement implies agreement.
 
  Alan, how about we plan this as an enhancement for 2.0.7?
 
  Unlikely.
  This is a significant departure from the current design and a major
  piece of work.
  Essentially all the same problems from the migrate discussion apply.


 I don't quite follow that.

 The assumption here is that if you enable reload, that it has no visible
 negative effect on its dependent children.

 So, you sneak in and do it, and nothing else is affected.


 right - lmb helped me understand that further on in the thread.
 the problem is knowing what variables cause a reload and which ones
 cause a restart.
 to know that we need to change the lrm

Sorry I replied before reading the rest of the thread.  Mea culpa.

That information is present in the OCF specified metadata.


It's nowhere I can see in
http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt


 The LRM
certainly doesn't have any knowledge of XML that would tell it, nor can
it be determined without that information.


No-one is asking it to.  Please go back and read the email from July 4.
This is about not knowing what values changed, not about metadata.



I know you've resisted this so far -- but it's in that nasty metadata  ;-)

The metadata was clearly intended for something CRM-like to use.

It's a design decision that we've made that has so far kept the CRM from
reading it.

So, perhaps the CIB (through the GUI or user editing) should give you
some more hints.  But, IMHO, it's not an LRM issue.  The correct
information to determine this should be present in the CIB -- and not
determined at run time.

Or maybe I'm still confused (certainly possible).

--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/




Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-08-15 Thread Alan Robertson
Andrew Beekhof wrote:
 On 6/28/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
 On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:

 Lack of disagreement implies agreement.

 Alan, how about we plan this as an enhancement for 2.0.7?
 
 Unlikely.
 This is a significant departure from the current design and a major
 piece of work.
 Essentially all the same problems from the migrate discussion apply.


I don't quite follow that.

The assumption here is that if you enable reload, that it has no visible
negative effect on its dependent children.

So, you sneak in and do it, and nothing else is affected.


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-08-15 Thread Andrew Beekhof

On 8/15/06, Alan Robertson [EMAIL PROTECTED] wrote:

Andrew Beekhof wrote:
 On 6/28/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
 On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:

 Lack of disagreement implies agreement.

 Alan, how about we plan this as an enhancement for 2.0.7?

 Unlikely.
 This is a significant departure from the current design and a major
 piece of work.
 Essentially all the same problems from the migrate discussion apply.


I don't quite follow that.

The assumption here is that if you enable reload, that it has no visible
negative effect on its dependent children.

So, you sneak in and do it, and nothing else is affected.



right - lmb helped me understand that further on in the thread.
the problem is knowing what variables cause a reload and which ones
cause a restart.
to know that we need to change the lrm


Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-08-15 Thread Alan Robertson
Andrew Beekhof wrote:
 On 8/15/06, Alan Robertson [EMAIL PROTECTED] wrote:
 Andrew Beekhof wrote:
  On 6/28/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
  On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
 
  Lack of disagreement implies agreement.
 
  Alan, how about we plan this as an enhancement for 2.0.7?
 
  Unlikely.
  This is a significant departure from the current design and a major
  piece of work.
  Essentially all the same problems from the migrate discussion apply.


 I don't quite follow that.

 The assumption here is that if you enable reload, that it has no visible
 negative effect on its dependent children.

 So, you sneak in and do it, and nothing else is affected.

 
 right - lmb helped me understand that further on in the thread.
 the problem is knowing what variables cause a reload and which ones
 cause a restart.
 to know that we need to change the lrm

Sorry I replied before reading the rest of the thread.  Mea culpa.

That information is present in the OCF specified metadata.  The LRM
certainly doesn't have any knowledge of XML that would tell it, nor can
it be determined without that information.

I know you've resisted this so far -- but it's in that nasty metadata  ;-)

The metadata was clearly intended for something CRM-like to use.

It's a design decision that we've made that has so far kept the CRM from
reading it.

So, perhaps the CIB (through the GUI or user editing) should give you
some more hints.  But, IMHO, it's not an LRM issue.  The correct
information to determine this should be present in the CIB -- and not
determined at run time.

Or maybe I'm still confused (certainly possible).

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-07-03 Thread Lars Marowsky-Bree
On 2006-07-01T08:43:05, Andrew Beekhof [EMAIL PROTECTED] wrote:

 Lack of disagreement implies agreement.
 
 Alan, how about we plan this as an enhancement for 2.0.7?
 
 Unlikely.
 This is a significant departure from the current design and a major
 piece of work.

Well, ok. So not 2.0.7. ;-)

But, is it such a major departure from the current design?

We do notice already when the instance parameters have changed. So, for
some resources (or some instance parameters, which may be harder), we
now figure that this means we don't have to stop & start them, but that
we send them a reload with the new parameters. Which we can then use
from there on.

If it fails, we treat it like we treat any other failed op, namely: the
resource has failed.

The reload op is even tracked in the lrmd for us.

Of course it's different, but fundamentally so? I think not. I may be
wrong here.

 Essentially all the same problems from the migrate discussion apply.

Well, the migrate discussion is special in the sense that an op on one
node causes a resource to appear somewhere else, ie an op which crosses
node boundaries. That is, indeed, a major change.

I must have been dumb here, and missing something. What is it? ;-)

(If we really can't readily do it in the current design, it's something
to put on the plate for the PE rewrite at least, which I think should
happen for 2.0.[78] (within the next 3-9 months), given that most issues
we currently have seem to require it, from migrate to m/s resources
etc...)


Sincerely,
Lars Marowsky-Brée

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin
Ignorance more frequently begets confidence than does knowledge



Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-07-03 Thread Lars Marowsky-Bree
On 2006-07-03T16:16:20, Andrew Beekhof [EMAIL PROTECTED] wrote:

 think of a stack of resources and what happens when something in the
 middle changes... how we pull down the stack to the resource that
 changed and build it up again.
 
 that's very different to starting at the resource that changed and
 working upwards.
 
 of course if anything above you (logically) doesn't support reload
 then the PE has to revert to stop/start.  figuring that out isn't
 trivial nor is figuring out *when* to do the calculation.

No no no! The whole _point_ of reload is that _it does not affect
resources depending on it_. It is purely local to the given resource.

Only if the reload fails would we treat it as failed and the whole
process, but, we already know how to treat resources as failed ;-)


Sincerely,
Lars Marowsky-Brée

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin
Ignorance more frequently begets confidence than does knowledge



Re: Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-07-03 Thread Andrew Beekhof

On 7/3/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:

On 2006-07-03T16:16:20, Andrew Beekhof [EMAIL PROTECTED] wrote:

 think of a stack of resources and what happens when something in the
 middle changes... how we pull down the stack to the resource that
 changed and build it up again.

 that's very different to starting at the resource that changed and
 working upwards.

 of course if anything above you (logically) doesn't support reload
 then the PE has to revert to stop/start.  figuring that out isn't
 trivial nor is figuring out *when* to do the calculation.

No no no! The whole _point_ of reload is that _it does not affect
resources depending on it_. It is purely local to the given resource.


to which i would respond how do we know that?

we store a hash of the parameters remember... we know something
changed but have no idea what nor if it was important... in which case
you would have to assume it was.
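
A minimal sketch of that limitation, assuming a single digest computed over all instance parameters. The hash function (SHA-256), the JSON canonicalisation, and the Filesystem-style parameter names are illustrative assumptions; the thread does not say how the CRM computes its digest.

```python
import hashlib
import json


def params_digest(params: dict) -> str:
    """Return one digest covering every instance parameter."""
    # Sorting the keys makes the digest independent of insertion order.
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical Filesystem-style parameters, for illustration only.
old = {"device": "/dev/sdb1", "directory": "/srv/data", "options": "rw"}
new = dict(old, options="rw,noatime")

changed = params_digest(old) != params_digest(new)
# The comparison says that something differs, but the two digests carry
# no information about which parameter it was or how much it matters.
```

With only the stored digest to compare against, the safe assumption is that the change was important, which is the point being made here.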


Only if the reload fails would we treat it as failed and the whole
process, but, we already know how to treat resources as failed ;-)



Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-07-03 Thread Lars Marowsky-Bree
On 2006-07-03T21:34:37, Andrew Beekhof [EMAIL PROTECTED] wrote:

 No no no! The whole _point_ of reload is that _it does not affect
 resources depending on it_. It is purely local to the given resource.
 to which i would respond how do we know that?
 
 we store a hash of the parameters remember... we know something
 changed but have no idea what nor if it was important... in which case
 you would have to assume it was.

Uhm, I don't want to be annoying, but did you actually read this thread?
;-)

First, one could assume that for a RA supporting reload, all changes
are reloadable - and put it into the admin's care to not change
attributes which wouldn't be. (This could be done with a hint to the
GUI, but wouldn't require additional smarts in the PE.)

The second alternative was that even the CRM could take a look at, say,
a reloadable=(yes|no) flag for the instance parameter/meta
attribute(?) in question (defaults to no), and if it is set to yes,
store it in a different hash. (So we could always tell whether one of
the reloadable or one of the non-reloadable parameters changed.)
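
The second alternative could be sketched roughly as follows. This is an illustration under assumptions, not the CRM's actual code: the set of reloadable parameter names is taken as given (however the flag ends up being stored), and the function names and the SHA-256/JSON digest are invented for the example.

```python
import hashlib
import json


def digest(params: dict) -> str:
    """Order-independent digest of a parameter set."""
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode("utf-8")).hexdigest()


def split_digests(params: dict, reloadable: set) -> tuple:
    """Return (restart_digest, reload_digest) for one parameter set.

    Parameters not named in `reloadable` land in the restart digest,
    which matches a flag that defaults to "no".
    """
    reload_part = {k: v for k, v in params.items() if k in reloadable}
    restart_part = {k: v for k, v in params.items() if k not in reloadable}
    return digest(restart_part), digest(reload_part)


def required_action(old: dict, new: dict, reloadable: set) -> str:
    """Compare the two digest pairs and name the cheapest sufficient action."""
    old_restart, old_reload = split_digests(old, reloadable)
    new_restart, new_reload = split_digests(new, reloadable)
    if old_restart != new_restart:
        return "restart"  # a non-reloadable parameter changed
    if old_reload != new_reload:
        return "reload"   # only reloadable parameters changed
    return "none"
```

For example, with `reloadable={"options"}`, changing only `options` yields `"reload"`, while changing `device` yields `"restart"` even if `options` changed too.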

Both are reasonable to implement, _I think_. Not necessarily now, but
the ability to reduce perceived unnecessary restarts is also important.
I've been whapped on the head by SAP for our current modus operandi ;-)


(Tangent: Who have, in the same way, asked for a way to not apply
unnecessary churn in case several resources are modified one after
another. But, that gets us back into the GUI discussion, which we
should track separately too.)


Sincerely,
Lars Marowsky-Brée

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin
Ignorance more frequently begets confidence than does knowledge



Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-06-28 Thread Lars Marowsky-Bree
On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:

Lack of disagreement implies agreement.

Alan, how about we plan this as an enhancement for 2.0.7?


Sincerely,
Lars Marowsky-Brée

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin
Ignorance more frequently begets confidence than does knowledge



Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-06-15 Thread Lars Marowsky-Bree
On 2006-06-15T08:56:00, Alan Robertson [EMAIL PROTECTED] wrote:

 Many LSB init scripts implement a 'reload' action which permits them to 
 reread their configurations without interrupting service.
 
 By design, OCF spec is upwards-compatible with the LSB.
 
 I think it would be good to specifically add the reload operation to the 
 OCF spec.
 
 Saying something like this:
 --
 The reload operation is an optional operation which can be supported by 
 OCF resource agents.  This operation will cause the resource to examine 
 its parameters and its configuration files, and continue running these 
 new configuration values, without interrupting service in a way which is 
 visible to resources which depend on it.
 
 If an OCF resource agent wishes to support the reload operation, it is 
 required to list it in the operations section of the metadata given by 
 the meta-data operation.
 
 Even though a resource supports a reload operation, a conforming cluster 
 manager need not make use of it.  Of course, if it does, then service 
 updates can be made with fewer service interruptions, so this is likely 
 to be seen as a desirable feature.
 --
 
 And as to whether there are resource agents which could make use of this 
 feature - the answer is yes.  I have a customer who would like such a 
 capability today.  At the moment, they go far out of their way to work 
 around not having it.
 
 As written, this is optional for both resource agents and cluster 
 managers, and it seems like a reasonable addition to the OCF spec.
 
 My guess is that this would be an easy addition for many cluster 
 managers to support.  Of course, nothing is impossible to he who doesn't 
 have to do it.
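
As a rough illustration of the proposed wording, a cluster manager could check whether an agent lists reload in the actions section of its meta-data output before ever using it. The sample XML below is a made-up, trimmed-down metadata document and `advertises` is a hypothetical helper; only the `<actions>`/`<action name="...">` layout is taken from the OCF resource agent metadata format.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed metadata for a hypothetical agent.
SAMPLE_METADATA = """<resource-agent name="example">
  <version>1.0</version>
  <actions>
    <action name="start" timeout="20"/>
    <action name="stop" timeout="20"/>
    <action name="monitor" timeout="20" interval="10"/>
    <action name="reload" timeout="20"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
"""


def advertises(metadata_xml: str, action: str) -> bool:
    """Return True if the metadata lists an <action> with the given name."""
    root = ET.fromstring(metadata_xml)
    return any(a.get("name") == action for a in root.iter("action"))


supports_reload = advertises(SAMPLE_METADATA, "reload")
```

Because the operation is optional on both sides, an agent that does not list reload would simply be handled with stop/start as before.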


1. The administrator of course may not change the parameters so much
that the RA can no longer identify the already running resource
instance. (Do we need to provide hints in the meta data to identify
which parameters are safe to change and which are not?)

2. If reload fails, should the resource be treated as failed, or merely
the reload? (I'm inclined towards the first one, it's easier to code I
think ;-)


3.  You say without this being visible to the resources depending on
it, but, then, if it is not visible at _all_ (ie, _absolutely_ no
observable change _anywhere_), there wouldn't be a point in reloading
it, would there?

I think in the interest of purity & simplicity, this question could be
phrased as: If it doesn't have an impact on our view of the cluster,
why do we need to care - can't this be done outside our control then?

It's somewhat similar to the question as to whether or not we should
have a "yellow" monitor result: Yes, it would go some way to make the
cluster more of a general management tool, but it is also not strictly
necessary; the stance you took back then was that you didn't want that,
if I wasn't mistaken. We left this health monitoring to more dedicated
apps. Continuing that line of thought seems to suggest that we don't
need reload either.

I don't want to nit-pick here, but I'd like to understand whether your
thinking has changed in general here, in which case I'd like to re-raise
the point of the yellow warning monitor status too ;-) Or in what case
this feature is different?


Sincerely,
Lars Marowsky-Brée

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin
Ignorance more frequently begets confidence than does knowledge



Re: [Linux-ha-dev] Adding reload to the OCF specification

2006-06-15 Thread Lars Marowsky-Bree
On 2006-06-15T09:58:37, Alan Robertson [EMAIL PROTECTED] wrote:

 1. The administrator of course may not change the parameters so much
 that the RA can no longer identify the already running resource
 instance. (Do we need to provide hints in the meta data to identify
 which parameters are safe to change and which are not?)
 
 Good question...
 
 One possible answer...
 If the RA can't identify it after they're changed, then it shouldn't 
 claim to support it ;-)

Ah, but, in theory, the instance parameters should be the minimal set to
identify and then manage the resource instance. So, any change there
does seem critical. In theory, the instance parameters which can be
changed are the exception...

(Maybe excluding some monitor options. But, if these were passed
exclusively as attributes to the monitor op, and not to the instance as
a whole, they'd only cause the monitor op to be restarted. In theory.)

I'm not saying these options don't exist, say, one could even remount a
Filesystem with different options (-o remount), w/o affecting anything
else. So yes, it would be useful to have, but we need more smarts than
we have right now.

 Another possible answer:
 IIRC, some of the RA parameters are marked "unique" in the metadata 
 indicating that together they uniquely identify the resource.  If you 
 changed any of those, then it would need to be stopped then restarted 
 instead.  [I'm aware that this would cause us (heartbeat/CRM) some more 
 difficulty in implementing it].
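
A small sketch of this second answer, assuming the cluster manager did read the agent metadata (which, as noted in the reply below, the CRM does not). The sample XML and the helper name `changed_unique_params` are hypothetical; the `unique="1"` attribute on `<parameter>` elements is the marker being referred to.

```python
import xml.etree.ElementTree as ET

# Made-up metadata for a hypothetical Filesystem-like agent.
SAMPLE_METADATA = """<resource-agent name="example-fs">
  <parameters>
    <parameter name="device" unique="1" required="1"/>
    <parameter name="directory" unique="1" required="1"/>
    <parameter name="options" unique="0" required="0"/>
  </parameters>
</resource-agent>
"""


def changed_unique_params(metadata_xml: str, old: dict, new: dict) -> set:
    """Return the names of unique-marked parameters whose values differ."""
    root = ET.fromstring(metadata_xml)
    unique = {p.get("name") for p in root.iter("parameter") if p.get("unique") == "1"}
    return {name for name in unique if old.get(name) != new.get(name)}


old = {"device": "/dev/sdb1", "directory": "/srv/data", "options": "rw"}
only_options = changed_unique_params(SAMPLE_METADATA, old, dict(old, options="ro"))
moved_device = changed_unique_params(SAMPLE_METADATA, old, dict(old, device="/dev/sdc1"))
```

An empty result would mean a reload is at least a candidate; a non-empty one means the resource has to be stopped and restarted instead.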

Well, the truth is that the CRM doesn't know about these at all.

(A) This might end up being a new attribute "on_change=(reload|restart)"
(on the instance parameter nvpair in the CIB) which defaults to
"restart" (and is set by the GUI).  Then, we could compute two
op-digests: one for the parameters which require a restart (stop with
_old_ options, start with the new ones), and one for those which only
need a reload (should we provide both the old and new options here?),
and we could easily spot what is required from us.

(Note to Andrew: If there was a reload op in the lrm status section,
that would supersede the start op present there too, that should also
work and be quite simple to implement.)

Sorry, I headed down implementation detail lane for a bit, but that
might just work, don't you think?
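
To make the "stop with _old_ options, start with the new ones" detail concrete, here is a hedged sketch of which parameter set each operation would receive once the required action is known. `plan_ops` and its tuple format are invented for illustration; whether reload should also see the old options is left open above, and this sketch passes only the new ones.

```python
def plan_ops(action: str, old_params: dict, new_params: dict) -> list:
    """Return (operation, parameters) pairs in the order they would run."""
    if action == "restart":
        # Stop must find the instance as it was started, so it gets the
        # old parameters; start then brings it up with the new ones.
        return [("stop", old_params), ("start", new_params)]
    if action == "reload":
        # Assumption: reload receives only the new parameters.
        return [("reload", new_params)]
    return []  # nothing relevant changed
```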

 2. If reload fails, should the resource be treated as failed, or merely
 the reload? (I'm inclined towards the first one, it's easier to code I
 think ;-)
 
 I agree with your assessment.

OK.

Maybe we should allow a "sorry, I'd have to restart the instance for
this" exit code too though.

(To stick with the fs example, one can remount a filesystem with new
options most of the time, so we are good for attempting this - but it
couldn't remount r/o for example if something still had files open r/w.
It's not failed, it just didn't make the change and is running with the
old options... The GUI could then display this accordingly, allowing the
admin to then instigate a restart if he/she so chooses.)
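
The three outcomes described here could be told apart along these lines. This is only a sketch: `OCF_SUCCESS = 0` is the standard success code, but `RELOAD_NEEDS_RESTART` and its value are purely hypothetical, since no such exit code is defined in the spec under discussion.

```python
from enum import Enum

OCF_SUCCESS = 0
# Hypothetical value: the OCF spec discussed here defines no such code.
RELOAD_NEEDS_RESTART = 190


class ReloadOutcome(Enum):
    APPLIED = "running with the new parameters"
    NOT_APPLIED = "still running with the old parameters"
    FAILED = "resource treated as failed"


def interpret_reload(rc: int) -> ReloadOutcome:
    """Map a reload exit code onto the three cases from the discussion."""
    if rc == OCF_SUCCESS:
        return ReloadOutcome.APPLIED
    if rc == RELOAD_NEEDS_RESTART:
        # Not a failure: the change was refused and the instance is
        # untouched, so the admin may choose to trigger a restart.
        return ReloadOutcome.NOT_APPLIED
    return ReloadOutcome.FAILED
```

In the remount example, a read-only remount refused because files are still open read-write would map to `NOT_APPLIED` rather than `FAILED`.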

 3.  You say without this being visible to the resources depending on
 it, but, then, if it is not visible at _all_ (ie, _absolutely_ no
 observable change _anywhere_), there wouldn't be a point in reloading
 it, would there?
 
 What I meant was without it requiring the resources depending on it to 
 be restarted.  That's a better phrase I think...  Of course, it has some 
 effect, but it shouldn't affect the dependent resources negatively.

Ok.

 I think in the interest of purity & simplicity, this question could be
 phrased as: If it doesn't have an impact on our view of the cluster,
 why do we need to care - can't this be done outside our control then?
 
 No. If it involves an RA parameter, those have to be updated - through - 
 the system.  That's the case for the customer I have in mind.  What 
 they're doing now would curl your hair - even as short as you keep your 
 hair ;-)

I imagine so, and I guess we'll have to acknowledge that RA parameters
are used beyond what would be pure from a design PoV. ;-)

 I don't want to nit-pick here, but I'd like to understand whether your
 thinking has changed in general here, in which case I'd like to re-raise
 the point of the yellow warning monitor status too ;-) Or in what case
 this feature is different?
 I don't see the connection between these two -- at all.

Well, the connection I was (and actually, still am) seeing is that they
both go beyond what is strictly necessary if all was pure - ie, if the
instance parameters _were_ the minimal set, this would likely not be
required. But, because people are using them to actually _configure_ the
resource instance, you win. (So, they are using it beyond mere fail-over
capabilities.)

Vice-versa, if people would only want to use us for strict ok / failed
monitoring, we wouldn't need a yellow state, but they are actually
using it to manage and monitor the cluster beyond that.

 One is 'I know I need to notify the RA that something has changed, but I 
 want to avoid a half-hour needed to restart