Re: Re: [Linux-ha-dev] Adding reload to the OCF specification
On 8/16/06, Alan Robertson [EMAIL PROTECTED] wrote:
> Andrew Beekhof wrote:
>> On 8/15/06, Alan Robertson [EMAIL PROTECTED] wrote:
>>> Andrew Beekhof wrote:
>>>> On 6/28/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
>>>>> On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
>>>>> Lack of disagreement implies agreement. Alan, how about we plan this as an enhancement for 2.0.7?
>>>>
>>>> Unlikely. This is a significant departure from the current design and a major piece of work. Essentially all the same problems from the migrate discussion apply.
>>>
>>> I don't quite follow that. The assumption here is that if you enable reload, it has no visible negative effect on its dependent children. So, you sneak in and do it, and nothing else is affected.
>>
>> right - lmb helped me understand that further on in the thread. the problem is knowing what variables cause a reload and which ones cause a restart. to know that we need to change the lrm
>
> Sorry I replied before reading the rest of the thread. Mea culpa. That information is present in the OCF specified metadata.

It's nowhere I can see in http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt

> The LRM certainly doesn't have any knowledge of XML that would tell it, nor can it be determined without that information.

No-one is asking it to. Please go back and read the email from July 4. This is about not knowing what values changed, not about metadata.

> I know you've resisted this so far -- but it's in that nasty metadata ;-) The metadata was clearly intended for something CRM-like to use. It's a design decision that we've made that has so far kept the CRM from reading it. So, perhaps the CIB (through the GUI or user editing) should give you some more hints. But, IMHO, it's not an LRM issue. The correct information to determine this should be present in the CIB -- and not determined at run time. Or maybe I'm still confused (certainly possible).
> --
> Alan Robertson [EMAIL PROTECTED]
> Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Adding reload to the OCF specification
Andrew Beekhof wrote:
> On 6/28/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
>> On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
>> Lack of disagreement implies agreement. Alan, how about we plan this as an enhancement for 2.0.7?
>
> Unlikely. This is a significant departure from the current design and a major piece of work. Essentially all the same problems from the migrate discussion apply.

I don't quite follow that. The assumption here is that if you enable reload, it has no visible negative effect on its dependent children. So, you sneak in and do it, and nothing else is affected.

--
Alan Robertson [EMAIL PROTECTED]
Re: Re: [Linux-ha-dev] Adding reload to the OCF specification
On 8/15/06, Alan Robertson [EMAIL PROTECTED] wrote:
> Andrew Beekhof wrote:
>> On 6/28/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
>>> On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
>>> Lack of disagreement implies agreement. Alan, how about we plan this as an enhancement for 2.0.7?
>>
>> Unlikely. This is a significant departure from the current design and a major piece of work. Essentially all the same problems from the migrate discussion apply.
>
> I don't quite follow that. The assumption here is that if you enable reload, it has no visible negative effect on its dependent children. So, you sneak in and do it, and nothing else is affected.

right - lmb helped me understand that further on in the thread. the problem is knowing what variables cause a reload and which ones cause a restart. to know that we need to change the lrm
Re: [Linux-ha-dev] Adding reload to the OCF specification
Andrew Beekhof wrote:
> On 8/15/06, Alan Robertson [EMAIL PROTECTED] wrote:
>> Andrew Beekhof wrote:
>>> On 6/28/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
>>>> On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
>>>> Lack of disagreement implies agreement. Alan, how about we plan this as an enhancement for 2.0.7?
>>>
>>> Unlikely. This is a significant departure from the current design and a major piece of work. Essentially all the same problems from the migrate discussion apply.
>>
>> I don't quite follow that. The assumption here is that if you enable reload, it has no visible negative effect on its dependent children. So, you sneak in and do it, and nothing else is affected.
>
> right - lmb helped me understand that further on in the thread. the problem is knowing what variables cause a reload and which ones cause a restart. to know that we need to change the lrm

Sorry I replied before reading the rest of the thread. Mea culpa.

That information is present in the OCF specified metadata. The LRM certainly doesn't have any knowledge of XML that would tell it, nor can it be determined without that information.

I know you've resisted this so far -- but it's in that nasty metadata ;-) The metadata was clearly intended for something CRM-like to use. It's a design decision that we've made that has so far kept the CRM from reading it.

So, perhaps the CIB (through the GUI or user editing) should give you some more hints. But, IMHO, it's not an LRM issue. The correct information to determine this should be present in the CIB -- and not determined at run time.

Or maybe I'm still confused (certainly possible).

--
Alan Robertson [EMAIL PROTECTED]
Re: [Linux-ha-dev] Adding reload to the OCF specification
On 2006-07-01T08:43:05, Andrew Beekhof [EMAIL PROTECTED] wrote:
>> Lack of disagreement implies agreement. Alan, how about we plan this as an enhancement for 2.0.7?
>
> Unlikely. This is a significant departure from the current design and a major piece of work.

Well, ok. So not 2.0.7. ;-)

But, is it such a major departure from the current design? We do notice already when the instance parameters have changed. So, for some resources (or some instance parameters, which may be harder), we now figure that this means we don't have to stop/start them, but that we send them a reload with the new parameters. Which we can then use from there on.

If it fails, we treat it like we treat any other failed op, namely: the resource has failed. The reload op is even tracked in the lrmd for us.

Of course it's different, but fundamentally so? I think not. I may be wrong here.

> Essentially all the same problems from the migrate discussion apply.

Well, the migrate discussion is special in the sense that an op on one node causes a resource to appear somewhere else, ie an op which crosses node boundaries. That is, indeed, a major change.

I must have been dumb here, and missing something. What is it? ;-)

(If we really can't readily do it in the current design, it's something to put on the plate for the PE rewrite at least, which I think should happen for 2.0.[78] (within the next 3-9 months), given that most issues we currently have seem to require it, from migrate to m/s resources etc...)

Sincerely,
    Lars Marowsky-Brée

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge" -- Charles Darwin
Re: [Linux-ha-dev] Adding reload to the OCF specification
On 2006-07-03T16:16:20, Andrew Beekhof [EMAIL PROTECTED] wrote:
> think of a stack of resources and what happens when something in the middle changes... how we pull down the stack to the resource that changed and build it up again.
>
> that's very different to starting at the resource that changed and working upwards. of course if anything above you (logically) doesn't support reload then the PE has to revert to stop/start. figuring that out isn't trivial nor is figuring out *when* to do the calculation.

No no no! The whole _point_ of reload is that _it does not affect resources depending on it_. It is purely local to the given resource.

Only if the reload fails would we treat it as failed and the whole process, but, we already know how to treat resources as failed ;-)

Sincerely,
    Lars Marowsky-Brée
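To make the two viewpoints in this exchange concrete, here is a minimal sketch of the operations each approach would schedule for a linear stack of resources. The function names, the resource names, and the idea of representing the stack as a bottom-to-top list are all assumptions made for illustration; nothing here is the actual PE code.

```python
def plan_restart(stack, changed):
    """Stop/start plan: pull the stack down to `changed`, then rebuild it.

    `stack` is ordered bottom to top, so every entry after `changed`
    depends (directly or indirectly) on it.
    """
    idx = stack.index(changed)
    above = stack[idx + 1:]
    ops = [("stop", r) for r in reversed(above)]   # dependents go down first
    ops += [("stop", changed), ("start", changed)]
    ops += [("start", r) for r in above]           # then come back up in order
    return ops


def plan_reload(stack, changed):
    """Reload plan: purely local, dependents are not touched."""
    if changed not in stack:
        raise ValueError(f"unknown resource: {changed}")
    return [("reload", changed)]


# Hypothetical stack, bottom to top.
stack = ["ip", "filesystem", "database", "webapp"]

for op in plan_restart(stack, "filesystem"):
    print(op)
print(plan_reload(stack, "filesystem"))
```

In this model a failed reload would simply mark the resource as failed, after which the existing recovery path (here, something like `plan_restart`) applies unchanged.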
Re: Re: [Linux-ha-dev] Adding reload to the OCF specification
On 7/3/06, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:
> On 2006-07-03T16:16:20, Andrew Beekhof [EMAIL PROTECTED] wrote:
>> think of a stack of resources and what happens when something in the middle changes... how we pull down the stack to the resource that changed and build it up again.
>>
>> that's very different to starting at the resource that changed and working upwards. of course if anything above you (logically) doesn't support reload then the PE has to revert to stop/start. figuring that out isn't trivial nor is figuring out *when* to do the calculation.
>
> No no no! The whole _point_ of reload is that _it does not affect resources depending on it_. It is purely local to the given resource.

to which i would respond how do we know that?

we store a hash of the parameters remember... we know something changed but have no idea what nor if it was important... in which case you would have to assume it was.

> Only if the reload fails would we treat it as failed and the whole process, but, we already know how to treat resources as failed ;-)
Re: [Linux-ha-dev] Adding reload to the OCF specification
On 2006-07-03T21:34:37, Andrew Beekhof [EMAIL PROTECTED] wrote:
>> No no no! The whole _point_ of reload is that _it does not affect resources depending on it_. It is purely local to the given resource.
>
> to which i would respond how do we know that?
>
> we store a hash of the parameters remember... we know something changed but have no idea what nor if it was important... in which case you would have to assume it was.

Uhm, I don't want to be annoying, but did you actually read this thread? ;-)

First, one could assume that for a RA supporting reload, all changes are reloadable - and put it into the admin's care to not change attributes which wouldn't be. (This could be done with a hint to the GUI, but wouldn't require additional smarts in the PE.)

The second alternative was that even the CRM could take a look at, say, a reloadable=(yes|no) flag for the instance parameter/meta attribute(?) in question (defaults to no), and if it is set to yes, store it in a different hash. (So we could always tell whether one of the reloadable or one of the non-reloadable parameters changed.)

Both are reasonable to implement, _I think_. Not necessarily now, but the ability to reduce perceived unnecessary restarts is also important. I've been whapped on the head by SAP for our current modus operandi ;-)

(Tangent: Who have, in the same way, asked for a way to not apply unnecessary churn in case several resources are modified one after another. But, that gets us back into the GUI discussion, which we should track separately too.)

Sincerely,
    Lars Marowsky-Brée
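The second alternative above (a separate hash for reloadable parameters) can be sketched as follows. This is only an illustration: the function names, the use of SHA-256 over a JSON encoding, and the example parameter names are assumptions, and the real CRM digest format is not shown in this thread.

```python
import hashlib
import json


def digest(params):
    """Order-independent digest of a name -> value mapping."""
    encoded = json.dumps(sorted(params.items())).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def split_digests(params, reloadable):
    """Return (restart_digest, reload_digest) for one set of parameters.

    `reloadable` is the set of parameter names flagged reloadable=yes;
    every other parameter is treated as requiring a restart (the default).
    """
    restart_part = {k: v for k, v in params.items() if k not in reloadable}
    reload_part = {k: v for k, v in params.items() if k in reloadable}
    return digest(restart_part), digest(reload_part)


def required_action(stored, new_params, reloadable):
    """Compare the stored digest pair against the new parameters."""
    old_restart, old_reload = stored
    new_restart, new_reload = split_digests(new_params, reloadable)
    if old_restart != new_restart:
        return "restart"   # a non-reloadable parameter changed
    if old_reload != new_reload:
        return "reload"    # only reloadable parameters changed
    return "none"


# Hypothetical filesystem-like resource.
reloadable = {"options"}
old = {"device": "/dev/sda1", "options": "rw"}
stored = split_digests(old, reloadable)

print(required_action(stored, {"device": "/dev/sda1", "options": "ro"}, reloadable))
print(required_action(stored, {"device": "/dev/sdb1", "options": "rw"}, reloadable))
```

With a single digest over all parameters, both calls would only report "something changed", which is the limitation raised in the quoted text; splitting the digest is what lets the two cases be told apart without storing the old values.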
Re: [Linux-ha-dev] Adding reload to the OCF specification
On 2006-06-15T18:26:24, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:

Lack of disagreement implies agreement. Alan, how about we plan this as an enhancement for 2.0.7?

Sincerely,
    Lars Marowsky-Brée
Re: [Linux-ha-dev] Adding reload to the OCF specification
On 2006-06-15T08:56:00, Alan Robertson [EMAIL PROTECTED] wrote:
> Many LSB init scripts implement a 'reload' action which permits them to reread their configurations without interrupting service. By design, the OCF spec is upwards-compatible with the LSB.
>
> I think it would be good to specifically add the reload operation to the OCF spec. Saying something like this:
> --
> The reload operation is an optional operation which can be supported by OCF resource agents. This operation will cause the resource to examine its parameters and its configuration files, and continue running with these new configuration values, without interrupting service in a way which is visible to resources which depend on it.
>
> If an OCF resource agent wishes to support the reload operation, it is required to list it in the operations section of the metadata given by the meta-data operation.
>
> Even though a resource supports a reload operation, a conforming cluster manager need not make use of it. Of course, if it does, then service updates can be made with fewer service interruptions, so this is likely to be seen as a desirable feature.
> --
> And as to whether there are resource agents which could make use of this feature - the answer is yes. I have a customer who would like such a capability today. At the moment, they go far out of their way to work around not having it.
>
> As written, this is optional for both resource agents and cluster managers, and it seems like a reasonable addition to the OCF spec. My guess is that this would be an easy addition for many cluster managers to support. Of course, nothing is impossible to he who doesn't have to do it.

1. The administrator of course may not change the parameters so much that the RA can no longer identify the already running resource instance. (Do we need to provide hints in the meta data to identify which parameters are safe to change and which are not?)

2. If reload fails, should the resource be treated as failed, or merely the reload? (I'm inclined towards the first one, it's easier to code I think ;-)

3. You say "without this being visible to the resources depending on it", but, then, if it is not visible at _all_ (ie, _absolutely_ no observable change _anywhere_), there wouldn't be a point in reloading it, would there?

I think in the interest of purity & simplicity, this question could be phrased as: If it doesn't have an impact on our view of the cluster, why do we need to care - can't this be done outside our control then?

It's somewhat similar to the question as to whether or not we should have a yellow monitor result: Yes, it would go some way to make the cluster more of a general management tool, but it is also not strictly necessary; the stance you took back then was that you didn't want that, if I wasn't mistaken. We left this health monitoring to more dedicated apps. Continuing that line of thought seems to suggest that we don't need reload either.

I don't want to nit-pick here, but I'd like to understand whether your thinking has changed in general here, in which case I'd like to re-raise the point of the yellow warning monitor status too ;-) Or in what case this feature is different?

Sincerely,
    Lars Marowsky-Brée
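The proposed spec text requires an agent that supports reload to list it in the operations section of its metadata. A cluster manager that chose to honour this could check for it roughly as below. This is a sketch only: the sample XML is a cut-down, made-up metadata document (the element names follow the usual OCF `resource-agent`/`actions`/`action` layout), and `supports_reload` is a hypothetical helper, not part of heartbeat or the CRM.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed output of an agent's meta-data operation.
SAMPLE_METADATA = """<resource-agent name="example">
  <actions>
    <action name="start" timeout="20"/>
    <action name="stop" timeout="20"/>
    <action name="reload" timeout="20"/>
    <action name="monitor" timeout="20" interval="10"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
"""


def advertised_actions(metadata_xml):
    """Return the set of action names listed in the metadata."""
    root = ET.fromstring(metadata_xml)
    return {action.get("name") for action in root.findall("./actions/action")}


def supports_reload(metadata_xml):
    """True if the agent lists a reload action; otherwise fall back to stop/start."""
    return "reload" in advertised_actions(metadata_xml)


print(supports_reload(SAMPLE_METADATA))
```

Because the operation is optional on both sides, an agent that omits the `reload` entry is simply handled with the existing stop/start path.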
Re: [Linux-ha-dev] Adding reload to the OCF specification
On 2006-06-15T09:58:37, Alan Robertson [EMAIL PROTECTED] wrote:
>> 1. The administrator of course may not change the parameters so much that the RA can no longer identify the already running resource instance. (Do we need to provide hints in the meta data to identify which parameters are safe to change and which are not?)
>
> Good question... One possible answer... If the RA can't identify it after they're changed, then it shouldn't claim to support it ;-)

Ah, but, in theory, the instance parameters should be the minimal set to identify and then manage the resource instance. So, any change there does seem critical. In theory, the instance parameters which can be changed are the exception...

(Maybe excluding some monitor options. But, if these were passed exclusively as attributes to the monitor op, and not to the instance as a whole, they'd only cause the monitor op to be restarted. In theory.)

I'm not saying these options don't exist, say, one could even remount a Filesystem with different options (-o remount), w/o affecting anything else. So yes, it would be useful to have, but we need more smarts than we have right now.

> Another possible answer: IIRC, some of the RA parameters are marked unique in the metadata indicating that together they uniquely identify the resource. If you changed any of those, then it would need to be stopped then restarted instead. [I'm aware that this would cause us (heartbeat/CRM) some more difficulty in implementing it].

Well, the truth is that the CRM doesn't know about these at all.

(A) This might end up being a new attribute on_change=(reload|restart) (to the instance parameter nvpair in the CIB) which defaults to restart (and being set by the GUI).

Then, we could compute two op-digests: one for those which require a restart (stop with _old_ options, start with the new ones), and a reload one (should we provide both the old and new options here?), and we could easily spot what is required from us.
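Alan's "another possible answer" (treat a change to any parameter marked unique as requiring a restart) could look roughly like the sketch below. The metadata document is invented for illustration, the helper names are hypothetical, and the comparison works on plain old and new values, which, as noted above, the CRM does not currently have available.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed metadata for a filesystem-like agent.
FS_METADATA = """<resource-agent name="example-fs">
  <parameters>
    <parameter name="device" unique="1" required="1"/>
    <parameter name="directory" unique="1" required="1"/>
    <parameter name="options" unique="0" required="0"/>
  </parameters>
</resource-agent>
"""


def unique_parameters(metadata_xml):
    """Names of the parameters the agent marks as unique."""
    root = ET.fromstring(metadata_xml)
    return {
        p.get("name")
        for p in root.findall("./parameters/parameter")
        if p.get("unique") == "1"
    }


def action_for_change(old, new, unique):
    """Pick restart if an identifying parameter changed, else reload."""
    changed = {k for k in set(old) | set(new) if old.get(k) != new.get(k)}
    if not changed:
        return "none"
    if changed & unique:
        return "restart"
    return "reload"


unique = unique_parameters(FS_METADATA)
old = {"device": "/dev/sda1", "directory": "/srv", "options": "rw"}

print(action_for_change(old, dict(old, options="rw,noatime"), unique))
print(action_for_change(old, dict(old, device="/dev/sdb1"), unique))
```

An explicit on_change attribute in the CIB, as suggested above, would replace the `unique` lookup with a per-parameter setting while keeping the same decision logic.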
(Note to Andrew: If there was a reload op in the lrm status section, that would supersede the start op present there too, that should also work and be quite simple to implement.)

Sorry, I headed down implementation detail lane for a bit, but that might just work, don't you think?

>> 2. If reload fails, should the resource be treated as failed, or merely the reload? (I'm inclined towards the first one, it's easier to code I think ;-)
>
> I agree with your assessment.

OK. Maybe we should allow a "sorry, I'd have to restart the instance for this" exit code too though. (To stick with the fs example, one can remount a filesystem with new options most of the time, so we are good for attempting this - but it couldn't remount r/o for example if something still had files open r/w. It's not failed, it just didn't make the change and is running with the old options... The GUI could then display this accordingly, allowing the admin to then instigate a restart if he/she so chooses.)

>> 3. You say "without this being visible to the resources depending on it", but, then, if it is not visible at _all_ (ie, _absolutely_ no observable change _anywhere_), there wouldn't be a point in reloading it, would there?
>
> What I meant was "without it requiring the resources depending on it to be restarted". That's a better phrase I think... Of course, it has some effect, but it shouldn't affect the dependent resources negatively.

Ok.

>> I think in the interest of purity & simplicity, this question could be phrased as: If it doesn't have an impact on our view of the cluster, why do we need to care - can't this be done outside our control then?
>
> No. If it involves an RA parameter, those have to be updated - through - the system. That's the case for the customer I have in mind. What they're doing now would curl your hair - even as short as you keep your hair ;-)

I imagine so, and I guess we'll have to acknowledge that RA parameters are used beyond what would be pure from a design PoV. ;-)

>> I don't want to nit-pick here, but I'd like to understand whether your thinking has changed in general here, in which case I'd like to re-raise the point of the yellow warning monitor status too ;-) Or in what case this feature is different?
>
> I don't see the connection between these two -- at all.

Well, the connection I was (and actually, still am) seeing is that they both go beyond what is strictly necessary if all was pure - ie, if the instance parameters _were_ the minimal set, this would likely not be required. But, because people are using them to actually _configure_ the resource instance, you win. (So, they are using it beyond mere fail-over capabilities.)

Vice-versa, if people would only want to use us for strict ok / failed monitoring, we wouldn't need a yellow state, but they are actually using it to manage and monitor the cluster beyond that.

> One is 'I know I need to notify the RA that something has changed, but I want to avoid a half-hour needed to restart