Re: [openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

Steven Dake Wed, 05 Feb 2014 07:51:42 -0800

On 02/04/2014 06:34 PM, Robert Collins wrote:

On 5 February 2014 13:14, Zane Bitter <zbit...@redhat.com> wrote:

That's not a great example, because one DB server depends on the other,
forcing them into updating serially anyway.

I have to say that even in general, this whole idea about applying update
policies to non-grouped resources doesn't make a whole lot of sense to me.
For non-grouped resources you control the resource definitions individually
- if you don't want them to update at a particular time, you have the option
of just not updating them.

Well, I don't particularly like the idea of doing thousands of
discrete heat stack-update calls, which would seem to be what you're
proposing.

On groups: autoscale groups are a problem for security-minded
deployments because every server has identical resources (today) and
we very much want discrete credentials per server - at least this is
my understanding of the reason we're not using scaling groups in
TripleO.

Where you _do_ need it is for scaling groups where every server is based on
the same launch config, so you need a way to control the members
individually - by batching up operations (done), adding delays (done) or,
even better, notifications and callbacks.
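
As a rough illustration of the "batching up operations" and "adding
delays" behaviour mentioned above, here is a minimal Python sketch. It is
not Heat's implementation; rolling_update, update_member, batch_size and
pause_seconds are hypothetical names chosen for this example only.

```python
import time
from typing import Callable, List, Sequence


def rolling_update(
    members: Sequence[str],
    batch_size: int,
    pause_seconds: float,
    update_member: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Update group members in fixed-size batches, pausing between batches.

    All names here are hypothetical; this only sketches the idea of
    batching plus delays, not any real Heat API.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Split the member list into consecutive batches.
    batches = [
        list(members[i:i + batch_size])
        for i in range(0, len(members), batch_size)
    ]

    for index, batch in enumerate(batches):
        for member in batch:
            update_member(member)
        # Pause between batches, but not after the last one.
        if index < len(batches) - 1:
            sleep(pause_seconds)

    return batches
```

The sleep parameter is injectable only so the sketch can be exercised
without real delays; a notification/callback variant would replace the
fixed pause with a wait on some external signal.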

So it seems like doing 'rolling' updates for any random subset of resources
is effectively turning Heat into something of a poor-man's workflow service,
and IMHO that is probably a mistake.

I meant to reply to the other thread, but here is just as good :) -
heat as a way to describe the intended state, and heat takes care of
transitions, is a brilliant model. It absolutely implies a bunch of
workflows - the AWS update policy is probably the key example.

Being able to gracefully, *automatically* work through a transition
between two defined states, allowing the nodes in question to take
care of their own needs along the way seems like a pretty core
function to fit inside Heat itself. It's not at all the same as 'allow
users to define arbitrary workflows'.

-Rob

Rob,

I'm not precisely certain what you're proposing, but I think we need to
take care not to turn the Heat DSL into a full-fledged programming
language. IMO thousands of updates done through Heat is a perfect way
for a third-party service to do such things - e.g. control workflow.
Clearly there is a workflow gap in OpenStack, and possibly the thing
doing the thousands of updates should be a workflow service, rather than
TripleO, but workflow is out of scope for Heat proper. Such a workflow
service could potentially fit in the Orchestration program alongside
Heat and Autoscaling. It is too bad there isn't a workflow service
already, because we are getting a lot of pressure to make Heat fill this
gap. I personally believe filling this gap with Heat would be a mistake,
and the correct course of action would be for a workflow service to
emerge to fill this need (and depend on Heat for orchestration).

I believe this may be what Zane is reacting to; I believe the Heat
community would like to avoid making the DSL more programmable, because
then it is harder to use and support. The parameters, resources and
outputs DSL objects are difficult enough for new folks to pick up, and
that's only 3 things to understand...


Regards
-steve

What we do need for all resources (not just scaling groups) is a way for the
user to say "for this particular resource, notify me when it has updated
(but, if possible, before we have taken any destructive actions on it), give
me a chance to test it and accept or reject the update". For example, when
you resize a server, give the user a chance to confirm or reject the change
at the VERIFY_RESIZE step (Trove requires this). Or when you replace a
server during an update, give the user a chance to test the new server and
either keep it (continue on and delete the old one) or not (roll back). Or
when you replace a server in a scaling group, notify the load balancer _or
some other thing_ (e.g. OpenShift broker node) that a replacement has been
created and wait for it to switch over to the new one before deleting the
old one. Or, of course, when you update a server to some new config, give
the user a chance to test it out and make sure it works before continuing
with the stack update. All of these use cases can, I think, be solved with a
single feature.

The open questions for me are:
1) How do we notify the user that it's time to check on a resource?
(Marconi?)

This is the graceful update stuff I referred to in my mail to Clint -
the proposal from hallway discussions in HK was to do this by
notifying the server itself (that way we don't create a centralised
point of failure). I can see though that in a general sense not all
resources are servers. But - how about allowing users to specify where
to notify (and notifying is always by setting a value in metadata
somewhere) - users can then pull that out themselves however they want
to. Adding push notifications is orthogonal IMO - we'd like that for
all metadata changes, for instance.

2) How does the user ack/nack? (You're suggesting reusing WaitCondition, and
that makes sense to me.)

The server would use a WaitCondition yes.

3) How do we break up the operations so the notification occurs at the right
time? (With difficulty, but it should be do-able.)

Just wrap the existing operations - if <should notify> then:
notify-wait-do, otherwise just do.
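
The "notify-wait-do, otherwise just do" wrapping could look roughly like
the Python sketch below. This is only an illustration under assumptions:
run_step, notify and wait_for_ack are hypothetical names, with notify
standing in for "set a value in metadata" and wait_for_ack standing in
for a WaitCondition-style ack/nack. It does not reflect actual Heat code.

```python
from typing import Callable


def run_step(
    operation: Callable[[], None],
    should_notify: bool,
    notify: Callable[[], None],
    wait_for_ack: Callable[[], bool],
) -> bool:
    """Run an operation, optionally gated by a notify-then-wait handshake.

    Returns True if the operation ran, False if the ack was negative.
    All names are hypothetical placeholders for this sketch.
    """
    if should_notify:
        # Tell the resource (or its owner) that an update is pending.
        notify()
        # Block until an ack (True) or nack (False) arrives.
        if not wait_for_ack():
            return False
    # Either no notification was requested, or the update was accepted.
    operation()
    return True
```

A resource with no notification configured takes the plain "do" path, so
existing behaviour would be unchanged for templates that don't opt in.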

4) How does the user indicate for which resources they want to be notified?
(Inside an update_policy? Another new directive at the
type/properties/depends_on/update_policy level?)

I would say per resource.

-Rob



_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
