Re: [openstack-dev] [Heat] Stack breakpoint
On 17/03/14 21:18, Mike Spreitzer wrote:
> Zane Bitter zbit...@redhat.com wrote on 03/17/2014 07:03:25 PM:
>> On 17/03/14 17:03, Ton Ngo wrote:
>>> - How to handle resources with timer, e.g. wait condition: pause/resume timer value
>> Handle it by only allowing pauses before and after. In most cases I'm not sure what it would mean to pause _during_.
> I'm not sure I follow this part. If at some time a timer is started, and the event(s) upon which it is waiting are delayed by hitting a breakpoint and waiting for human interaction --- I think this is the scenario that concerned Ton. It seems to me the right answer is that all downstream timers have to stop ticking between break and resume.

Perhaps this was too general. To be specific, there is exactly one resource with a timer* - a WaitCondition. A WaitCondition is usually configured to be dependent on the server that should trigger it. Nothing interesting happens while a WaitCondition is waiting, so there is no point allowing a breakpoint in the middle. You would either set the breakpoint after the server has completed or before the WaitCondition starts (which amount to the same thing, assuming no other dependencies). You could, in theory, set a breakpoint after the WaitCondition completes, though the use case for that is less obvious. In any event, at no time is the stack paused _while_ the WaitCondition is running, and therefore there is no need to use anything but wallclock time to determine the timeout.

cheers,
Zane.

* Technically there is another: autoscaling groups during update with an UpdatePolicy specified... however these use a nested stack, and the solution here is to use this same feature within the nested stack to implement the delays rather than complicate things in the stack containing the autoscaling group resource.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Heat] Stack breakpoint
The notify/callback mechanism seems like a good solution. This should enable creating a high-level debugger for different DSLs (HOT, TOSCA, ...), running as a separate process. The debugger would attach to a stack, present a logical model to the user and interact with the Heat engine. This would be similar to the typical program debugger. This mechanism should also allow integrating with a process engine to handle human interaction.

As for resources with a timer, I was not sure whether there are other resources besides WaitCondition that contain a timer, so it's good to know that currently only WaitCondition falls into this category. My concern was the scenario where the timer gets kicked off and then a resource that should interact with the timer hits a breakpoint, but Zane's point is that this is not possible for WaitCondition since there is supposed to be a dependency on the resource. So to debug the "why did my WaitCondition fail" scenario, the user would set a breakpoint before the WaitCondition, or after any of the resources that the WaitCondition depends on. In this case, we would know that the timer will never get kicked off because of the dependency.

Ton Ngo,

From: Zane Bitter zbit...@redhat.com
To: openstack-dev@lists.openstack.org,
Date: 03/18/2014 09:45 AM
Subject: Re: [openstack-dev] [Heat] Stack breakpoint

> On 17/03/14 21:18, Mike Spreitzer wrote:
>> Zane Bitter zbit...@redhat.com wrote on 03/17/2014 07:03:25 PM:
>>> On 17/03/14 17:03, Ton Ngo wrote:
>>>> - How to handle resources with timer, e.g. wait condition: pause/resume timer value
>>> Handle it by only allowing pauses before and after. In most cases I'm not sure what it would mean to pause _during_.
>> I'm not sure I follow this part. If at some time a timer is started, and the event(s) upon which it is waiting are delayed by hitting a breakpoint and waiting for human interaction --- I think this is the scenario that concerned Ton. It seems to me the right answer is that all downstream timers have to stop ticking between break and resume.
>
> Perhaps this was too general. To be specific, there is exactly one resource with a timer* - a WaitCondition. A WaitCondition is usually configured to be dependent on the server that should trigger it. Nothing interesting happens while a WaitCondition is waiting, so there is no point allowing a breakpoint in the middle. You would either set the breakpoint after the server has completed or before the WaitCondition starts (which amount to the same thing, assuming no other dependencies). You could, in theory, set a breakpoint after the WaitCondition completes, though the use case for that is less obvious. In any event, at no time is the stack paused _while_ the WaitCondition is running, and therefore there is no need to use anything but wallclock time to determine the timeout.
>
> cheers,
> Zane.
>
> * Technically there is another: autoscaling groups during update with an UpdatePolicy specified... however these use a nested stack, and the solution here is to use this same feature within the nested stack to implement the delays rather than complicate things in the stack containing the autoscaling group resource.
[openstack-dev] [Heat] Stack breakpoint
I would like to revisit with more details an idea that was mentioned in the last design summit and hopefully get some feedback.

The scenario is troubleshooting a failed template. Currently we can stop on the point of failure by disabling rollback: this works well for stack-create; stack-update requires some more work but that's a different thread. In many cases, however, the point of failure may be too late or too hard to debug because the condition causing the failure may not be obvious or may have been changed. If we can pause the stack at a point before the failure, then we can check whether the state of the environment and the stack is what we expect.

The analogy with program debugging is breakpoint/step, so it may be useful to introduce this same concept in a stack. The usage would be something like:

- Run stack-create (or stack-update once it can handle failure) with one or more resource names specified as breakpoints
- As the engine traverses down the dependency graph, it would stop at the breakpoint resource and all dependent resources. Other resources with no dependency on it will proceed to completion.
- After debugging, continue the stack by:
  - Stepping: remove current breakpoint, set breakpoint for next resource(s) in dependency graph, resume stack-create (or stack-update)
  - Running to completion: remove current breakpoint, resume stack-create (or stack-update)

Some other possible uses for this breakpoint:

- While developing new template or resource type, bring up a stack to a point before the new code is to be executed
- Introduce human process: pause the partial stack so the user can get the stack info and perform some tasks before continuing

Some issues to consider (with initial feedback from shardy):

- Granularity of stepping: resource level or internal steps within a resource
- How to specify breakpoints: CLI argument or coded in template or both
- How to handle resources with timer, e.g. wait condition: pause/resume timer value
- New state for a resource: PAUSED

Thanks.
Ton Ngo,
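The traversal described in the usage list above could be sketched roughly as follows. This is only an illustration of the idea, not Heat's engine code: `run_until_breakpoints`, the `requires` mapping and the resource names are all hypothetical, and "creating" a resource is reduced to appending its name to a list.

```python
from collections import deque


def run_until_breakpoints(requires, breakpoints):
    """Walk a dependency graph, holding at breakpoint resources.

    `requires` maps each resource name to the set of names it depends on.
    Returns (completed, paused); anything in neither list is blocked
    behind a paused resource. All names here are hypothetical.
    """
    # Invert the graph: for each resource, which resources wait on it?
    dependents = {name: [] for name in requires}
    for name, deps in requires.items():
        for dep in deps:
            dependents[dep].append(name)

    # Count of unfinished dependencies per resource.
    remaining = {name: len(deps) for name, deps in requires.items()}
    ready = deque(sorted(n for n, count in remaining.items() if count == 0))
    completed, paused = [], []

    while ready:
        name = ready.popleft()
        if name in breakpoints:
            # Hold here: dependents are never released, so they stay blocked.
            paused.append(name)
            continue
        completed.append(name)  # stand-in for actually creating the resource
        for child in sorted(dependents[name]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return completed, paused


if __name__ == "__main__":
    template = {
        "network": set(),
        "volume": set(),
        "server": {"network"},
        "wait_condition": {"server"},
        "attachment": {"server", "volume"},
    }
    done, held = run_until_breakpoints(template, {"server"})
    print("completed:", done)
    print("paused at:", held)
```

With a breakpoint on `server`, the independent `network` and `volume` resources complete, while `wait_condition` and `attachment` are never started because they depend on the paused resource. Stepping would amount to calling the function again with the breakpoint moved to the next resource(s) in the graph.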
Re: [openstack-dev] [Heat] Stack breakpoint
Ton, could you repost this as a new thread? It has very little to do with the referenced thread.

Excerpts from Ton Ngo's message of 2014-03-17 12:10:33 -0700:
> I would like to revisit with more details an idea that was mentioned in the last design summit and hopefully get some feedback.
>
> The scenario is troubleshooting a failed template. Currently we can stop on the point of failure by disabling rollback: this works well for stack-create; stack-update requires some more work but that's a different thread. In many cases, however, the point of failure may be too late or too hard to debug because the condition causing the failure may not be obvious or may have been changed. If we can pause the stack at a point before the failure, then we can check whether the state of the environment and the stack is what we expect.
>
> The analogy with program debugging is breakpoint/step, so it may be useful to introduce this same concept in a stack. The usage would be something like:
>
> - Run stack-create (or stack-update once it can handle failure) with one or more resource names specified as breakpoints
> - As the engine traverses down the dependency graph, it would stop at the breakpoint resource and all dependent resources. Other resources with no dependency on it will proceed to completion.
> - After debugging, continue the stack by:
>   - Stepping: remove current breakpoint, set breakpoint for next resource(s) in dependency graph, resume stack-create (or stack-update)
>   - Running to completion: remove current breakpoint, resume stack-create (or stack-update)
>
> Some other possible uses for this breakpoint:
>
> - While developing new template or resource type, bring up a stack to a point before the new code is to be executed
> - Introduce human process: pause the partial stack so the user can get the stack info and perform some tasks before continuing
>
> Some issues to consider (with initial feedback from shardy):
>
> - Granularity of stepping: resource level or internal steps within a resource
> - How to specify breakpoints: CLI argument or coded in template or both
> - How to handle resources with timer, e.g. wait condition: pause/resume timer value
> - New state for a resource: PAUSED
>
> Thanks.
> Ton Ngo,
[openstack-dev] [Heat] Stack breakpoint
(reposting as new thread)

I would like to revisit with more details an idea that was mentioned in the last design summit and hopefully get some feedback.

The scenario is troubleshooting a failed template. Currently we can stop on the point of failure by disabling rollback: this works well for stack-create; stack-update requires some more work but that's a different thread. In many cases, however, the point of failure may be too late or too hard to debug because the condition causing the failure may not be obvious or may have been changed. If we can pause the stack at a point before the failure, then we can check whether the state of the environment and the stack is what we expect.

The analogy with program debugging is breakpoint/step, so it may be useful to introduce this same concept in a stack. The usage would be something like:

- Run stack-create (or stack-update once it can handle failure) with one or more resource names specified as breakpoints
- As the engine traverses down the dependency graph, it would stop at the breakpoint resource and all dependent resources. Other resources with no dependency on it will proceed to completion.
- After debugging, continue the stack by:
  - Stepping: remove current breakpoint, set breakpoint for next resource(s) in dependency graph, resume stack-create (or stack-update)
  - Running to completion: remove current breakpoint, resume stack-create (or stack-update)

Some other possible uses for this breakpoint:

- While developing new template or resource type, bring up a stack to a point before the new code is to be executed
- Introduce human process: pause the partial stack so the user can get the stack info and perform some tasks before continuing

Some issues to consider (with some initial feedback from shardy):

- Granularity of stepping: resource level or internal steps within a resource
- How to specify breakpoints: CLI argument or coded in template or both
- How to handle resources with timer, e.g. wait condition: pause/resume timer value
- New state for a resource: PAUSED

Thanks.
Ton Ngo,
Re: [openstack-dev] [Heat] Stack breakpoint
On 17/03/14 17:03, Ton Ngo wrote:
> (reposting as new thread)
>
> I would like to revisit with more details an idea that was mentioned in the last design summit and hopefully get some feedback.
>
> The scenario is troubleshooting a failed template. Currently we can stop on the point of failure by disabling rollback: this works well for stack-create; stack-update requires some more work but that's a different thread. In many cases, however, the point of failure may be too late or too hard to debug because the condition causing the failure may not be obvious or may have been changed. If we can pause the stack at a point before the failure, then we can check whether the state of the environment and the stack is what we expect.
>
> The analogy with program debugging is breakpoint/step, so it may be useful to introduce this same concept in a stack. The usage would be something like:
>
> - Run stack-create (or stack-update once it can handle failure) with one or more resource names specified as breakpoints
> - As the engine traverses down the dependency graph, it would stop at the breakpoint resource and all dependent resources. Other resources with no dependency on it will proceed to completion.
> - After debugging, continue the stack by:
>   - Stepping: remove current breakpoint, set breakpoint for next resource(s) in dependency graph, resume stack-create (or stack-update)
>   - Running to completion: remove current breakpoint, resume stack-create (or stack-update)
>
> Some other possible uses for this breakpoint:
>
> - While developing new template or resource type, bring up a stack to a point before the new code is to be executed
> - Introduce human process: pause the partial stack so the user can get the stack info and perform some tasks before continuing

I would like to see this solved with some sort of notify/callback mechanism. There are a bunch of use cases which IMHO could all be solved with a single feature:

- Debugging template operations by pausing and stepping to allow a user to debug
- Adding a manual task into the stack creation process
- Automatically augmenting the stack creation process by inserting tasks into the workflow
- Providing a hook to the Autoscaling engine (when it is separated out into a separate process) to allow it to update the load-balancer (or, more generically, any shared resource) at the appropriate times
- Providing a hook for e.g. Trove to confirm resizes of Nova servers.

(I think I counted 6 use cases in a previous thread, and iirc that didn't include the debugging one.)

A feature that:

1) Optionally notifies the user before or after performing some operation on a resource, and
2) After sending such a notification, waits for confirmation before proceeding

... should be able to solve all of the use cases above. (This is my favourite kind of feature ;)

The details of how to implement that (in particular, what channel do you send notifications through?) are more tricky to figure out. There was some discussion already in the "Rolling Updates spec re-written. RFC" thread. Start here and keep going:

http://lists.openstack.org/pipermail/openstack-dev/2014-February/026329.html

> Some issues to consider (with some initial feedback from shardy):
>
> - Granularity of stepping: resource level or internal steps within a resource

Before and after a resource is processed, and at any logical steps during, such as the CONFIRM step when resizing a Nova server.

> - How to specify breakpoints: CLI argument or coded in template or both

I think I'd vote for both in the template and the environment.

> - How to handle resources with timer, e.g. wait condition: pause/resume timer value

Handle it by only allowing pauses before and after. In most cases I'm not sure what it would mean to pause _during_.

> - New state for a resource: PAUSED

It's the workflow that's paused, not the resource, so I don't see the need for a new state.

cheers,
Zane.
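The two-part feature described above (notify, then wait for confirmation) might look something like the sketch below. It assumes an in-process callback as the notification channel, which is exactly the detail the message leaves open; `HookPoint`, `create_resource` and the phase names `pre-create` / `post-create` are hypothetical and not an existing Heat API.

```python
import threading


class HookPoint:
    """Notify a listener, then block until it confirms (hypothetical API)."""

    def __init__(self, notify):
        self._notify = notify      # callable(resource, phase)
        self._events = {}          # (resource, phase) -> threading.Event
        self._lock = threading.Lock()

    def _event(self, resource, phase):
        with self._lock:
            return self._events.setdefault((resource, phase), threading.Event())

    def confirm(self, resource, phase):
        """Called by the user or an external service to let work proceed."""
        self._event(resource, phase).set()

    def notify_and_wait(self, resource, phase, timeout=None):
        """Send the notification, then wait; returns False on timeout."""
        event = self._event(resource, phase)
        self._notify(resource, phase)
        return event.wait(timeout)


def create_resource(name, hooks, hook_points, log, timeout=5):
    """Create one resource, pausing at any hook set for it.

    Only the workflow waits here; the resource gets no extra state.
    """
    for phase in ("pre-create", "post-create"):
        if phase == "post-create":
            log.append("created " + name)  # stand-in for the real operation
        if (name, phase) in hooks:
            if not hook_points.notify_and_wait(name, phase, timeout):
                raise TimeoutError("no confirmation for %s %s" % (phase, name))


if __name__ == "__main__":
    log = []

    def auto_confirm(resource, phase):
        # A real listener (debugger, autoscaler, ...) would confirm later.
        log.append("notified %s %s" % (phase, resource))
        points.confirm(resource, phase)

    points = HookPoint(auto_confirm)
    create_resource("server", {("server", "pre-create")}, points, log)
    print(log)
```

A debugger, a manual approval step or a service such as the autoscaler would all be listeners that eventually call `confirm`; the engine side stays the same for every use case, which seems to be the appeal of the single-feature approach.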
Re: [openstack-dev] [Heat] Stack breakpoint
Zane Bitter zbit...@redhat.com wrote on 03/17/2014 07:03:25 PM:
> On 17/03/14 17:03, Ton Ngo wrote:
>> - How to handle resources with timer, e.g. wait condition: pause/resume timer value
> Handle it by only allowing pauses before and after. In most cases I'm not sure what it would mean to pause _during_.

I'm not sure I follow this part. If at some time a timer is started, and the event(s) upon which it is waiting are delayed by hitting a breakpoint and waiting for human interaction --- I think this is the scenario that concerned Ton. It seems to me the right answer is that all downstream timers have to stop ticking between break and resume.

Regards,
Mike
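For comparison, a timer that "stops ticking between break and resume" could be sketched as below. This illustrates the alternative raised in this message, not something Heat implements; `PausableTimeout` is a hypothetical name, and the clock is injectable only so the behaviour can be checked without sleeping. Zane's reply elsewhere in the thread argues that a WaitCondition never needs this, because the stack is never paused while the condition is running.

```python
import time


class PausableTimeout:
    """Timeout whose elapsed time excludes paused intervals (hypothetical)."""

    def __init__(self, timeout, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._started = clock()
        self._paused_at = None     # set while a breakpoint is holding us
        self._paused_total = 0.0   # sum of all finished pauses

    def pause(self):
        if self._paused_at is None:
            self._paused_at = self._clock()

    def resume(self):
        if self._paused_at is not None:
            self._paused_total += self._clock() - self._paused_at
            self._paused_at = None

    def elapsed(self):
        # While paused, time is frozen at the moment of the pause.
        now = self._paused_at if self._paused_at is not None else self._clock()
        return now - self._started - self._paused_total

    def expired(self):
        return self.elapsed() >= self._timeout


if __name__ == "__main__":
    fake_now = [0.0]
    timer = PausableTimeout(10, clock=lambda: fake_now[0])
    fake_now[0] = 4.0
    timer.pause()            # breakpoint hit after 4 seconds
    fake_now[0] = 100.0      # human takes a long time to respond
    timer.resume()
    fake_now[0] = 105.0
    print(timer.elapsed(), timer.expired())
```

The cost of this approach is that every timer downstream of a breakpoint has to be told about the pause, whereas restricting breakpoints to before and after a resource lets plain wallclock timeouts keep working.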