Re: [openstack-dev] [Heat] Stack breakpoint

2014-03-18 Thread Zane Bitter

On 17/03/14 21:18, Mike Spreitzer wrote:

Zane Bitter zbit...@redhat.com wrote on 03/17/2014 07:03:25 PM:

  On 17/03/14 17:03, Ton Ngo wrote:

   - How to handle resources with timers, e.g. wait condition:
   pause/resume timer value
 
  Handle it by only allowing pauses before and after. In most cases I'm
  not sure what it would mean to pause _during_.

I'm not sure I follow this part.  If at some time a timer is started,
and the event(s) upon which it is waiting are delayed by hitting a
breakpoint and waiting for human interaction --- I think this is the
scenario that concerned Ton.  It seems to me the right answer is that
all downstream timers have to stop ticking between break and resume.


Perhaps this was too general. To be specific, there is exactly one
resource with a timer* - a WaitCondition. A WaitCondition is usually
configured to be dependent on the server that should trigger it. Nothing
interesting happens while a WaitCondition is waiting, so there is no
point allowing a breakpoint in the middle. You would either set the
breakpoint after the server has completed or before the WaitCondition
starts (which amount to the same thing, assuming no other dependencies).
You could, in theory, set a breakpoint after the WaitCondition completes,
though the use case for that is less obvious. In any event, at no time
is the stack paused _while_ the WaitCondition is running, and therefore
there is no need to use anything but wall-clock time to determine the
timeout.
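
To illustrate the argument, here is a minimal Python sketch. It is not
Heat code: the WaitConditionTimer class and its injectable clock are
hypothetical names made up for this example. Because the deadline is
computed only when the wait starts, time spent at a breakpoint before the
WaitCondition does not count against the timeout:

```python
import time


class WaitConditionTimer:
    """Hypothetical timeout tracker; not part of Heat."""

    def __init__(self, timeout, clock=time.time):
        self._timeout = timeout
        self._clock = clock      # wall-clock source, injectable for testing
        self._deadline = None

    def start(self):
        # The deadline is fixed when the wait begins, not when the stack does.
        self._deadline = self._clock() + self._timeout

    def expired(self):
        return self._deadline is not None and self._clock() >= self._deadline


# Simulated clock (in seconds) so the example does not have to sleep.
now = [0.0]
timer = WaitConditionTimer(timeout=600, clock=lambda: now[0])

now[0] += 3600           # stack sits at a breakpoint before the WaitCondition
timer.start()            # user resumes; only now does the timer begin
now[0] += 30
still_waiting = not timer.expired()
now[0] += 600
timed_out = timer.expired()
```

The hour spent at the breakpoint has no effect on the result, since the
timer had not been started yet.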


cheers,
Zane.

* Technically there is another: autoscaling groups during update with an 
UpdatePolicy specified... however these use a nested stack, and the 
solution here is to use this same feature within the nested stack to 
implement the delays rather than complicate things in the stack 
containing the autoscaling group resource.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Heat] Stack breakpoint

2014-03-18 Thread Ton Ngo
The notify/callback mechanism seems like a good solution.  This should
enable creating a high-level debugger for different DSLs (HOT, TOSCA, ...),
running as a separate process.  The debugger would attach to a stack,
present a logical model to the user and interact with the Heat engine.
This would be similar to a typical program debugger.  This mechanism
should also allow integrating with a process engine to handle human
interaction.

About resources with timers, I was not sure whether there are other
resources besides WaitCondition that contain a timer, so it's good to know
that currently only WaitCondition falls into this category.  My concern was
the scenario where the timer gets kicked off and then a resource that
should interact with the timer hits a breakpoint, but Zane's point is that
this is not possible for WaitCondition since there is supposed to be a
dependency on that resource.  So, to debug the "why did my WaitCondition
fail" scenario, the user would set a breakpoint before the WaitCondition,
or after any of the resources that the WaitCondition depends on.  In this
case, we would know that the timer will not get kicked off before the
stack is resumed, because of the dependency.
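
As a rough sketch of how a user or a tool might find those places, the
following Python walks a dependency map.  The "requires" dictionary, the
function name and the resource names are made up for illustration and do
not come from Heat.

```python
def breakpoint_candidates(requires, target):
    """Return every resource that target depends on, directly or indirectly."""
    seen = set()
    stack = list(requires.get(target, ()))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(requires.get(name, ()))
    return seen


# Hypothetical stack: the wait condition depends on the server,
# which in turn depends on the network.
requires = {
    "network": set(),
    "volume": set(),
    "server": {"network"},
    "wait_condition": {"server"},
}
candidates = breakpoint_candidates(requires, "wait_condition")
```

Here the candidates are the server and the network; the volume is
unrelated to the WaitCondition and so is not a useful place to break.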

Ton Ngo,





[openstack-dev] [Heat] Stack breakpoint

2014-03-17 Thread Ton Ngo
I would like to revisit with more details an idea that was mentioned in the
last design summit and hopefully get some feedback.

The scenario is troubleshooting a failed template.
Currently we can stop at the point of failure by disabling rollback:  this
works well for stack-create; stack-update requires some more work but
that's a different thread.  In many cases, however, the point of failure may
be too late or too hard to debug because the condition causing the failure
may not be obvious or may have been changed.  If we can pause the stack at
a point before the failure, then we can check whether the state of the
environment and the stack is what we expect.
The analogy with program debugging is breakpoint/step, so it may be useful
to introduce this same concept in a stack.

The usage would be something like:
- Run stack-create (or stack-update once it can handle failure) with one or
more resource names specified as breakpoints
- As the engine traverses down the dependency graph, it would stop at the
breakpoint resource and all dependent resources.  Other resources with no
dependency on it will proceed to completion.
- After debugging, continue the stack by:
  - Stepping: remove current breakpoint, set breakpoint for next
    resource(s) in dependency graph, resume stack-create (or stack-update)
  - Running to completion: remove current breakpoint, resume stack-create
    (or stack-update)
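
The traversal and stepping described above could be sketched roughly as
follows.  This is only a simulation in Python, not Heat engine code: the
function names, the "requires" map and the resource names are all
hypothetical, and it assumes the pause happens before the breakpoint
resource itself is created.

```python
from collections import deque


def run_until_breakpoints(requires, breakpoints):
    """Simulate one pass over the graph; return (created, held) as lists.

    requires maps a resource name to the set of names it depends on.
    """
    remaining = {name: set(deps) for name, deps in requires.items()}
    ready = deque(sorted(n for n, deps in remaining.items() if not deps))
    created = []
    while ready:
        name = ready.popleft()
        if name in breakpoints:
            continue  # pause here; its dependents never become ready
        created.append(name)
        for other in sorted(remaining):
            deps = remaining[other]
            if name in deps:
                deps.remove(name)
                if not deps:
                    ready.append(other)
    held = sorted(set(requires) - set(created))
    return created, held


def step(requires, breakpoints):
    """Breakpoints for the next step: direct dependents of the current ones."""
    return {n for n, deps in requires.items() if set(deps) & set(breakpoints)}


requires = {
    "network": set(),
    "volume": set(),
    "server": {"network"},
    "wait_condition": {"server"},
}
created, held = run_until_breakpoints(requires, {"server"})

# Stepping: move the breakpoint to the next resource(s) and run again.
# The simulation simply replays the stack from the beginning.
next_breakpoints = step(requires, {"server"})
created_after_step, held_after_step = run_until_breakpoints(
    requires, next_breakpoints)
```

With a breakpoint on the server, the network and the unrelated volume are
created while the server and the wait condition are held.  After one
step, only the wait condition is still held.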

Some other possible uses for this breakpoint:
- While developing a new template or resource type, bring up a stack to a
point before the new code is to be executed
- Introduce a human process: pause the partial stack so the user can get the
stack info and perform some tasks before continuing

Some issues to consider (with initial feedback from shardy):
- Granularity of stepping:  resource level or internal steps within a
resource
- How to specify breakpoints:  CLI argument or coded in template or both
- How to handle resources with timers, e.g. wait condition:  pause/resume
timer value
- New state for a resource:  PAUSED

Thanks.

Ton Ngo,




Re: [openstack-dev] [Heat] Stack breakpoint

2014-03-17 Thread Clint Byrum
Ton, could you repost this as a new thread? It has very little to do
with the referenced thread.



[openstack-dev] [Heat] Stack breakpoint

2014-03-17 Thread Ton Ngo

(reposting as new thread)

I would like to revisit with more details an idea that was mentioned in the
last design summit and hopefully get some feedback.

The scenario is troubleshooting a failed template.
Currently we can stop at the point of failure by disabling rollback:  this
works well for stack-create; stack-update requires some more work but
that's a different thread.  In many cases, however, the point of failure may
be too late or too hard to debug because the condition causing the failure
may not be obvious or may have been changed.  If we can pause the stack at
a point before the failure, then we can check whether the state of the
environment and the stack is what we expect.
The analogy with program debugging is breakpoint/step, so it may be useful
to introduce this same concept in a stack.

The usage would be something like:
- Run stack-create (or stack-update once it can handle failure) with one or
more resource names specified as breakpoints
- As the engine traverses down the dependency graph, it would stop at the
breakpoint resource and all dependent resources.  Other resources with no
dependency on it will proceed to completion.
- After debugging, continue the stack by:
  - Stepping: remove current breakpoint, set breakpoint for next
    resource(s) in dependency graph, resume stack-create (or stack-update)
  - Running to completion: remove current breakpoint, resume stack-create
    (or stack-update)

Some other possible uses for this breakpoint:
- While developing a new template or resource type, bring up a stack to a
point before the new code is to be executed
- Introduce a human process: pause the partial stack so the user can get the
stack info and perform some tasks before continuing

Some issues to consider (with some initial feedback from shardy):
- Granularity of stepping:  resource level or internal steps within a
resource
- How to specify breakpoints:  CLI argument or coded in template or both
- How to handle resources with timers, e.g. wait condition:  pause/resume
timer value
- New state for a resource:  PAUSED

Thanks.

Ton Ngo,




Re: [openstack-dev] [Heat] Stack breakpoint

2014-03-17 Thread Zane Bitter

On 17/03/14 17:03, Ton Ngo wrote:


(reposting as new thread)

I would like to revisit with more details an idea that was mentioned in the
last design summit and hopefully get some feedback.

The scenario is troubleshooting a failed template.
Currently we can stop at the point of failure by disabling rollback:  this
works well for stack-create; stack-update requires some more work but
that's a different thread.  In many cases, however, the point of failure may
be too late or too hard to debug because the condition causing the failure
may not be obvious or may have been changed.  If we can pause the stack at
a point before the failure, then we can check whether the state of the
environment and the stack is what we expect.
The analogy with program debugging is breakpoint/step, so it may be useful
to introduce this same concept in a stack.

The usage would be something like:
- Run stack-create (or stack-update once it can handle failure) with one or
more resource names specified as breakpoints
- As the engine traverses down the dependency graph, it would stop at the
breakpoint resource and all dependent resources.  Other resources with no
dependency on it will proceed to completion.
- After debugging, continue the stack by:
  - Stepping: remove current breakpoint, set breakpoint for next
    resource(s) in dependency graph, resume stack-create (or stack-update)
  - Running to completion: remove current breakpoint, resume stack-create
    (or stack-update)

Some other possible uses for this breakpoint:
- While developing a new template or resource type, bring up a stack to a
point before the new code is to be executed
- Introduce a human process: pause the partial stack so the user can get the
stack info and perform some tasks before continuing


I would like to see this solved with some sort of notify/callback 
mechanism. There are a bunch of use cases which IMHO could all be solved 
with a single feature:


- Debugging template operations by pausing and stepping to allow a user
to debug
- Adding a manual task into the stack creation process
- Automatically augmenting the stack creation process by inserting tasks
into the workflow
- Providing a hook to the Autoscaling engine (when it is separated out
into a separate process) to allow it to update the load-balancer (or,
more generically, any shared resource) at the appropriate times
- Providing a hook for e.g. Trove to confirm resizes of Nova servers.

(I think I counted 6 use cases in a previous thread, and iirc that 
didn't include the debugging one.)


A feature that:
1) Optionally notifies the user before or after performing some 
operation on a resource, and
2) After sending such a notification, waits for confirmation before 
proceeding...
should be able to solve all of the use cases above. (This is my 
favourite kind of feature ;)
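
A minimal sketch of such a notify-then-wait hook, in Python, might look
like the following.  Nothing here is an existing Heat API: the Hook class,
the create() function, the "pre-create"/"post-create" phase names and the
listener are all assumptions made for illustration, and the notification
channel is reduced to a plain callback.

```python
import threading


class Hook:
    """Hypothetical pause point: notify a listener, then wait for confirmation."""

    def __init__(self, notify):
        self._notify = notify
        self._confirmed = threading.Event()

    def confirm(self):
        self._confirmed.set()

    def trigger(self, resource, phase, timeout=None):
        self._notify(resource, phase)
        # Blocks until confirm() is called; returns False if timeout expires.
        return self._confirmed.wait(timeout)


def create(resource, hooks, log):
    """Create a resource, honouring optional hooks before and after."""
    pre = hooks.get((resource, "pre-create"))
    if pre is not None:
        pre.trigger(resource, "pre-create")
    log.append("created " + resource)
    post = hooks.get((resource, "post-create"))
    if post is not None:
        post.trigger(resource, "post-create")


log = []
hooks = {}


def listener(resource, phase):
    # Stands in for an automated consumer that confirms immediately.
    # A debugger would instead call confirm() later, after user input.
    log.append("notified %s %s" % (phase, resource))
    hooks[(resource, phase)].confirm()


hooks[("server", "pre-create")] = Hook(listener)
create("server", hooks, log)
create("volume", hooks, log)
```

The same two pieces (notify, then block until confirmed) serve both the
interactive debugging case and the automated ones; only the listener
differs.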


The details of how to implement that (in particular, what channel do you
send notifications through?) are trickier to figure out. There was
some discussion already in the "Rolling Updates spec re-written. RFC"
thread. Start here and keep going:


http://lists.openstack.org/pipermail/openstack-dev/2014-February/026329.html


Some issues to consider (with some initial feedback from shardy):
- Granularity of stepping:  resource level or internal steps within a
resource


Before and after a resource is processed, and at any logical steps 
during, such as the CONFIRM step when resizing a Nova server.



- How to specify breakpoints:  CLI argument or coded in template or both


I think I'd vote for both the template and the environment.


- How to handle resources with timers, e.g. wait condition:  pause/resume
timer value


Handle it by only allowing pauses before and after. In most cases I'm 
not sure what it would mean to pause _during_.



- New state for a resource:  PAUSED


It's the workflow that's paused, not the resource, so I don't see the 
need for a new state.


cheers,
Zane.



Re: [openstack-dev] [Heat] Stack breakpoint

2014-03-17 Thread Mike Spreitzer
Zane Bitter zbit...@redhat.com wrote on 03/17/2014 07:03:25 PM:

 On 17/03/14 17:03, Ton Ngo wrote:

  - How to handle resources with timers, e.g. wait condition: pause/resume
  timer value
 
 Handle it by only allowing pauses before and after. In most cases I'm 
 not sure what it would mean to pause _during_.

I'm not sure I follow this part.  If at some time a timer is started, and 
the event(s) upon which it is waiting are delayed by hitting a breakpoint 
and waiting for human interaction --- I think this is the scenario that 
concerned Ton.  It seems to me the right answer is that all downstream 
timers have to stop ticking between break and resume.

Regards,
Mike