On 19/08/16 09:55, Anant Patil wrote:

    What I'm suggesting is very close to that:

    (1) stack-cancel-update <stack_id> will start another update using the
    previous template/environment. We'll start rolling back; in-progress
    resources will be allowed to complete normally.
    (2) stack-cancel-update <stack_id> --no-rollback will set the
    traversal_id to None so no further resources will be updated;
    in-progress resources will be allowed to complete normally.
    (3) resource-mark-unhealthy <stack_id> <resource_id> ... <resource_id>
    Kill any threads running a CREATE or UPDATE on the given resources, mark
    as CHECK_FAILED if they are not already in UPDATE_FAILED, don't do
    anything else. If the resource was in progress, the stack won't progress
    further, other resources currently in-progress will complete, and if
    rollback is enabled and no other traversal has started then it will roll
    back to the previous template/environment.
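As a rough illustration of the three mechanisms above, here is a toy model in Python. It is only a sketch: the `Stack` class, its fields, and the helper functions are hypothetical names made up for this example, not Heat's actual code. The only things taken from the proposal are that a rollback starts a new traversal with the previous template, that `--no-rollback` sets the traversal_id to None, and that unhealthy resources become CHECK_FAILED unless already UPDATE_FAILED.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


def _new_traversal() -> str:
    return str(uuid.uuid4())


@dataclass
class Stack:
    # Hypothetical stand-in for a stack record; not Heat's real model.
    template: str
    prev_template: str
    traversal_id: Optional[str] = field(default_factory=_new_traversal)


def cancel_update(stack: Stack, rollback: bool = True) -> None:
    if rollback:
        # (1) Start another update using the previous template.
        stack.template = stack.prev_template
        stack.traversal_id = _new_traversal()
    else:
        # (2) No current traversal, so no further resources are updated.
        stack.traversal_id = None


def should_start(stack: Stack, worker_traversal_id: str) -> bool:
    # A worker starts a new resource only if its traversal is still current.
    # Resources already in progress are not interrupted by this check.
    return stack.traversal_id is not None and stack.traversal_id == worker_traversal_id


def mark_unhealthy(status: str) -> str:
    # (3) Mark as CHECK_FAILED unless the resource is already UPDATE_FAILED.
    return status if status == "UPDATE_FAILED" else "CHECK_FAILED"
```

In this model, a worker holding a stale traversal id simply declines to pick up the next resource, which is how both cancel variants let in-progress work finish while stopping anything new.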

I have started implementation of the above three mechanisms. The first
two are implemented in https://review.openstack.org/#/c/357618

This looks great, thanks! That covers both our internal use of update-cancel and the current user API update-cancel nicely.

Note that (2) needs a change in the heat client (openstack client?) to
add a --no-rollback option.

Yeah, and also a (very minor) REST API change. I'd be in favour of trying to get this in before Newton FF (feature freeze); it'd be really useful to have.

(3) is a bit of a long haul, and needs:
https://review.openstack.org/343076 : Adds mechanism to interrupt
convergence worker threads
https://review.openstack.org/301483 : Mechanism to send cancel message
and cancel worker upon receiving messages

Another thing I forgot is that when we delete a stack, we cancel all the threads working on it, so any in-progress update/create used to be stopped (you're about to delete that stuff anyway, so you might as well not bother with anything else). The lack of this functionality in convergence is causing problems for some users. It looks like this patch is intended to build on the previous two to resolve that:

https://review.openstack.org/#/c/354000/

(This is actually going to be much better than the old behaviour, because it turned out that cancelling threads was very much not the right thing to do, and it's much better to stop them at a yield point.)
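To make the difference concrete, here is a minimal sketch of stopping a task at a yield point rather than killing its thread. It uses a plain Python generator and a `threading.Event`; the names `resource_task` and `run` are hypothetical and this is not how the patches above are necessarily implemented.

```python
import threading


def resource_task(steps, log):
    # Each yield is a safe point where the task may be stopped.
    try:
        for step in range(steps):
            log.append(f"step {step} done")
            yield
    finally:
        # Runs on normal completion and on cancellation alike.
        log.append("cleanup")


def run(task, cancel_event):
    # Drive the task, checking for cancellation only between steps.
    for _ in task:
        if cancel_event.is_set():
            # close() raises GeneratorExit inside the task at its yield,
            # so the task's own cleanup code still runs.
            task.close()
            return False
    return True
```

Because the task is only ever interrupted while suspended at a yield, it never stops halfway through a step, and its `finally` block gets a chance to tidy up.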

So I think all of the above apart from the API/client change for (2) are going to be critical to land for Newton. (They're all in a sense bugs at the moment.)

Apart from the above two, I am implementing the actual patch which will
leverage them to complete the resource-mark-unhealthy feature in
convergence.

Great! Hopefully people will rarely need this, but it'll be much more comfortable unleashing convergence on the world if we know that this exists as a circuit-breaker in case something does get stuck.

Let me know if I can help with any of this stuff without stepping on any toes (time zones unfortunately make it hard for you and me to co-ordinate). I'll at least try to circle back regularly to the reviews.

cheers,
Zane.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
