Evgeniy is right. For non-critical roles we continue the deployment. The list of critical roles for HA is here:
https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/orchestrator/deployment_serializers.py#L340
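To make the rule concrete, here is a minimal sketch in Python of "a failure on a critical role aborts the run, a failure on any other role is recorded and the run continues". This is not the Astute or Nailgun code: `NodeResult`, `is_fatal`, and `summarize` are hypothetical names, and `CRITICAL_ROLES` contains only `primary-controller` because that is the one role named in this thread. The real list is in the file linked above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

# Assumption: illustrative value only. The authoritative list of critical
# roles lives in deployment_serializers.py; this thread names only
# primary-controller.
CRITICAL_ROLES: FrozenSet[str] = frozenset({"primary-controller"})


@dataclass(frozen=True)
class NodeResult:
    """Outcome of deploying or patching one node (hypothetical structure)."""
    node_id: str
    roles: FrozenSet[str]
    ok: bool


def is_fatal(result: NodeResult,
             critical_roles: FrozenSet[str] = CRITICAL_ROLES) -> bool:
    """A failure is fatal only if the node carries at least one critical role."""
    return (not result.ok) and bool(result.roles & critical_roles)


def summarize(results: Iterable[NodeResult],
              critical_roles: FrozenSet[str] = CRITICAL_ROLES
              ) -> Tuple[str, List[str]]:
    """Walk node results in order and apply the fail/continue rule.

    Returns ("aborted", failed_ids) as soon as a critical node fails,
    otherwise ("finished", failed_ids) after all nodes were processed.
    """
    failed: List[str] = []
    for result in results:
        if result.ok:
            continue
        failed.append(result.node_id)
        if is_fatal(result, critical_roles):
            return "aborted", failed
    return "finished", failed
```

With this rule a run can end as "finished" while still carrying a non-empty list of failed nodes, which is exactly the case Evgeniy is unsure about: several dead computes do not stop the process.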

On Thu, Sep 11, 2014 at 4:02 PM, Evgeniy L <e...@mirantis.com> wrote:

> Hi,
>
> >> Also, let's think and work on possible failures. What if Fuel Master
> >> node goes off during patching? What is going to be affected? How can we
> >> complete patching when Fuel Master comes back online?
>
> The question can be summarised as "What if you kill the orchestrator during
> the deployment?"
> In this case the user will get a hung progress bar in the UI until he
> removes the task from nailgun.
> And I'm not sure if after that he will be able to continue the deployment
> without additional changes in the db.
> Actually, the same questions relate not only to patching, but to every
> task which we run under the orchestrator.
> The reason for this is our architecture: the orchestrator was designed as a
> worker without persistent state.
> But you need to keep the state somewhere in order to complete a task after
> a failure.
> As far as I understand, Mistral can help us with this issue.
>
> >> Or compute node under patching breaks for some reason (e.g. disk
> >> issues or memory), how would it affect the patching process? How can we
> >> safely continue patching of other nodes?
>
> How it works now (Vladimir Sharshov, correct me if I'm wrong):
> We use the same strategy as for deployment.
>
> Error during primary-controller patching - fail whole patching process
> Error during patching of other roles - continue patching process
>
> And I'm not sure if the current strategy is wrong or right.
> On the one hand we shouldn't leave the user's env in a half-patched state.
> On the other hand we can break the user's whole cluster because we ignore
> the fact that several computes died during the patching procedure.
>
> Thanks,
>
> On Tue, Sep 9, 2014 at 12:15 PM, Mike Scherbakov <mscherba...@mirantis.com>
> wrote:
>
>> Folks,
>> I was the one who initially requested this. I thought it was going to be
>> pretty similar to Stop Deployment. It becomes obvious that it is not.
>>
>> I'm fine if we have it in API.
>> Though I think what is much more important here is an ability for the
>> user to choose a few hosts for patching first, in order to check how
>> patching would work on a very small part of the cluster. Ideally we would
>> even move workloads to other nodes before doing patching. We should
>> disable scheduling of workloads for sure for these experimental hosts.
>> Then the user can run patching against these nodes, and see how it goes.
>> If all goes fine, patching can be applied to the rest of the environment.
>> I do not think though that we should do all, let's say 100 nodes, at once.
>> This sounds dangerous to me. I think we would need to come up with some
>> less dangerous scenario.
>>
>> Also, let's think and work on possible failures. What if Fuel Master node
>> goes off during patching? What is going to be affected? How can we complete
>> patching when Fuel Master comes back online?
>>
>> Or compute node under patching breaks for some reason (e.g. disk issues
>> or memory), how would it affect the patching process? How can we safely
>> continue patching of other nodes?
>>
>> Thanks,
>>
>> On Tue, Sep 9, 2014 at 12:08 PM, Vladimir Kuklin <vkuk...@mirantis.com>
>> wrote:
>>
>>> Sorry again. Look 2 messages below, please.
>>> On Sep 9, 2014 at 12:06, "Vladimir Kuklin" <vkuk...@mirantis.com> wrote:
>>>
>>>> Sorry, hit reply instead of reply-all.
>>>> On Sep 9, 2014 at 12:05, "Vladimir Kuklin" <vkuk...@mirantis.com> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Also, I think we should add stop patching at least to the API in order
>>>>> to allow advanced users and the service team to do what they want.
>>>>> On Sep 9, 2014 at 12:02, "Igor Kalnitsky" <ikalnit...@mirantis.com> wrote:
>>>>>
>>>>>> What should we do with nodes if patching is interrupted? I think
>>>>>> we need to mark them for re-deployment, since the nodes' state may be
>>>>>> broken.
>>>>>>
>>>>>> Any opinion?
>>>>>>
>>>>>> - Igor
>>>>>>
>>>>>> On Mon, Sep 8, 2014 at 3:28 PM, Evgeniy L <e...@mirantis.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > We were working on the implementation of an experimental feature
>>>>>> > where the user could interrupt the openstack patching procedure [1].
>>>>>> >
>>>>>> > It's not as easy to implement as we thought it would be.
>>>>>> > The current stop deployment mechanism [2] stops puppet, erases
>>>>>> > nodes and reboots them into bootstrap. It's ok for stop
>>>>>> > deployment, but it's not ok for patching, because the user
>>>>>> > can lose his data. We can rewrite some logic in nailgun
>>>>>> > and in the orchestrator to stop puppet and not to erase nodes.
>>>>>> > But I'm not sure if it works correctly because such a use
>>>>>> > case wasn't tested. And I can see problems like
>>>>>> > yum/apt-get lock cleanup after puppet interruption.
>>>>>> >
>>>>>> > As a result I have several questions:
>>>>>> > 1. should we try to make it work for the current release?
>>>>>> > 2. if we shouldn't, will we need this feature for future
>>>>>> > releases? Definitely additional design and research is
>>>>>> > required.
>>>>>> >
>>>>>> > [1] https://bugs.launchpad.net/fuel/+bug/1364907
>>>>>> > [2] https://github.com/stackforge/fuel-astute/blob/b622d9b36dbdd1e03b282b9ee5b7435ba649e711/lib/astute/server/dispatcher.rb#L163-L164
>>>>>> >
>>>>>> > --
>>>>>> > Mailing list: https://launchpad.net/~fuel-dev
>>>>>> > Post to : fuel-dev@lists.launchpad.net
>>>>>> > Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>> > More help : https://help.launchpad.net/ListHelp
>>
>> --
>> Mike Scherbakov
>> #mihgen

--
Mailing list: https://launchpad.net/~fuel-dev
Post to : fuel-dev@lists.launchpad.net
Unsubscribe : https://launchpad.net/~fuel-dev
More help : https://help.launchpad.net/ListHelp