[Openstack-operators] New OpenStack project for rolling maintenance and upgrade in interaction with application on top of it

2018-05-29 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Hi,

I am the PTL of the OPNFV Doctor project.

For a couple of years I have been working on infrastructure maintenance in 
interaction with the applications on top of it. I have looked into Nova and 
Craton, and held several Ops sessions. Over the past half year there have been 
a couple of different PoCs, the latest in March at ONS [1][2].

At the OpenStack Vancouver summit last week it was time to present this [3]. 
The Forum discussion following the presentation considered whether to do this 
just by utilizing different existing projects, but to make this generic, 
pluggable, easily adapted and future proof, it now comes down to starting what 
I almost started a couple of years ago: the OpenStack Fenix project [4].

On behalf of OPNFV Doctor I would welcome any last thoughts before starting the 
project, and would also love to see somebody join to make Fenix fly.

Main use cases, to list most of them:
*   As a cloud admin I want to maintain and upgrade my infrastructure in a 
rolling fashion.
*   As a cloud admin I want to have a pluggable workflow to maintain and 
upgrade my infrastructure, to ensure it can be done with complicated 
infrastructure components and in interaction with the different application 
payloads on top of it.
*   As an infrastructure service, I need to know whether an infrastructure 
unavailability is because of planned maintenance.
*   As a critical application owner, I want to be aware of any planned 
downtime affecting my service.
*   As a critical application owner, I want to interact with the 
infrastructure rolling maintenance workflow, so I have a time window to ensure 
zero downtime for my service and can decide on admin actions like migration of 
my instance.
*   As an application owner, I need to know when an admin action like 
migration is complete.
*   As an application owner, I want to know about new capabilities coming 
from an infrastructure maintenance or upgrade, so my application can also take 
them into use. This could be a hardware capability or, for example, an 
OpenStack upgrade.
*   As a critical application that needs to scale with varying load, I need 
to interactively know about infrastructure resources scaling up and down, so I 
can scale my application at the same time and keep zero downtime for my 
service.
*   As a critical application, I want the retirement of my service done in a 
controlled fashion.

[1] Infrastructure Maintenance & Upgrade: Zero VNF Downtime with OPNFV Doctor 
on OCP Hardware (video)
[2] Infrastructure Maintenance & Upgrade: Zero VNF Downtime with OPNFV Doctor 
on OCP Hardware (slides)
[3] How to gain VNF zero down-time during Infrastructure Maintenance and 
Upgrade
[4] Fenix project wiki
[5] Doctor design guideline (draft)


Best Regards,
Tomi Juvonen





Re: [Openstack-operators] [nova] Does anyone use the TypeAffinityFilter?

2017-04-10 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Thanks Matt,

Affinity and anti-affinity for an instance certainly are basic Telco 
requirements. Instead of the TypeAffinityFilter one is able to use server 
groups (ServerGroupAffinityFilter and ServerGroupAntiAffinityFilter). The catch 
is that you can only define a server group for an instance when creating the 
server, so it causes problems if the TypeAffinityFilter is deprecated and you 
happened to use it. Already existing instances are properly placed, but if the 
server-group information is not added to them, new instances using server 
groups instead might land on a hypervisor in a way that violates the existing 
instance placement according to the TypeAffinityFilter.

I have checked internally that we should not be using the TypeAffinityFilter.
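
For anyone making the same move, here is a minimal sketch of the server-group 
approach with the Nova CLI (the group name, image, flavor and group UUID are 
placeholders, not anything from this thread):

  # Create a server group with the wanted policy, then pass its UUID as a
  # scheduler hint when booting each member instance.
  nova server-group-create telco-group anti-affinity
  nova boot --image <image> --flavor <flavor> \
      --hint group=<server-group-uuid> vm1

Note the limitation discussed above: an already running instance cannot be 
added to a server group afterwards.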

Br,
Tomi
> -Original Message-
> From: Matt Riedemann [mailto:mriede...@gmail.com]
> Sent: Friday, April 07, 2017 6:21 AM
> To: openstack-operators@lists.openstack.org
> Subject: Re: [Openstack-operators] [nova] Does anyone use the
> TypeAffinityFilter?
> 
> On 4/6/2017 7:34 PM, Matt Riedemann wrote:
> > While working on trying to trim some RPC traffic between compute nodes
> > and the scheduler [1] I came across the TypeAffinityFilter which relies
> > on the instance.instance_type_id field, which is the original flavor.id
> > (primary key) that the instance was created with on a given host. The
> > idea being if I have an instance with type 20 on a host, then I can't
> > schedule another host with type 20 on it.
> 
> Oops, "then I can't schedule another *instance* with type 20 on it (the
> same host)".
> 
> >
> > The issue with this is that flavors can't be updated, they have to be
> > deleted and recreated. This is why we're changing the flavor
> > representation in the server response details in Pike [2] because the
> > instance.instance_type_id can point to a flavor that no longer exists,
> > so you can't look up the details on the flavor that was used to create a
> > given instance via the API (you could figure it out in the database, but
> > that's no fun).
> >
> > So the big question is, does anyone use this filter and if so, have you
> > already hit the issue described here and if so, how are you working
> > around it? If no one is using it, I'd like to deprecate it.
> >
> > [1]
> > https://blueprints.launchpad.net/nova/+spec/put-host-manager-instance-
> info-on-a-diet
> >
> > [2]
> > https://specs.openstack.org/openstack/nova-
> specs/specs/pike/approved/instance-flavor-api.html
> >
> >
> 
> 
> --
> 
> Thanks,
> 
> Matt
> 


[Openstack-operators] Milan Ops MidCycle - Inventory and Fleet Management

2017-03-14 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Here is a draft of the etherpad for the session on Thursday afternoon. Please 
add anything you want to discuss, so I can prepare accordingly.

https://etherpad.openstack.org/p/MIL-ops-inventory-and-fleet-management

Thanks,
Tomi



Re: [Openstack-operators] [openstack-operators][telecom-nfv] Milan Meetup

2017-02-15 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Hi,


I have been working on "planned host maintenance" as it is in our requirements 
in the OPNFV Doctor project. I have discussed the feature with Nova and Ops in 
general through sessions at the Austin and Barcelona summits, to get a wider 
scope of requirements and the way forward.

Now, as this is moving to implementation, it is about time to see it also in 
the "telecom-nfv" session.

Hope we get a good session in Milan, and thanks Shintaro if you can moderate it.

Br,
Tomi

> -Original Message-
> From: Shintaro Mizuno [mailto:mizuno.shint...@lab.ntt.co.jp]
> Sent: Thursday, February 16, 2017 2:36 AM
> To: openstack-operators@lists.openstack.org
> Subject: Re: [Openstack-operators] [openstack-operators][telecom-nfv] Milan
> Meetup
> 
> Hi Curtis,
> 
> Yes, I will be there and love to moderate the session if there are
> enough people interested in the topic.
> I've put my name on the moderator list.
> 
> Best,
> Shintaro
> 
> On 2017/02/15 23:45, Curtis wrote:
> > Hi,
> >
> > I just wanted to note that I won't be able to go to the Milan meetup,
> > just took my name off the moderators list. Apologies, just bad timing.
> > Getting to Milan is hard from Edmonton, I have to snowshoe to Red Deer
> > to take a dogsled to Calgary. ;)
> >
> > In our telecom-nfv meeting last week we had people from the LCOO and
> > Massively Distributed group, and there was no one in that meeting that
> > was attending either, to the best of my knowledge. Perhaps that will
> > change by the meetup.
> >
> > However, I see a fair number of +1s and a mention of the "planned
> > maintenance" feature in the etherpad, so there may very well be enough
> > people with NFV responsibilities at the meetup for a good NFV session,
> > I'm not sure. I believe Shintaro will be there and is usually involved
> > in NFV related items. :)
> >
> > I just wanted to let you all know, as I have not been able to make the
> > weekly operators meetings.
> >
> > Thanks,
> > Curtis.
> >
> 
> 
> --
> Shintaro MIZUNO (水野伸太郎)
> NTT Software Innovation Center
> TEL: 0422-59-4977
> E-mail: mizuno.shint...@lab.ntt.co.jp
> 
> 
> 


[Openstack-operators] host maintenance

2016-10-12 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Hi,

We had a session at the Austin summit about the maintenance:
https://etherpad.openstack.org/p/AUS-ops-Nova-maint

Now the discussion has gotten to a point where we should start prototyping a 
service hosting the maintenance. Nova could have a link to this new service, 
but no maintenance functionality should be placed in the Nova project. I was 
working to have this in Nova, but now it looks better to have the prototype 
first:
https://review.openstack.org/310510/

From the discussion over the above review, the new service might have a 
maintenance API endpoint that links to a host by utilizing the "hostid" used in 
Nova, and then there should be a "tenant_id"-specific endpoint to get what is 
needed per project. Something like:
http://maintenancethingy/maintenance/{hostid}
http://maintenancethingy/maintenance/{hostid}/{tenant_id}
This will ensure the tenant will not know details about the host, but can get 
the needed information about maintenance affecting his instances.
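
As a rough sketch of how that could look to clients ("maintenancethingy" is the 
placeholder from above, and the token and reply payload are made up for 
illustration):

  # Admin or infrastructure service: full maintenance state of a host
  curl -H "X-Auth-Token: $TOKEN" \
      http://maintenancethingy/maintenance/$HOSTID

  # Tenant-scoped view: only the maintenance affecting this tenant's instances
  curl -H "X-Auth-Token: $TOKEN" \
      http://maintenancethingy/maintenance/$HOSTID/$TENANT_ID

The reply could be something like {"state": "planned maintenance", "start": 
..., "end": ...}, but the exact payload is one of the things to design.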

On the Telco/NFV side we have the OPNFV Doctor project that sets the 
requirements for this from that direction. I am personally interested in that 
part, but to have this serve all operator requirements, it is best to bring it 
here.

This could be further discussed in Barcelona, and we should get other people 
interested in helping to start this. Any suggestions for the Ops session?

Looking forward,
Tomi




Re: [Openstack-operators] [HA] RFC: user story including hypervisor reservation / host maintenance / storage AZs / event history (fwd)

2016-06-29 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Thanks again for the help and comments, Adam.

I need to look at those other discussions you have linked here. It will take
some time, as I am going on holiday on Friday and coming back in August.

Meanwhile I have begun to think that having this new field in Nova would really
be just for maintenance, and maybe there is no need for the URL to something
external. Any external tool could anyhow consume the notification, and further
logic could be inside the tool. The downside is that, as the Nova team did not
want a big change for this, "just one new field", it is not that usable for
different maintenance state information. Some different "states" one might need:
- Maintenance window (begin time - end time; if the end time is missing, the HW
is not coming back. This is needed if a VM would be left on the host during
maintenance)
- In maintenance (visible to VMs left on the host)
- Test (only the operator can use the host after maintenance, to test that it
works. Needs a new "MaintenanceModeFilter" for this purpose)

OK, looking at these "3 states", 2 could be reserved words that one can expect:
- In maintenance
- Test
For the normal running situation we would know that there is no value, but the
"maintenance window" could be tricky. Also, would one want to tell more details
about it, meaning it would be behind some URL? Then one might need to know the
difference between a not-yet-maintained and an already-maintained system, to
launch a VM on a maintained or not-maintained system. As some kind of state
that might be as ugly as a running version number, and not convenient if again
some "MaintenanceModeFilter" would need to map to that.

I need to continue to find the best solution, and discuss it also with the Nova
guys and in the review when back from holiday.

Br,
Tomi

> -Original Message-
> From: Adam Spiers [mailto:aspi...@suse.com]
> Sent: Tuesday, June 28, 2016 6:42 PM
> To: Juvonen, Tomi (Nokia - FI/Espoo) 
> Cc: openstack-operators mailing list <openstack-operat...@lists.openstack.org>
> Subject: Re: [Openstack-operators] [HA] RFC: user story including
> hypervisor reservation / host maintenance / storage AZs / event history
> (fwd)
> 
> Juvonen, Tomi (Nokia - FI/Espoo)  wrote:
> > Thank you very much from the interest. Need to look over other
> > discussion and perhaps have a session in Barcelona to look the
> > way forward after change in Nova.
> 
> Indeed, sounds good!
> 
> > > -Original Message-----
> > > From: Adam Spiers [mailto:aspi...@suse.com]
> > > Sent: Monday, June 20, 2016 4:43 PM
> > > To: Juvonen, Tomi (Nokia - FI/Espoo) 
> > > Cc: openstack-operators mailing list <openstack-operat...@lists.openstack.org>
> > > Subject: Re: [Openstack-operators] [HA] RFC: user story including
> > > hypervisor reservation / host maintenance / storage AZs / event history
> > > (fwd)
> > >
> > > Hi Tomi,
> > >
> > > Juvonen, Tomi (Nokia - FI/Espoo)  wrote:
> > > > I'm working in the OPNFV Doctor project that is about fault
> > > > management and maintenance (NFV). The goal of the project is to
> > > > build fault management and maintenance framework for high
> > > > availability of Network Services on top of virtualized
> > > > infrastructure.
> > > >
> > > > https://wiki.opnfv.org/display/doctor
> > > >
> > > > Currently there is already landed effort to OpenStack to have
> > > > ability to detect failures fast, change states in OpenStack (Nova),
> > > > add state information that was missing and also to expose that to
> > > > owner of a VM. Also alarm is triggered. By all this one can now rely
> > > > the states and get notice about faults in a split second. Surely
> > > > with system configured monitor different faults and make actions
> > > > based configured policies, or leave some actions for consumers of
> > > > the alarms risen.
> > >
> > > Sounds very interesting - thanks.  Does this really have to be limited
> > > to OPNFV though?  It sounds like it would be very useful within
> > > OpenStack generally.
> > Surely not just for OPNFV, but for all operators.
> 
> Right - so why is it part of the OPNFV project?  That gives the
> impression that it would only be usable in NFV contexts.
> 
> > If playing with the idea
> > of having link to some external tool to have more than
> > "host_maintenance_reason", like it now would seem some more generic
> > "host_details", where one could have external REST API to call to have
> any
> > wanted host specific details that one would like to expose also to
> > tenant/owner of server.
> 
> Sounds like you are talking about some kind 

Re: [Openstack-operators] [HA] RFC: user story including hypervisor reservation / host maintenance / storage AZs / event history (fwd)

2016-06-27 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Thank you very much for the interest. I need to look over the other
discussions, and perhaps have a session in Barcelona to look at the way forward
after the change in Nova.
> -Original Message-
> From: Adam Spiers [mailto:aspi...@suse.com]
> Sent: Monday, June 20, 2016 4:43 PM
> To: Juvonen, Tomi (Nokia - FI/Espoo) 
> Cc: openstack-operators mailing list <openstack-operat...@lists.openstack.org>
> Subject: Re: [Openstack-operators] [HA] RFC: user story including
> hypervisor reservation / host maintenance / storage AZs / event history
> (fwd)
> 
> Hi Tomi,
> 
> Juvonen, Tomi (Nokia - FI/Espoo)  wrote:
> > I'm working in the OPNFV Doctor project that is about fault
> > management and maintenance (NFV). The goal of the project is to
> > build fault management and maintenance framework for high
> > availability of Network Services on top of virtualized
> > infrastructure.
> >
> > https://wiki.opnfv.org/display/doctor
> >
> > Currently there is already landed effort to OpenStack to have
> > ability to detect failures fast, change states in OpenStack (Nova),
> > add state information that was missing and also to expose that to
> > owner of a VM. Also alarm is triggered. By all this one can now rely
> > the states and get notice about faults in a split second. Surely
> > with system configured monitor different faults and make actions
> > based configured policies, or leave some actions for consumers of
> > the alarms risen.
> 
> Sounds very interesting - thanks.  Does this really have to be limited
> to OPNFV though?  It sounds like it would be very useful within
> OpenStack generally.
Surely not just for OPNFV, but for all operators. We are playing with the idea
of having a link to some external tool to offer more than
"host_maintenance_reason"; it now seems it would be some more generic
"host_details", where one could have an external REST API to call to get any
wanted host-specific details that one would like to expose also to the
tenant/owner of a server. If we have that tool, it could also have maintenance-
or host-failure-specific scenarios implemented. It could let the admin do
things manually, or be configured per VNF / instance to take some actions. The
OPNFV use case here is just the more specific maintenance state to begin with,
but who knows what one might want to implement there in the end.
Auto-evacuate...? That is anyhow far into the next steps, as it is complex to
build. It is even case specific what to do in different scenarios:
- Manually do any action by the admin.
- Automatically move the VM (maybe not if the problem has a bigger scale).
- Let it stay on the host over the maintenance (not a busy hour for the service).
- Let the VM owner remove/add the VM (to a host already gone through maintenance).
...
> 
> > For maintenance I had a session in Austin to talk with Ops and Nova
> > core about the maintenance part. There it was seen that Nova didn't
> > want more specific information about host maintenance (maintenance
> > state, maintenance window...), so as a result of the discussion
> > there is a spec that was now transferred to Ocata:
> >
> > https://review.openstack.org/310510/
> 
> That's great - thanks a lot for highlighting, as it certainly seems to
> overlap a lot with the functionality which NTT proposed and is now
> described here:
> 
>   http://specs.openstack.org/openstack/openstack-user-stories/user-
> stories/proposed/ha_vm.html

Thanks, I need to familiarize myself with this as well as with other requests
in the field.
> 
> > The spec proposes a link to Nova external tool to provide more
> > specific information about host (compute) maintenance and by latest
> > comments it could have any host specific extra information to the
> > same place (for example you have mentioned event history). Still if
> > looking this kind of tool, why not make it configurable for anything
> > convenient for different operator scenario like automatic operations
> > if so wanted.
> 
> Yes, that definitely makes sense to me.
> 
> > Anyhow project like Nova do not want big new functionalities, so all
> > "more complex flows" should reside somewhere outside.
> 
> Right.  I can certainly understand that desire, but I'm a bit confused
> why the spec is proposing both extending Nova's API / DB schema *and*
> adding an external tool.
I understand this point, as just the text field is also usable. The external
tool is kind of out of scope of the spec. Anyhow, I would mention it to give
the understanding that the aim is to build more functionality into OpenStack in
the future, and not to limit it to what a single string can offer.

Br,
Tomi




Re: [Openstack-operators] [HA] RFC: user story including hypervisor reservation / host maintenance / storage AZs / event history (fwd)

2016-06-13 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Hi,

I'm working in the OPNFV Doctor project, which is about fault management and 
maintenance (NFV). The goal of the project is to build a fault management and 
maintenance framework for high availability of Network Services on top of 
virtualized infrastructure.
https://wiki.opnfv.org/display/doctor

Currently there is already effort landed in OpenStack to have the ability to 
detect failures fast, change states in OpenStack (Nova), add state information 
that was missing, and also to expose that to the owner of a VM. An alarm is 
also triggered. With all this, one can now rely on the states and get notice 
about faults in a split second. Surely, with the system configured, one can 
monitor different faults and take actions based on configured policies, or 
leave some actions to the consumers of the alarms raised.

For maintenance, I had a session in Austin to talk with Ops and Nova cores 
about the maintenance part. There it was seen that Nova didn't want more 
specific information about host maintenance (maintenance state, maintenance 
window...), so as a result of the discussion there is a spec that was now moved 
to Ocata: https://review.openstack.org/310510/
The spec proposes a link from Nova to an external tool to provide more specific 
information about host (compute) maintenance, and by the latest comments it 
could put any host-specific extra information in the same place (for example 
the event history you have mentioned). Still, if looking at this kind of tool, 
why not make it configurable for anything convenient for different operator 
scenarios, like automatic operations if so wanted. Anyhow, a project like Nova 
does not want big new functionalities, so all "more complex flows" should 
reside somewhere outside.

Br,
Tomi


> -Original Message-
> From: Adam Spiers [mailto:aspi...@suse.com]
> Sent: Monday, June 13, 2016 12:19 PM
> To: openstack-operators mailing list <openstack-operat...@lists.openstack.org>
> Subject: [Openstack-operators] [HA] RFC: user story including hypervisor
> reservation / host maintenance / storage AZs / event history (fwd)
> 
> Hi all,
> 
> Apologies for not thinking to Cc this openstack-operators list first
> time round when I sent the below mail!  It concerns four usage
> scenarios which all principally involve cloud operators, so with
> hindsight that was a really stupid omission :-/
> 
> I would be very interested to hear both:
> 
>   a) whether you think our proposal to create four new user stories
>  for each of these makes sense, and
> 
>   b) feedback on any of the individual usage scenarios.
> 
> Thanks a lot!
> Adam
> 
> - Forwarded message from Adam Spiers  -
> 
> Date: Wed, 8 Jun 2016 00:19:52 +0100
> From: Adam Spiers 
> To: openstack-dev mailing list 
> Cc: OpenStack Product Working Group list 
> Subject: [openstack-dev] [HA] RFC: user story including hypervisor
> reservation / host maintenance / storage AZs / event history
> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
> 
> 
> [Cc'ing product-wg@ - when replying, first please consider whether
> cross-posting is appropriate]
> 
> Hi all,
> 
> Currently the OpenStack HA community is putting a lot of effort into
> converging on a single upstream solution for high availability of VMs
> and hypervisors[0], and we had a lot of very productive discussions in
> Austin on this topic[1].
> 
> One of the first areas of focus is the high level user story:
> 
>http://specs.openstack.org/openstack/openstack-user-stories/user-
> stories/proposed/ha_vm.html
> 
> In particular, there is an open review on which we could use some
> advice from the wider community.  The review proposes adding four
> extra usage scenarios to the existing user story.  All of these
> scenarios are to some degree related to HA of VMs and hypervisors,
> however none of them exclusively - they all have scope extending to
> other areas beyond HA.  Here's a very brief summary of all four, as
> they relate to HA:
> 
> 1. "Sticky" shared storage zones
> 
>Scenario: all compute hosts have access to exactly one shared
>storage "availability zone" (potentially independent of the normal
>availability zones).  For example, there could be multiple NFS
>servers, and every compute host has /var/lib/nova/instances mounted
>to one of them.  On first boot, each VM is *implicitly* assigned to
>a zone, depending on which compute host nova-scheduler picks for it
>(so this could be more or less random).  Subsequent operations such
>as "nova evacuate" would need to ensure the VM only ever moves to
>other hosts in the same zone.
> 
> 2. Hypervisor reservation
> 
>The operator wants a mechanism for reserving some compute hosts
>exclusively for use as failover hosts on which to automatically
>resurrect VMs from other failed compute nodes.
> 
> 3. Host maintenance
> 
>The operator wants a mechanism for flagging hosts as undergoing
>maintenance, so that the HA mechanisms for automatic recovery are
>temporarily disabled during the maintenance.

Re: [Openstack-operators] Maintenance

2016-04-22 Thread Juvonen, Tomi (Nokia - FI/Espoo)
This originated because of the Telco requirements I have described there. The 
implementation will reside in OpenStack. So we are looking at the problem 
described, at what else operators need, and then at how to accomplish that. 
Most probably we are looking at a new tool instead of injecting this into 
anything existing.

Tomi


From: EXT Tim Bell
Sent: Saturday, April 23, 04:46
Subject: Re: [Openstack-operators] Maintenance
To: Joseph Bajin, Robert Starmer
Cc: OpenStack Operators


The overall requirements are being reviewed in 
https://etherpad.openstack.org/p/AUS-ops-Nova-maint. A future tool may make its 
way in OSOps but I think we should keep the requirements discussion distinct 
from the available community tools and their tool repository.

Tim

From: Joseph Bajin <josephba...@gmail.com>
Date: Friday 22 April 2016 at 17:55
To: Robert Starmer <rob...@kumul.us>
Cc: openstack-operators <openstack-operators@lists.openstack.org>
Subject: Re: [Openstack-operators] Maintenance

Rob/Jay,

The use of the OSOps Working group and its repos is a great way to address 
this.. If any of you are coming to the Summit, please take a look at our 
Etherpad that we have created.[1]   This could be a great discussion topic for 
the working sessions and we can brainstorm how we could help with this.

Joe

[1] https://etherpad.openstack.org/p/AUS-ops-OSOps

On Fri, Apr 22, 2016 at 4:02 PM, Robert Starmer <rob...@kumul.us> wrote:

Maybe a result of the discussion can be a set of models (let's not go so far as 
to call them best practices yet :) for how maintenance can be done at scale, 
perhaps solidifying the descriptions Jay has above with the user stories Tomi 
described in his initial note.  This seems like an achievable outcome from a 
working session, and the output even has a target, either creating scriptable 
workflows that could end up in the OSops repository, or as user stories that 
can be mapped to the PM working group.

R

On Fri, Apr 22, 2016 at 12:47 PM, Jay Pipes <jaypi...@gmail.com> wrote:

On 04/14/2016 05:14 AM, Juvonen, Tomi (Nokia - FI/Espoo) wrote:


As admin I want to know when host is ready to actions to be done by admin
during the maintenance. Meaning physical resources are emptied.

You are equating "host maintenance mode" with the end result of a call to `nova 
host-evacuate-live`. The two are not the same.

"host maintenance mode" typically just refers to taking a Nova compute node out 
of consideration for placing new workloads on that compute node. Putting a Nova 
compute node into host maintenance mode is as simple as calling `nova 
service-disable $hostname nova-compute`.

Depending on what you need to perform on the compute node that is in host 
maintenance mode, you *may* want to migrate the workloads from that compute 
node to some other compute node that isn't in host maintenance mode. The `nova 
host-evacuate $hostname` and `nova host-evacuate-live $hostname` commands in 
the Nova CLI [1] can be used to migrate or live-migrate all workloads off the 
target compute node.
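
As a concrete sketch of that flow with the Nova CLI (hostname here is just a 
placeholder):

  # Take the compute node out of consideration for new workloads
  nova service-disable <hostname> nova-compute

  # Optionally (live-)migrate all workloads off the node
  nova host-evacuate-live <hostname>

  # ...perform the maintenance, then bring the node back
  nova service-enable <hostname> nova-compute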

Live migration will reduce the disruption that tenant workloads (data plane) 
experience during the workload migration. However, research at Mirantis has 
shown that libvirt/KVM/QEMU live migration performed against workloads with 
even a medium rate of memory page dirtying can easily never complete. Solutions 
like auto-converge and xbzrle compression have minimal effect on this, 
unfortunately. Pausing a workload manually is typically what is done to force 
the live migration to complete.

[1] Note that these are commands in the Nova CLI tool (python-novaclient). 
Neither a host-evacuate nor a host-evacuate-live REST API call exists in the 
Compute API. This fact alone should suggest to folks that the appropriate place 
to put logic associated with performing host maintenance tasks should be 
*outside* of Nova entirely...

As owner of a server I want to prepare for maintenance to minimize downtime,
keep capacity on needed level and switch HA service to server not
affected by maintenance.

This isn't an appropriate use case, IMHO. HA control planes should, by their 
very nature, be established across various failure domains. The whole *point* 
of having an HA service is so that you don't need to "prepare" for some 
maintenance event (planned or unplanned).

All HA control planes worth their salt will be able to notify some external 
listener of a partition in the cluster. This HA control plane is the 
responsibility of the tenant, not the infrastructure (i.e. Nova). I really do 
not want to add coupling between infrastructure control plane services and 
tenant control plane services.

As owner of a server I want to know when my servers will be down because of
host maintenance as it might be servers are not moved to another host.

See above. As an owner of a server involved in 

Re: [Openstack-operators] Maintenance

2016-04-15 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Thanks, this is great.

Br,
Tomi

> -Original Message-
> From: EXT Tom Fifield [mailto:t...@openstack.org]
> Sent: Friday, April 15, 2016 9:33 AM
> To: Juvonen, Tomi (Nokia - FI/Espoo) ; openstack-
> operat...@lists.openstack.org
> Subject: Re: [Openstack-operators] Maintenance
> 
> OK, you're on the agenda!
> 
> 
> Hilton Austin - MR 406
> Monday, April 25, 2:50pm-3:30pm
> https://www.openstack.org/summit/austin-2016/summit-schedule/events/9516
> 
> Moderators guide is at:
> 
> https://wiki.openstack.org/wiki/Operations/Meetups#Moderators_Guide
> 
> Etherpad for your session:
> 
> https://etherpad.openstack.org/p/AUS-ops-Nova-maint
> 
> Regards,
> 
> 
> Tom
> 
> 
> 
> 
> 
> On 14/04/16 18:45, Juvonen, Tomi (Nokia - FI/Espoo) wrote:
> > Hi Tom,
> >
> > Yes, it would be good to have a discussion session and I could be a
> moderator.
> >
> > Br,
> > Tomi
> >
> >> -Original Message-
> >> From: EXT Tom Fifield [mailto:t...@openstack.org]
> >> Sent: Thursday, April 14, 2016 1:16 PM
> >> To: openstack-operators@lists.openstack.org
> >> Subject: Re: [Openstack-operators] Maintenance
> >>
> >> Hi Tomi,
> >>
> >> This seems like a pretty important topic.
> >>
> >> In addition to this thread, would you consider moderating an ops summit
> >> discussion in Austin to gather more about how ops run maintenance on
> >> their nova installs?
> >>
> >> Regards,
> >>
> >>
> >> Tom
> >>
> >> On 14/04/16 17:14, Juvonen, Tomi (Nokia - FI/Espoo) wrote:
> >>> Hi Ops,
> >>> I am working in OPNFV Doctor project that has the Telco perspective
> >>> about host maintenance related requirements to OpenStack. Already
> talked
> >>> some in dev mailing list and Nova team, but would like to have operator
> >>> perspective and interest for maintenance related changes. Not sure
> where
> >>> this will lead, but even a new OpenStack project to fulfil the
> >>> requirements. This will be somehow also close to fault monitoring
> >>> systems as these NFV related flows are very similar and also monitoring
> >>> needs to be aware of the maintenance. I will also be in Austin together
> >>> with other OPNFV Doctor people, if to discuss something there.
> >>> Here is link to OPNFV Doctor requirements:
> >>> http://artifacts.opnfv.org/doctor/docs/requirements/02-use_cases.html#nvfi-maintenance
> >>> http://artifacts.opnfv.org/doctor/docs/requirements/03-architecture.html#nfvi-maintenance
> >>> http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html#nfvi-maintenance
> >>> Here is what I could transfer as use cases, but would ask feedback to
> >>> get more:
> >>> As admin I want to set maintenance period for certain host.
> >>> As admin I want to know when host is ready to actions to be done by
> admin
> >>> during the maintenance. Meaning physical resources are emptied.
> >>> As owner of a server I want to prepare for maintenance to minimize
> >> downtime,
> >>> keep capacity on needed level and switch HA service to server not
> >>> affected by
> >>> maintenance.
> >>> As owner of a server I want to know when my servers will be down
> because
> >> of
> >>> host maintenance as it might be servers are not moved to another host.
> >>> As owner of a server I want to know if host is to be totally removed,
> so
> >>> instead of keeping my servers on host during maintenance, I want to
> move
> >>> them
> >>> to somewhere else.
> >>> As owner of a server I want to send acknowledgement to be ready for
> host
> >>> maintenance and I want to state if servers are to be moved or kept on
> >> host.
> >>> Removal and creating of server is in owner's control already.
> Optionally
> >>> server
> >>> Configuration data could hold information about automatic actions to be
> >>> done
> >>> when host is going down unexpectedly or in controlled manner. Also
> >>> actions at
> >>> the same if down permanently or only temporarily. Still this needs
> >>> acknowledgement from server owner as he needs time for application
> level
> >>> controlled HA service switchover.
> >>> Br,
> >>> Tomi
> >>>
> >>>


Re: [Openstack-operators] Maintenance

2016-04-14 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Hi Tom,

Yes, it would be good to have a discussion session and I could be a moderator.

Br,
Tomi

> -Original Message-
> From: EXT Tom Fifield [mailto:t...@openstack.org]
> Sent: Thursday, April 14, 2016 1:16 PM
> To: openstack-operators@lists.openstack.org
> Subject: Re: [Openstack-operators] Maintenance
> 
> Hi Tomi,
> 
> This seems like a pretty important topic.
> 
> In addition to this thread, would you consider moderating an ops summit
> discussion in Austin to gather more about how ops run maintenance on
> their nova installs?
> 
> Regards,
> 
> 
> Tom
> 
> On 14/04/16 17:14, Juvonen, Tomi (Nokia - FI/Espoo) wrote:
> > Hi Ops,
> > I am working in OPNFV Doctor project that has the Telco perspective
> > about host maintenance related requirements to OpenStack. Already talked
> > some in dev mailing list and Nova team, but would like to have operator
> > perspective and interest for maintenance related changes. Not sure where
> > this will lead, but even a new OpenStack project to fulfil the
> > requirements. This will be somehow also close to fault monitoring
> > systems as these NFV related flows are very similar and also monitoring
> > needs to be aware of the maintenance. I will also be in Austin together
> > with other OPNFV Doctor people, if to discuss something there.
> > Here is link to OPNFV Doctor requirements:
> > http://artifacts.opnfv.org/doctor/docs/requirements/02-use_cases.html#nvfi-maintenance
> > http://artifacts.opnfv.org/doctor/docs/requirements/03-architecture.html#nfvi-maintenance
> > http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html#nfvi-maintenance
> > Here is what I could transfer as use cases, but would ask feedback to
> > get more:
> > As admin I want to set maintenance period for certain host.
> > As admin I want to know when host is ready to actions to be done by admin
> > during the maintenance. Meaning physical resources are emptied.
> > As owner of a server I want to prepare for maintenance to minimize
> downtime,
> > keep capacity on needed level and switch HA service to server not
> > affected by
> > maintenance.
> > As owner of a server I want to know when my servers will be down because
> of
> > host maintenance as it might be servers are not moved to another host.
> > As owner of a server I want to know if host is to be totally removed, so
> > instead of keeping my servers on host during maintenance, I want to move
> > them
> > to somewhere else.
> > As owner of a server I want to send acknowledgement to be ready for host
> > maintenance and I want to state if servers are to be moved or kept on
> host.
> > Removal and creating of server is in owner's control already. Optionally
> > server
> > Configuration data could hold information about automatic actions to be
> > done
> > when host is going down unexpectedly or in controlled manner. Also
> > actions at
> > the same if down permanently or only temporarily. Still this needs
> > acknowledgement from server owner as he needs time for application level
> > controlled HA service switchover.
> > Br,
> > Tomi
> >
> >


[Openstack-operators] Maintenance

2016-04-14 Thread Juvonen, Tomi (Nokia - FI/Espoo)
Hi Ops,

I am working in the OPNFV Doctor project, which has the Telco perspective on 
host maintenance related requirements for OpenStack. I have already talked some 
on the dev mailing list and with the Nova team, but would like to get the 
operator perspective and interest for maintenance related changes. I am not 
sure where this will lead; it could even be a new OpenStack project to fulfil 
the requirements. This will somehow also be close to fault monitoring systems, 
as these NFV related flows are very similar and monitoring also needs to be 
aware of the maintenance. I will also be in Austin together with other OPNFV 
Doctor people, if there is something to discuss there.

Here is a link to the OPNFV Doctor requirements:
http://artifacts.opnfv.org/doctor/docs/requirements/02-use_cases.html#nvfi-maintenance
http://artifacts.opnfv.org/doctor/docs/requirements/03-architecture.html#nfvi-maintenance
http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html#nfvi-maintenance

Here is what I could transfer as use cases, but I would ask for feedback to get more:

As an admin I want to set a maintenance period for a certain host.

As an admin I want to know when a host is ready for the actions to be done by
the admin during the maintenance, meaning the physical resources are emptied.

As the owner of a server I want to prepare for maintenance to minimize
downtime, keep capacity at the needed level, and switch an HA service to a
server not affected by the maintenance.

As the owner of a server I want to know when my servers will be down because of
host maintenance, as it might be that the servers are not moved to another host.

As the owner of a server I want to know if a host is to be totally removed, so
that instead of keeping my servers on the host during maintenance, I can move
them somewhere else.

As the owner of a server I want to send an acknowledgement that I am ready for
host maintenance, and I want to state whether servers are to be moved or kept
on the host. Removal and creation of a server is in the owner's control
already. Optionally, server configuration data could hold information about
automatic actions to be done when a host is going down unexpectedly or in a
controlled manner, and also the actions depending on whether it is down
permanently or only temporarily. Still, this needs an acknowledgement from the
server owner, as he needs time for an application-level controlled HA service
switchover.

Br,
Tomi

