Re: [openstack-dev] [nova] Upgrade readiness check notes
On 12/16/2016 3:20 AM, Sylvain Bauza wrote:
> Le 16/12/2016 03:53, Matt Riedemann a écrit :
>> A few of us have talked about writing a command to tell when you're ready to upgrade (restart services with new code) for Ocata, because we have a few things landing in Ocata which are going from optional to required, namely cells v2 and the placement API service.
>>
>> We already have the 030 API DB migration for cells v2, which went into the o-2 milestone today and which breaks the API DB schema migrations if you haven't run at least 'nova-manage cell_v2 simple_cell_setup'.
>>
>> We have noted a few times in the placement/resource providers discussions the need for something similar for the placement API, but because that spans multiple databases (API and cell DBs), and because it involves making sure the service is up and we can make REST API requests to it, we can't do it in just a DB schema migration.
>>
>> So today dansmith, jaypipes, sdague, edleafe and I jumped on a call to go over some notes / ideas being kicked around in an etherpad:
>>
>> https://etherpad.openstack.org/p/nova-ocata-ready-for-upgrade
>
> Thanks for having done that call and written that etherpad; it clarifies a lot of open questions I had in mind without my having had time to bring them up on IRC.
>
>> We agreed on writing a new CLI outside of nova-manage, called nova-status, which can perform the upgrade readiness check for both cells v2 and the placement API.
>>
>> For cells v2 it's really going to check basically the same things as the 030 API DB schema migration.
>>
>> For the placement API, it's going to do at least two things:
>>
>> 1. Try to make a request to / on the placement API endpoint from the service catalog. This will at least check that (a) the placement endpoint is in the service catalog, (b) nova.conf is configured with credentials to make the request, and (c) the service is running and accepting requests.
>>
>> 2. Count the number of resource_providers in the API DB and compare that to the number of compute_nodes in the cell DB. If there are fewer resource providers than compute nodes, it's an issue which we'll flag in the upgrade readiness CLI. This doesn't necessarily mean you can't upgrade to Ocata; it just means there might be fewer computes available for scheduling once you get to Ocata, so the chance of rebuilds and NoValidHost increases until the computes are upgraded to Ocata and configured to use the placement service to report inventories/usage/allocations for RAM, CPU and disk.
>
> Just to make it clear, that command-line tool is mostly giving you a pre-flight status on whether you could upgrade to Ocata, right? I mean, operators can use that tool for getting a light, either green, orange or red, telling them if they'll encounter either no issues, possible ones, or definite errors, right?

Correct, that's the intent. You're the second person to bring up colors as status indicators, though (red, yellow, green for US traffic signals was the other). I'm going to do something like success/warning/error instead of colors to avoid confusion (I think they have blue traffic signals in Japan?), with corresponding return codes (0 = success, 1 = warnings, 2 = something failed).

> If so, I think that's a nice tool to have, and one we could gradually improve to make it convenient for people wanting to know if they're missing something before upgrading.
>
>> That second point is important because we also agreed to make the filter scheduler NOT fall back to querying the compute_nodes table if there are no resource providers available from the placement API. That means when the scheduler gets a request to build or move a server, it's going to query the placement API for possible resource providers to serve the CPU/RAM/disk requirements for the build request spec, and if nothing is available it'll result in a NoValidHost failure. That is a change in direction from a fallback plan we originally had in the spec here:
>>
>> https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/resource-providers-scheduler-db-filters.html#other-deployer-impact
>
> I just noticed something wrong with that section, since we agreed when reviewing that spec that we won't remove the legacy CPU/RAM/disk filters in Pike, as the CachingScheduler will still use them. We'll just disable them by default for the FilterScheduler, but anyone could enable them again if they want to use the CachingScheduler.

OK, we need to update that spec anyway to remove the provision about the fallback mechanism in the filter scheduler. I'll amend the spec if someone else doesn't get to it first.

>> We're changing direction on that because we really want to make the placement service required in Ocata and not delay its usage for another release, because as long as it's optional people are going to avoid deploying and using it, which pushes us further out from forward progress around the scheduler, placement service and resource tracker.
>
> I fully agree with that goal. We already communicated in the Newton release notes that we will require the Placement service in Ocata, so there is nothing really new here.
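To make the two placement checks and the return-code scheme above concrete, here is a rough sketch. All function and variable names are hypothetical (the real POC is in the review linked in this thread), and the HTTP call is passed in as a callable so the shape of the logic is visible without a live cloud:

```python
def check_placement_api(get_root):
    """Check 1: GET / on the placement endpoint found via the service
    catalog. get_root() should perform the request using the credentials
    from nova.conf and return the HTTP status code, or raise if the
    endpoint is missing from the catalog, auth fails, or nothing answers."""
    try:
        status = get_root()
    except Exception as exc:
        return 2, 'placement API not reachable: %s' % exc
    if status == 200:
        return 0, 'placement API is reachable'
    return 2, 'placement API returned HTTP %s' % status


def check_resource_providers(num_providers, num_computes):
    """Check 2: compare resource_providers (API DB) against compute_nodes
    (cell DB). Fewer providers than computes is a warning, not a hard
    failure: the upgrade can proceed, but NoValidHost is more likely until
    all computes report to placement."""
    if num_providers < num_computes:
        return 1, ('%d compute nodes but only %d resource providers'
                   % (num_computes, num_providers))
    return 0, 'every compute node has a resource provider'


def overall_code(codes):
    """0 = success, 1 = warnings, 2 = something failed; the worst
    individual result becomes the process exit code."""
    return max(codes) if codes else 0
```

For example, a reachable placement API combined with 5 providers for 10 computes would yield an overall code of 1 (warnings), per the proposed convention.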
Le 16/12/2016 03:53, Matt Riedemann a écrit :
> A few of us have talked about writing a command to tell when you're ready to upgrade (restart services with new code) for Ocata, because we have a few things landing in Ocata which are going from optional to required, namely cells v2 and the placement API service.
>
> We already have the 030 API DB migration for cells v2, which went into the o-2 milestone today and which breaks the API DB schema migrations if you haven't run at least 'nova-manage cell_v2 simple_cell_setup'.
>
> We have noted a few times in the placement/resource providers discussions the need for something similar for the placement API, but because that spans multiple databases (API and cell DBs), and because it involves making sure the service is up and we can make REST API requests to it, we can't do it in just a DB schema migration.
>
> So today dansmith, jaypipes, sdague, edleafe and I jumped on a call to go over some notes / ideas being kicked around in an etherpad:
>
> https://etherpad.openstack.org/p/nova-ocata-ready-for-upgrade

Thanks for having done that call and written that etherpad; it clarifies a lot of open questions I had in mind without my having had time to bring them up on IRC.

> We agreed on writing a new CLI outside of nova-manage, called nova-status, which can perform the upgrade readiness check for both cells v2 and the placement API.
>
> For cells v2 it's really going to check basically the same things as the 030 API DB schema migration.
>
> For the placement API, it's going to do at least two things:
>
> 1. Try to make a request to / on the placement API endpoint from the service catalog. This will at least check that (a) the placement endpoint is in the service catalog, (b) nova.conf is configured with credentials to make the request, and (c) the service is running and accepting requests.
>
> 2. Count the number of resource_providers in the API DB and compare that to the number of compute_nodes in the cell DB. If there are fewer resource providers than compute nodes, it's an issue which we'll flag in the upgrade readiness CLI. This doesn't necessarily mean you can't upgrade to Ocata; it just means there might be fewer computes available for scheduling once you get to Ocata, so the chance of rebuilds and NoValidHost increases until the computes are upgraded to Ocata and configured to use the placement service to report inventories/usage/allocations for RAM, CPU and disk.

Just to make it clear, that command-line tool is mostly giving you a pre-flight status on whether you could upgrade to Ocata, right? I mean, operators can use that tool for getting a light, either green, orange or red, telling them if they'll encounter either no issues, possible ones, or definite errors, right?

If so, I think that's a nice tool to have, and one we could gradually improve to make it convenient for people wanting to know if they're missing something before upgrading.

> That second point is important because we also agreed to make the filter scheduler NOT fall back to querying the compute_nodes table if there are no resource providers available from the placement API. That means when the scheduler gets a request to build or move a server, it's going to query the placement API for possible resource providers to serve the CPU/RAM/disk requirements for the build request spec, and if nothing is available it'll result in a NoValidHost failure. That is a change in direction from a fallback plan we originally had in the spec here:
>
> https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/resource-providers-scheduler-db-filters.html#other-deployer-impact

I just noticed something wrong with that section, since we agreed when reviewing that spec that we won't remove the legacy CPU/RAM/disk filters in Pike, as the CachingScheduler will still use them. We'll just disable them by default for the FilterScheduler, but anyone could enable them again if they want to use the CachingScheduler.

> We're changing direction on that because we really want to make the placement service required in Ocata and not delay its usage for another release, because as long as it's optional people are going to avoid deploying and using it, which pushes us further out from forward progress around the scheduler, placement service and resource tracker.

I fully agree with that goal. We already communicated that we will require the Placement service in Ocata in the Newton release notes, so there is nothing really new here.

> Regarding where this new CLI lives, and how it's deployed, and when it's called, we had discussed a few options there, even talking about splitting it out into its own pip-installable package. We have a few options, but we aren't going to be totally clear on that until we get the POC code written and then try to integrate it into grenade, so we're basically deferring that discussion/decision for now.
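For the cells v2 side, the thread says the CLI will check "basically the same things" as the 030 API DB schema migration. As a purely illustrative sketch (the specific conditions below are assumptions about what simple_cell_setup creates, not the migration's actual code), the logic amounts to verifying the expected mappings exist:

```python
def check_cells_v2(num_cell_mappings, num_host_mappings, num_compute_hosts):
    # Hypothetical sketch: 'nova-manage cell_v2 simple_cell_setup' creates
    # cell mappings and host mappings in the API DB. If they're absent, the
    # 030 API DB migration refuses to run, so this check reports a hard
    # failure (2) rather than a warning (1).
    if num_cell_mappings == 0:
        return 2, ("no cell mappings found; run "
                   "'nova-manage cell_v2 simple_cell_setup'")
    if num_host_mappings < num_compute_hosts:
        return 2, 'some compute hosts are not mapped to a cell'
    return 0, 'cells v2 setup looks complete'
```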
[openstack-dev] [nova] Upgrade readiness check notes
A few of us have talked about writing a command to tell when you're ready to upgrade (restart services with new code) for Ocata, because we have a few things landing in Ocata which are going from optional to required, namely cells v2 and the placement API service.

We already have the 030 API DB migration for cells v2, which went into the o-2 milestone today and which breaks the API DB schema migrations if you haven't run at least 'nova-manage cell_v2 simple_cell_setup'.

We have noted a few times in the placement/resource providers discussions the need for something similar for the placement API, but because that spans multiple databases (API and cell DBs), and because it involves making sure the service is up and we can make REST API requests to it, we can't do it in just a DB schema migration.

So today dansmith, jaypipes, sdague, edleafe and I jumped on a call to go over some notes / ideas being kicked around in an etherpad:

https://etherpad.openstack.org/p/nova-ocata-ready-for-upgrade

We agreed on writing a new CLI outside of nova-manage, called nova-status, which can perform the upgrade readiness check for both cells v2 and the placement API.

For cells v2 it's really going to check basically the same things as the 030 API DB schema migration.

For the placement API, it's going to do at least two things:

1. Try to make a request to / on the placement API endpoint from the service catalog. This will at least check that (a) the placement endpoint is in the service catalog, (b) nova.conf is configured with credentials to make the request, and (c) the service is running and accepting requests.

2. Count the number of resource_providers in the API DB and compare that to the number of compute_nodes in the cell DB. If there are fewer resource providers than compute nodes, it's an issue which we'll flag in the upgrade readiness CLI.
This doesn't necessarily mean you can't upgrade to Ocata; it just means there might be fewer computes available for scheduling once you get to Ocata, so the chance of rebuilds and NoValidHost increases until the computes are upgraded to Ocata and configured to use the placement service to report inventories/usage/allocations for RAM, CPU and disk.

That second point is important because we also agreed to make the filter scheduler NOT fall back to querying the compute_nodes table if there are no resource providers available from the placement API. That means when the scheduler gets a request to build or move a server, it's going to query the placement API for possible resource providers to serve the CPU/RAM/disk requirements for the build request spec, and if nothing is available it'll result in a NoValidHost failure. That is a change in direction from a fallback plan we originally had in the spec here:

https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/resource-providers-scheduler-db-filters.html#other-deployer-impact

We're changing direction on that because we really want to make the placement service required in Ocata and not delay its usage for another release, because as long as it's optional people are going to avoid deploying and using it, which pushes us further out from forward progress around the scheduler, placement service and resource tracker.

Regarding where this new CLI lives, and how it's deployed, and when it's called, we had discussed a few options there, even talking about splitting it out into its own pip-installable package. We have a few options, but we aren't going to be totally clear on that until we get the POC code written and then try to integrate it into grenade, so we're basically deferring that discussion/decision for now.
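The no-fallback scheduler behavior described above boils down to a small decision. A minimal sketch (names are hypothetical, not nova's actual classes):

```python
class NoValidHost(Exception):
    """Raised when no host can satisfy the request spec."""


def select_candidates(placement_providers):
    """placement_providers: resource provider records returned by the
    placement API for the request spec's CPU/RAM/disk requirements.

    Per the agreed direction, an empty result raises NoValidHost outright;
    the earlier plan of falling back to a compute_nodes table query when
    placement returns nothing has been dropped."""
    if not placement_providers:
        raise NoValidHost('no resource providers satisfy the request')
    return placement_providers
```

This is why the resource-provider count check matters: any compute not yet reporting to placement is simply invisible to the scheduler.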
Wherever it is, we know it needs to be run with the Ocata code (since that's where it's going to be available) and after running the simple_cell_setup command, and it needs to be run before restarting services with the Ocata code. I'm not totally sure if it needs to be run after the DB migrations or not, maybe Dan can clarify, but we'll sort it out for sure when we integrate with grenade.

Anyway, the POC is started here:

https://review.openstack.org/#/c/411517/

I've got the basic framework in place, and there is a patch on top that does the cells v2 check. I plan on working on the placement API checks tomorrow.

If you've read this far, congratulations. This email is really just about communicating that things are happening, because we have talked about the need for this a few times but hadn't hashed it out yet.

--
Thanks,
Matt Riedemann

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev