Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On 05/22/2017 01:55 PM, Jay Pipes wrote:
> On 05/22/2017 03:53 PM, Jonathan Proulx wrote:
>> To be clear on my view of the whole proposal: most of the rescheduling that I've seen and want is of type "A", where the claim exceeds resources. At least I think it is type "A" and not "C" (unknown).
>>
>> The exact case is that I oversubscribe RAM (1.5x). My users typically over-claim, so this is OK (my worst case is a hypervisor using only 10% of claimed RAM). But there are some hotspots where proportional utilization is high, so libvirt won't start more VMs because it really doesn't have the memory. If that's solved (or will be by the time reschedule goes away), the cases I've actually experienced would be solved.
>>
>> The anti-affinity use cases are currently the most important to me of the affinity scheduling, and I haven't (to my knowledge) seen collisions in that direction. So I could live with that race because for me it is uncommon (though I imagine for others, where positive affinity is important, the race may get lost more frequently).
>
> Thanks for the feedback, Jon. For the record, affinity really doesn't have much of a race condition at all. It's really only anti-affinity that has much of a chance of last-minute violation.

Don't they have the same race on instance boot? Two instances being started in the same (initially empty) affinity group could be scheduled in parallel and end up on different compute nodes.

Chris
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
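To make the boot-time race Chris describes concrete, here is a toy model in Python. It is only a sketch and not Nova's scheduler code: `choose_host`, the host names, and the idea of a per-request "preferred" host are assumptions made up for illustration. The point is just that two decisions taken against the same empty snapshot of the group can diverge, while serialized decisions cannot.

```python
from typing import FrozenSet


def choose_host(group_hosts: FrozenSet[str], preferred: str) -> str:
    """Toy affinity rule (hypothetical, not Nova's filter).

    If the group already occupies a host, join it; otherwise the group is
    empty and the request lands on whatever host it happens to prefer.
    """
    if group_hosts:
        return min(group_hosts)  # deterministic pick of the existing host
    return preferred


# Parallel case: both decisions are made against the same empty snapshot,
# so neither sees the other's placement.
snapshot: FrozenSet[str] = frozenset()
parallel = {choose_host(snapshot, "host1"), choose_host(snapshot, "host2")}

# Serial case: the second decision sees where the first instance landed.
first = choose_host(frozenset(), "host1")
second = choose_host(frozenset({first}), "host2")
serial = {first, second}

print(sorted(parallel))  # two different hosts: affinity is violated
print(sorted(serial))    # a single host: affinity holds
```

In this model the violation only happens while the group is still empty, which is consistent with the race being limited to the first instances booted into a group.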
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On May 23, 2017, at 3:15 PM, melanie witt wrote:
> Removing the small VM driver from Nova would allow people to keep using what they know (Nova API) but would keep a lot of cruft with it. So I would tend to favor a new porcelain API.

This.

-- Ed Leafe
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
Thanks for the feedback, Curtis, appreciated! On 05/23/2017 04:09 PM, Curtis wrote: On Tue, May 23, 2017 at 1:20 PM, Edward Leafe wrote: On May 23, 2017, at 1:27 PM, James Penick wrote: Perhaps this is a place where the TC and Foundation should step in and foster the existence of a porcelain API. Either by constructing something new, or by growing Nova into that thing. Oh please, not Nova. The last word that comes to mind when thinking about Nova code is “porcelain”. I keep seeing the word "porcelain", but I'm not sure what it means in this context. Could someone help me out here and explain what that is? :) Here's where the term porcelain comes from: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain Best, -jay ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On Tue, 23 May 2017 20:58:20 +0100 (IST), Chris Dent wrote: If we're talking big crazy changes: Why not take the "small VM driver" (presumably nova-compute) out of Nova? What stays behind is _already_ orchestration but missing some features and having a fair few bugs. I've suggested this a couple of times on the dev ML in replies to other threads in the past. We could either build a new porcelain API fresh and then whittle Nova down into a small VM driver or we could take the small VM driver out of Nova and mold Nova into the porcelain API. New porcelain API would be a fresh start to "do it right" and would involve having people switch over to it. I think there would be sufficient motivation for operators to take on the effort of deploying it, considering there would be a lot of features their end users would want to get. Removing the small VM driver from Nova would allow people to keep using what they know (Nova API) but would keep a lot of cruft with it. So I would tend to favor a new porcelain API. We really need one, like yesterday IMHO. -melanie ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On Tue, May 23, 2017 at 1:20 PM, Edward Leafe wrote:
> On May 23, 2017, at 1:27 PM, James Penick wrote:
>
>> Perhaps this is a place where the TC and Foundation should step in and foster the existence of a porcelain API. Either by constructing something new, or by growing Nova into that thing.
>
> Oh please, not Nova. The last word that comes to mind when thinking about Nova code is “porcelain”.

I keep seeing the word "porcelain", but I'm not sure what it means in this context. Could someone help me out here and explain what that is? :)

For my $0.02 as an operator, most of the time I see retries, they are all failures, but I haven't run clouds as big as a lot of people on this list have. I have certainly seen IPMI fail intermittently (I have a script that logs in to a bunch of service processors and restarts them) and would very much like to use Ironic to manage large pools of baremetal nodes, so I could see that being an issue.

As a user of cloud resources, though, I always use some kind of automation tooling with some form of looping for retries, but it's not always easy to get customers/users to use that kind of tooling. For NFV workloads/clouds there will almost always be some kind of higher-level abstraction (e.g. MANO, as mentioned) managing the resources, and it can retry (though not all of them actually have that functionality...yet).

So, as an operator and a user, I would personally be OK with Nova not retrying if retries significantly add to the complexity of Nova. I certainly would not abandon Ironic if Nova didn't retry. I do wonder what custom code might be required in, say, a public cloud providing Ironic nodes, though.

Thanks,
Curtis.
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On Tue, 23 May 2017, James Penick wrote:
> I agree that a single entry point to OpenStack would be fantastic. If it existed, scheduling, quota, etc would have moved out of Nova a long time ago, and Nova at this point would be just a small VM driver. Unfortunately such a thing does not yet exist, and Nova has the momentum and mind share as -The- entry point for all things Compute in OpenStack.

[snip some reality]

> Perhaps this is a place where the TC and Foundation should step in and foster the existence of a porcelain API. Either by constructing something new, or by growing Nova into that thing.

If we're talking big crazy changes: Why not take the "small VM driver" (presumably nova-compute) out of Nova? What stays behind is _already_ orchestration but missing some features and having a fair few bugs.

Way back in April[1] ttx asserted:

> One insight which I think we could take from this is that when a smaller group of people "owns" a set of files, we raise quality (compared to everyone owning everything). So the more we can split the code along areas of expertise and smaller review teams, the better. But I think that is also something we intuitively knew.

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-April/115061.html

--
Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/
freenode: cdent tw: @anticdent
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On Tue, May 23, 2017 at 12:20 PM, Edward Leafe wrote:
> Oh please, not Nova. The last word that comes to mind when thinking about Nova code is “porcelain”.

Oh I dunno, porcelain is usually associated with so many everyday objects. If we really push, we could see a movement in the right direction. Better to use what we have than wipe it all and flush so much hard work.
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On May 23, 2017, at 1:27 PM, James Penick wrote:
> Perhaps this is a place where the TC and Foundation should step in and foster the existence of a porcelain API. Either by constructing something new, or by growing Nova into that thing.

Oh please, not Nova. The last word that comes to mind when thinking about Nova code is “porcelain”.

-- Ed Leafe
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On Tue, May 23, 2017 at 8:52 AM, Jay Pipes wrote:
> If Heat was more widely deployed, would you feel this way? Would you reconsider having Heat as one of those "basic compute services" in OpenStack, then?

(Caveat: I haven't looked at Heat in at least a year)

I haven't deployed Heat in my environment yet, because as a template-based orchestration system it requires that you pass the correct template to construct or tear down a stack. If you were to come along and remove part of that stack in the interim, you throw everything into disarray, which then requires cleanup. Also, I'm pretty sure my users would mostly hate needing to pass a file to boot a single instance.

As an example: in my environment I allow users to request a custom disk layout for baremetal hosts by passing a YAML file as metadata (yeah, yeah, I know). The result? They hate that they have to pass a file. To them, disk layout should be a first-class object, similar to flavors. I've pushed back hard against this: it's not clean, disk profiles should be the exception to the norm, just keep the profile in a code repo. But the truth is I'm coming around to their way of thinking. I'm forced to choose between Architectural Purity[1] and what my customers actually need. In the end, the people who actually use my product define it as much as I do. At some point I'll probably give in and implement the thing they want, because from a broad perspective it makes sense to me, even though it doesn't align with the state of Nova right now.

> This is, unfortunately, one of the main problems stemming from OpenStack not having a *single* public API, with projects implementing parts of that single public API. You know, the thing I started arguing for about 6 years ago.
>
> If we had one single public porcelain API, we wouldn't even need to have this conversation. People wouldn't even know we'd changed implementation details behind the scenes and were doing retries at a slightly higher level than before. Oh well... we live and learn (maybe).

I agree that a single entry point to OpenStack would be fantastic. If it existed, scheduling, quota, etc would have moved out of Nova a long time ago, and Nova at this point would be just a small VM driver. Unfortunately such a thing does not yet exist, and Nova has the momentum and mind share as -The- entry point for all things Compute in OpenStack.

If the community aligns behind a new porcelain API, great! But until it's ready, deployers, operators, and users need to run their businesses. Removing functionality that impedes our ability to provide a stable IaaS experience isn't acceptable to us. If the expectation is that deployers will hack around this, then that's putting us in the position of struggling even more to keep up with, or move to, a current version of OpenStack. Worse, that's anathema to cloud interop.

Perhaps this is a place where the TC and Foundation should step in and foster the existence of a porcelain API. Either by constructing something new, or by growing Nova into that thing.

-James

[1] Insert choir of angels sound here
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On 05/23/2017 12:34 PM, Marc Heckmann wrote: On Tue, 2017-05-23 at 11:44 -0400, Jay Pipes wrote: On 05/23/2017 09:48 AM, Marc Heckmann wrote: For the anti-affinity use case, it's really useful for smaller or medium size operators who want to provide some form of failure domains to users but do not have the resources to create AZ's at DC or even at rack or row scale. Don't forget that as soon as you introduce AZs, you need to grow those AZs at the same rate and have the same flavor offerings across those AZs. For the retry thing, I think enough people have chimed in to echo the general sentiment. The purpose of my ML post was around getting rid of retries, not the usefulness of affinity groups. That seems to have been missed, however. Do you or David have any data on how often you've actually seen retries due to the last-minute affinity constraint violation in real world production? No I don't have any data unfortunately. Mostly because we haven't advertised the feature to end users yet. We only now are in a position to do so because, previously there was a bug causing nova-scheduler to grow in RAM usage if the required config flag to enable the feature was on. k. I have however seen retry's triggered on hypervisors for other reasons. I can try to dig up why specifically if that would be useful. I will add that we do not use Ironic at all. Yeah, any data you can get about real-world retry causes would be awesome. Note that all "resource over-consumption" causes of retries will be going away once we do claims in the scheduler. So, really, we're looking for data on the *other* causes of retries. Thanks much in advance! -jay ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On Tue, 2017-05-23 at 11:44 -0400, Jay Pipes wrote: > On 05/23/2017 09:48 AM, Marc Heckmann wrote: > > For the anti-affinity use case, it's really useful for smaller or > > medium > > size operators who want to provide some form of failure domains to > > users > > but do not have the resources to create AZ's at DC or even at rack > > or > > row scale. Don't forget that as soon as you introduce AZs, you need > > to > > grow those AZs at the same rate and have the same flavor offerings > > across those AZs. > > > > For the retry thing, I think enough people have chimed in to echo > > the > > general sentiment. > > The purpose of my ML post was around getting rid of retries, not the > usefulness of affinity groups. That seems to have been missed, > however. > > Do you or David have any data on how often you've actually seen > retries > due to the last-minute affinity constraint violation in real world > production? No I don't have any data unfortunately. Mostly because we haven't advertised the feature to end users yet. We only now are in a position to do so because, previously there was a bug causing nova-scheduler to grow in RAM usage if the required config flag to enable the feature was on. I have however seen retry's triggered on hypervisors for other reasons. I can try to dig up why specifically if that would be useful. I will add that we do not use Ironic at all. -m > > Thanks, > -jay > > ___ > OpenStack-operators mailing list > OpenStack-operators@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operato > rs ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On 05/22/2017 03:36 PM, Sean Dague wrote:
> On 05/22/2017 02:45 PM, James Penick wrote:
>>> I recognize that large Ironic users expressed their concerns about IPMI/BMC communication being unreliable and not wanting to have users manually retry a baremetal instance launch. But, on this particular point, I'm of the opinion that Nova should just do one thing and do it well. Nova isn't an orchestrator, nor is it intending to be a "just continually try to get me to this eventual state" system like Kubernetes.
>>
>> Kubernetes is a larger orchestration platform that provides autoscale. I don't expect Nova to provide autoscale, but
>>
>> I agree that Nova should do one thing and do it really well, and in my mind that thing is reliable provisioning of compute resources. Kubernetes does autoscale among other things. I'm not asking for Nova to provide Autoscale, I -AM- asking OpenStack's compute platform to provision a discrete compute resource reliably. This means overcoming common and simple error cases. As a deployer of OpenStack I'm trying to build a cloud that wraps the chaos of infrastructure, and present a reliable facade. When my users issue a boot request, I want to see it fulfilled. I don't expect it to be a 100% guarantee across any possible failure, but I expect (and my users demand) that my "Infrastructure as a service" API make reasonable accommodation to overcome common failures.
>
> Right, I think that hits my major queasiness with throwing the baby out with the bathwater here. I feel like Nova's job is to give me a compute when asked for computes. Yes, like malloc, things could fail. But honestly if Nova can recover from that scenario, it should try to. The baremetal and affinity cases are pretty good instances where Nova can catch and recover, and not just export that complexity up.
>
> It would make me sad to just export that complexity to users, and instead of handling those cases internally make every SDK, App, and simple script build their own retry loop.

If Heat was more widely deployed, would you feel this way? Would you reconsider having Heat as one of those "basic compute services" in OpenStack, then?

This is, unfortunately, one of the main problems stemming from OpenStack not having a *single* public API, with projects implementing parts of that single public API. You know, the thing I started arguing for about 6 years ago.

If we had one single public porcelain API, we wouldn't even need to have this conversation. People wouldn't even know we'd changed implementation details behind the scenes and were doing retries at a slightly higher level than before. Oh well... we live and learn (maybe).

Best,
-jay
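For readers wondering what the "own retry loop" that every SDK, app, and script would need might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: `BootFailed`, `boot_with_retries`, and the `boot` callable are hypothetical names and do not correspond to any real Nova or SDK API. A real client would also have to delete the instance left in ERROR between attempts.

```python
import time
from typing import Callable, Optional


class BootFailed(Exception):
    """Hypothetical error: the instance ended up in ERROR instead of ACTIVE."""


def boot_with_retries(boot: Callable[[], str], attempts: int = 3,
                      delay_s: float = 0.0) -> str:
    """Call boot() until it succeeds or the attempt budget is exhausted."""
    last_error: Optional[BootFailed] = None
    for attempt in range(1, attempts + 1):
        try:
            return boot()
        except BootFailed as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s)  # back off before the next attempt
    raise BootFailed(f"gave up after {attempts} attempts") from last_error


# Simulated flaky backend: fails twice (think intermittent IPMI), then works.
calls = {"n": 0}


def flaky_boot() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise BootFailed("simulated IPMI timeout")
    return "instance-1"


result = boot_with_retries(flaky_boot)
print(result, calls["n"])
```

The loop itself is short, which is the argument on one side of this thread; the argument on the other side is that every consumer of the API would have to carry a copy of it.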
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On 05/23/2017 09:48 AM, Marc Heckmann wrote: For the anti-affinity use case, it's really useful for smaller or medium size operators who want to provide some form of failure domains to users but do not have the resources to create AZ's at DC or even at rack or row scale. Don't forget that as soon as you introduce AZs, you need to grow those AZs at the same rate and have the same flavor offerings across those AZs. For the retry thing, I think enough people have chimed in to echo the general sentiment. The purpose of my ML post was around getting rid of retries, not the usefulness of affinity groups. That seems to have been missed, however. Do you or David have any data on how often you've actually seen retries due to the last-minute affinity constraint violation in real world production? Thanks, -jay ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
For the anti-affinity use case, it's really useful for smaller or medium size operators who want to provide some form of failure domains to users but do not have the resources to create AZs at DC or even at rack or row scale. Don't forget that as soon as you introduce AZs, you need to grow those AZs at the same rate and have the same flavor offerings across those AZs.

For the retry thing, I think enough people have chimed in to echo the general sentiment.

-m

On Mon, 2017-05-22 at 16:30 -0600, David Medberry wrote:

I have to agree with James. My affinity and anti-affinity rules have nothing to do with NFV. a-a is almost always a failure domain solution. I'm not sure we have users actually choosing affinity (though it would likely be for network speed issues and/or some sort of badly architected need or perceived need for coupling.)

On Mon, May 22, 2017 at 12:45 PM, James Penick wrote:

On Mon, May 22, 2017 at 10:54 AM, Jay Pipes wrote:

Hi Ops,

Hi!

For class b) causes, we should be able to solve this issue when the placement service understands affinity/anti-affinity (maybe Queens/Rocky). Until then, we propose that instead of raising a Reschedule when an affinity constraint was last-minute violated due to a racing scheduler decision, that we simply set the instance to an ERROR state.

Personally, I have only ever seen anti-affinity/affinity use cases in relation to NFV deployments, and in every NFV deployment of OpenStack there is a VNFM or MANO solution that is responsible for the orchestration of instances belonging to various service function chains. I think it is reasonable to expect the MANO system to be responsible for attempting a re-launch of an instance that was set to ERROR due to a last-minute affinity violation.

**Operators, do you agree with the above?**

I do not. My affinity and anti-affinity use cases reflect the need to build large applications across failure domains in a datacenter.
Anti-affinity: Most anti-affinity use cases relate to the ability to guarantee that instances are scheduled across failure domains, others relate to security compliance. Affinity: Hadoop/Big data deployments have affinity use cases, where nodes processing data need to be in the same rack as the nodes which house the data. This is a common setup for large hadoop deployers. I recognize that large Ironic users expressed their concerns about IPMI/BMC communication being unreliable and not wanting to have users manually retry a baremetal instance launch. But, on this particular point, I'm of the opinion that Nova just do one thing and do it well. Nova isn't an orchestrator, nor is it intending to be a "just continually try to get me to this eventual state" system like Kubernetes. Kubernetes is a larger orchestration platform that provides autoscale. I don't expect Nova to provide autoscale, but I agree that Nova should do one thing and do it really well, and in my mind that thing is reliable provisioning of compute resources. Kubernetes does autoscale among other things. I'm not asking for Nova to provide Autoscale, I -AM- asking OpenStack's compute platform to provision a discrete compute resource reliably. This means overcoming common and simple error cases. As a deployer of OpenStack I'm trying to build a cloud that wraps the chaos of infrastructure, and present a reliable facade. When my users issue a boot request, I want to see if fulfilled. I don't expect it to be a 100% guarantee across any possible failure, but I expect (and my users demand) that my "Infrastructure as a service" API make reasonable accommodation to overcome common failures. If we removed Reschedule for class c) failures entirely, large Ironic deployers would have to train users to manually retry a failed launch or would need to write a simple retry mechanism into whatever client/UI that they expose to their users. 
**Ironic operators, would the above decision force you to abandon Nova as the multi-tenant BMaaS facility?** I just glanced at one of my production clusters and found there are around 7K users defined, many of whom use OpenStack on a daily basis. When they issue a boot call, they expect that request to be honored. From their perspective, if they call AWS, they get what they ask for. If you remove reschedules you're not just breaking the expectation of a single deployer, but for my thousands of engineers who, every day, rely on OpenStack to manage their stack. I don't have a "i'll take my football and go home" mentality. But if you remove the ability for the compute provisioning API to present a reliable facade over infrastructure, I have to go write something else, or patch it back in. Now it's even harder for me to get and stay current with OpenStack. During the summit the agreement was, if I recall, that reschedules would happen within a cell, and not between the parent and cell. That was complet
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On 23 May 2017 at 05:33, Dan Smith wrote: > Sure, the diaper exception is rescheduled currently. That should > basically be things like misconfiguration type things. Rescheduling > papers over those issues, which I don't like, but in the room it surely > seemed like operators thought that they still needed to be handled. Operators don't want retries to mask configuration issues (appropriate errors should still be captured in places where operators can process them on a regular basis) but what they want even less is any further complexity or "soft" failures exposed to end-users. -- Cheers, ~Blairo ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
I have to agree with James My affinity and anti-affinity rules have nothing to do with NFV. a-a is almost always a failure domain solution. I'm not sure we have users actually choosing affinity (though it would likely be for network speed issues and/or some sort of badly architected need or perceived need for coupling.) On Mon, May 22, 2017 at 12:45 PM, James Penick wrote: > > > On Mon, May 22, 2017 at 10:54 AM, Jay Pipes wrote: > >> Hi Ops, >> >> Hi! > > >> >> For class b) causes, we should be able to solve this issue when the >> placement service understands affinity/anti-affinity (maybe Queens/Rocky). >> Until then, we propose that instead of raising a Reschedule when an >> affinity constraint was last-minute violated due to a racing scheduler >> decision, that we simply set the instance to an ERROR state. >> >> Personally, I have only ever seen anti-affinity/affinity use cases in >> relation to NFV deployments, and in every NFV deployment of OpenStack there >> is a VNFM or MANO solution that is responsible for the orchestration of >> instances belonging to various service function chains. I think it is >> reasonable to expect the MANO system to be responsible for attempting a >> re-launch of an instance that was set to ERROR due to a last-minute >> affinity violation. >> > > >> **Operators, do you agree with the above?** >> > > I do not. My affinity and anti-affinity use cases reflect the need to > build large applications across failure domains in a datacenter. > > Anti-affinity: Most anti-affinity use cases relate to the ability to > guarantee that instances are scheduled across failure domains, others > relate to security compliance. > > Affinity: Hadoop/Big data deployments have affinity use cases, where nodes > processing data need to be in the same rack as the nodes which house the > data. This is a common setup for large hadoop deployers. 
> > >> I recognize that large Ironic users expressed their concerns about >> IPMI/BMC communication being unreliable and not wanting to have users >> manually retry a baremetal instance launch. But, on this particular point, >> I'm of the opinion that Nova just do one thing and do it well. Nova isn't >> an orchestrator, nor is it intending to be a "just continually try to get >> me to this eventual state" system like Kubernetes. >> > > Kubernetes is a larger orchestration platform that provides autoscale. I > don't expect Nova to provide autoscale, but > > I agree that Nova should do one thing and do it really well, and in my > mind that thing is reliable provisioning of compute resources. Kubernetes > does autoscale among other things. I'm not asking for Nova to provide > Autoscale, I -AM- asking OpenStack's compute platform to provision a > discrete compute resource reliably. This means overcoming common and simple > error cases. As a deployer of OpenStack I'm trying to build a cloud that > wraps the chaos of infrastructure, and present a reliable facade. When my > users issue a boot request, I want to see if fulfilled. I don't expect it > to be a 100% guarantee across any possible failure, but I expect (and my > users demand) that my "Infrastructure as a service" API make reasonable > accommodation to overcome common failures. > > > >> If we removed Reschedule for class c) failures entirely, large Ironic >> deployers would have to train users to manually retry a failed launch or >> would need to write a simple retry mechanism into whatever client/UI that >> they expose to their users. >> >> **Ironic operators, would the above decision force you to abandon Nova as >> the multi-tenant BMaaS facility?** >> >> > I just glanced at one of my production clusters and found there are > around 7K users defined, many of whom use OpenStack on a daily basis. When > they issue a boot call, they expect that request to be honored. 
From their > perspective, if they call AWS, they get what they ask for. If you remove > reschedules you're not just breaking the expectation of a single deployer, > but for my thousands of engineers who, every day, rely on OpenStack to > manage their stack. > > I don't have a "i'll take my football and go home" mentality. But if you > remove the ability for the compute provisioning API to present a reliable > facade over infrastructure, I have to go write something else, or patch it > back in. Now it's even harder for me to get and stay current with OpenStack. > > During the summit the agreement was, if I recall, that reschedules would > happen within a cell, and not between the parent and cell. That was > completely acceptable to me. > > -James > > > ___ > OpenStack-operators mailing list > OpenStack-operators@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operato
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
> Whoah, but that's after 10 tries (by default). And if e.g. it bounced > because the instance is too big for the host, but other, smaller > instances come in and succeed in the meantime, that could wind up being > stretched indefinitely. Doesn't sound like a complete answer to this issue. No dude, remember, this is all assuming that claiming with placement eliminates 100% of the resource races :) The _only_ things left to reschedule for are (a) straight up 100% fail compute host misconfigurations and (b) anything that fails some percentage of the time and will actually be resolved by trying a different host (i.e. baseline 40% ironic ipmi failbots). > Today you can limit the set of compute hosts to try by specifying an > "availability zone". Perhaps the answer here is to support some kind of > "exclude these hosts" list to a "fresh" deploy. > > But is the cure worse than the disease? I (and I think others) would argue that the user needing to know that they should try a different AZ is not reasonable UX. A rebuild of an instance that failed to boot can/should exclude the original host on the rebuild attempt. It does today with reschedules so it's not that hard, just requires some plumbing. --Dan ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
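To illustrate the "exclude the original host" idea, here is a toy host picker in Python. It is a sketch under stated assumptions, not Nova's scheduler: `pick_host`, the free-RAM weighting, and the host names are all invented for the example. It also shows the earlier worry in this thread: without an exclusion list, the host that failed stays at the top of the ranking on the second try.

```python
from typing import Dict, Iterable, Optional


def pick_host(free_ram_mb: Dict[str, int], needed_mb: int,
              exclude: Iterable[str] = ()) -> Optional[str]:
    """Return the fitting host with the most free RAM, skipping excluded hosts.

    Hypothetical weighting: real schedulers combine many filters and weighers.
    """
    excluded = set(exclude)
    candidates = [
        (free, name) for name, free in free_ram_mb.items()
        if name not in excluded and free >= needed_mb
    ]
    if not candidates:
        return None
    # max() on (free, name) tuples picks the most free RAM, ties by name.
    return max(candidates)[1]


hosts = {"host1": 8192, "host2": 4096, "host3": 2048}

first = pick_host(hosts, 2048)                       # most attractive host
naive_retry = pick_host(hosts, 2048)                 # same answer again
retry = pick_host(hosts, 2048, exclude=[first])      # failed host skipped

print(first, naive_retry, retry)
```

The exclusion list here is passed explicitly by the caller; whether that state lives in the rebuild path or in the API request is exactly the "plumbing" question raised above.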
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
Dan, et al- > Well, (a) today you can't really externally retry a single instance > build without just creating a new one. The new one could suffer the same > fate, but that's why we just did the auto-disable feature for nova-compute. Whoah, but that's after 10 tries (by default). And if e.g. it bounced because the instance is too big for the host, but other, smaller instances come in and succeed in the meantime, that could wind up being stretched indefinitely. Doesn't sound like a complete answer to this issue. > Thing (b) is that if we fix rebuild so it works on a failed > shell-of-an-instance from a boot operation, we could easily exclude the > host it failed on, but it'd require some additional logic. Right, so I think the need for that "additional logic" was my point. Today you can limit the set of compute hosts to try by specifying an "availability zone". Perhaps the answer here is to support some kind of "exclude these hosts" list to a "fresh" deploy. But is the cure worse than the disease? -efried . ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
> In a no-reschedules-by-nova world, if a deploy fails on host 1, how does > the orchestrator (whatever that may be) ask nova to deploy in such a way > that it'll still try to find a good host, but *avoid* host 1? If host 1 > was an attractive candidate the first time around, wouldn't it be likely > to remain high on the list the second time? Well, (a) today you can't really externally retry a single instance build without just creating a new one. The new one could suffer the same fate, but that's why we just did the auto-disable feature for nova-compute. Thing (b) is that if we fix rebuild so it works on a failed shell-of-an-instance from a boot operation, we could easily exclude the host it failed on, but it'd require some additional logic. --Dan ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
In the case of baremetal in our environment, when a boot attempt fails we mark that node as being in maintenance mode, which prevents Nova from scheduling to it a second time. Then automation comes along and files repair tickets for the bad hardware. Only when a human or other automation fixes the node and removes the "maintenance" state, will it be available for scheduling again. On Mon, May 22, 2017 at 1:25 PM, Eric Fried wrote: > Hey folks, sorry if this is a jejune question, but: > > In a no-reschedules-by-nova world, if a deploy fails on host 1, how does > the orchestrator (whatever that may be) ask nova to deploy in such a way > that it'll still try to find a good host, but *avoid* host 1? If host 1 > was an attractive candidate the first time around, wouldn't it be likely > to remain high on the list the second time? > > I'd also like to second the thought that the monolithic "instance in > error state" gives the orchestrator no hint as to whether the deploy > failed because of something the orchestrator did (remedy may be to > redrive with different inputs, but no need to exclude the original > target host) versus because something went wrong on the compute host > (remedy would be to retry on a different host with the same inputs). > Kind of analogous to the difference between HTTP 4xx and 5xx error > classes. (Perhaps implying a design whereby the nova API responds to > the deploy request with different error codes accordingly.) > > Thanks, > efried > . > > ___ > OpenStack-operators mailing list > OpenStack-operators@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
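The maintenance-mode workflow described in this message can be sketched roughly as follows. This is a minimal illustration, not Nova or Ironic code: the `Node` class, its fields, and the helper functions are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a baremetal node record."""
    name: str
    maintenance: bool = False
    repair_tickets: list = field(default_factory=list)


def schedulable(nodes):
    """Return only the nodes that are not in maintenance mode."""
    return [n for n in nodes if not n.maintenance]


def handle_boot_failure(node, reason):
    """On a failed boot, take the node out of rotation and record a ticket."""
    node.maintenance = True
    node.repair_tickets.append(reason)


def clear_maintenance(node):
    """Called by a human or other automation once the node is repaired."""
    node.maintenance = False


nodes = [Node("bm-01"), Node("bm-02")]
handle_boot_failure(nodes[0], "IPMI timeout")
print([n.name for n in schedulable(nodes)])  # only bm-02 remains schedulable
```

The point of the design is that a bad node is excluded for every later request, not only for the retry of the request that hit it.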
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
Hey folks, sorry if this is a jejune question, but: In a no-reschedules-by-nova world, if a deploy fails on host 1, how does the orchestrator (whatever that may be) ask nova to deploy in such a way that it'll still try to find a good host, but *avoid* host 1? If host 1 was an attractive candidate the first time around, wouldn't it be likely to remain high on the list the second time? I'd also like to second the thought that the monolithic "instance in error state" gives the orchestrator no hint as to whether the deploy failed because of something the orchestrator did (remedy may be to redrive with different inputs, but no need to exclude the original target host) versus because something went wrong on the compute host (remedy would be to retry on a different host with the same inputs). Kind of analogous to the difference between HTTP 4xx and 5xx error classes. (Perhaps implying a design whereby the nova API responds to the deploy request with different error codes accordingly.) Thanks, efried . ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
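One way to picture the 4xx/5xx distinction raised above is a small decision helper on the orchestrator side. This is only a sketch under stated assumptions: `FailureClass`, `next_attempt`, and `pick_host` are hypothetical names, and Nova does not expose failures in this form today.

```python
from enum import Enum


class FailureClass(Enum):
    REQUEST = "request"  # analogous to HTTP 4xx: the inputs were wrong
    HOST = "host"        # analogous to HTTP 5xx: the compute host failed


def next_attempt(failure, failed_host, excluded_hosts):
    """Decide how to redrive a failed deploy.

    Returns (action, excluded_hosts). The action is "fix-inputs" when the
    caller must change the request, or "retry" when the same request can be
    resubmitted while avoiding the host that failed.
    """
    if failure is FailureClass.REQUEST:
        return "fix-inputs", set(excluded_hosts)
    return "retry", set(excluded_hosts) | {failed_host}


def pick_host(ranked_candidates, excluded_hosts):
    """Pick the best-ranked candidate that is not excluded, or None."""
    for host in ranked_candidates:
        if host not in excluded_hosts:
            return host
    return None


action, excluded = next_attempt(FailureClass.HOST, "host1", set())
print(action, pick_host(["host1", "host2", "host3"], excluded))
```

With an exclusion set, host 1 can stay at the top of the ranking and still be skipped on the second attempt.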
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
That depends.. I differentiate between a compute worker running on a hypervisor, and one running as a service in the control plane (like the compute worker in an Ironic cluster). A compute worker that is running on a hypervisor has highly restricted network access. But if the compute worker is a service in the control plane, such as it is with my Ironic installations, that's totally ok. It really comes down to the fact that I don't want any real or logical network access between an instance and the heart of the control plane. I'll allow a child cell control plane to call a parent cell, just not a hypervisor within the child cell. On Mon, May 22, 2017 at 12:42 PM, Sean Dague wrote: > On 05/22/2017 02:45 PM, James Penick wrote: > > > During the summit the agreement was, if I recall, that reschedules would > > happen within a cell, and not between the parent and cell. That was > > completely acceptable to me. > > Follow on question (just because the right folks are in this thread, and > it could impact paths forward). I know that some of the inability to > have upcalls in the system is based around firewalling that both Yahoo > and RAX did blocking the compute workers from communicating out. > > If the compute worker or cell conductor wanted to make an HTTP call back > to nova-api (through the public interface), with the user context, is > that a network path that would or could be accessible in your case? > > -Sean > > -- > Sean Dague > http://dague.net > > ___ > OpenStack-operators mailing list > OpenStack-operators@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On 05/22/2017 03:53 PM, Jonathan Proulx wrote: To be clear on my view of the whole proposal: most of the rescheduling that I've seen and want is of type "A", where the claim exceeds resources. At least I think they are type "A" and not "C" (unknown). The exact case is that I oversubscribe RAM (1.5x); my users typically over-claim, so this is OK (my worst case is a hypervisor using only 10% of claimed RAM). But there are some hotspots where proportional utilization is high, so libvirt won't start more VMs because it really doesn't have the memory. If that's solved (or will be by the time reschedule goes away), the cases I've actually experienced would be solved. The anti-affinity use cases are currently the most important to me of the affinity scheduling, and I haven't (to my knowledge) seen collisions in that direction. So I could live with that race because for me it is uncommon (though I imagine for others, where positive affinity is important, the race may get lost more frequently). Thanks for the feedback, Jon. For the record, affinity really doesn't have much of a race condition at all. It's really only anti-affinity that has much of a chance of last-minute violation. Best, -jay On Mon, May 22, 2017 at 03:00:09PM -0400, Jonathan Proulx wrote: :On Mon, May 22, 2017 at 11:45:33AM -0700, James Penick wrote: ::On Mon, May 22, 2017 at 10:54 AM, Jay Pipes wrote: :: ::> Hi Ops, ::> ::> Hi! :: :: ::> ::> For class b) causes, we should be able to solve this issue when the ::> placement service understands affinity/anti-affinity (maybe Queens/Rocky). ::> Until then, we propose that instead of raising a Reschedule when an ::> affinity constraint was last-minute violated due to a racing scheduler ::> decision, that we simply set the instance to an ERROR state. 
::> ::> Personally, I have only ever seen anti-affinity/affinity use cases in ::> relation to NFV deployments, and in every NFV deployment of OpenStack there ::> is a VNFM or MANO solution that is responsible for the orchestration of ::> instances belonging to various service function chains. I think it is ::> reasonable to expect the MANO system to be responsible for attempting a ::> re-launch of an instance that was set to ERROR due to a last-minute ::> affinity violation. ::> :: :: ::> **Operators, do you agree with the above?** ::> :: ::I do not. My affinity and anti-affinity use cases reflect the need to build ::large applications across failure domains in a datacenter. :: ::Anti-affinity: Most anti-affinity use cases relate to the ability to ::guarantee that instances are scheduled across failure domains, others ::relate to security compliance. :: ::Affinity: Hadoop/Big data deployments have affinity use cases, where nodes ::processing data need to be in the same rack as the nodes which house the ::data. This is a common setup for large hadoop deployers. : :James describes my use case as well. : :I would also rather see a reschedule, if we're having a really bad day :and reach max retries then see ERR : :-Jon ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
To be clear on my view of the whole proposal: most of the rescheduling that I've seen and want is of type "A", where the claim exceeds resources. At least I think they are type "A" and not "C" (unknown). The exact case is that I oversubscribe RAM (1.5x); my users typically over-claim, so this is OK (my worst case is a hypervisor using only 10% of claimed RAM). But there are some hotspots where proportional utilization is high, so libvirt won't start more VMs because it really doesn't have the memory. If that's solved (or will be by the time reschedule goes away), the cases I've actually experienced would be solved. The anti-affinity use cases are currently the most important to me of the affinity scheduling, and I haven't (to my knowledge) seen collisions in that direction. So I could live with that race because for me it is uncommon (though I imagine for others, where positive affinity is important, the race may get lost more frequently). -Jon On Mon, May 22, 2017 at 03:00:09PM -0400, Jonathan Proulx wrote: :On Mon, May 22, 2017 at 11:45:33AM -0700, James Penick wrote: ::On Mon, May 22, 2017 at 10:54 AM, Jay Pipes wrote: :: ::> Hi Ops, ::> ::> Hi! :: :: ::> ::> For class b) causes, we should be able to solve this issue when the ::> placement service understands affinity/anti-affinity (maybe Queens/Rocky). ::> Until then, we propose that instead of raising a Reschedule when an ::> affinity constraint was last-minute violated due to a racing scheduler ::> decision, that we simply set the instance to an ERROR state. ::> ::> Personally, I have only ever seen anti-affinity/affinity use cases in ::> relation to NFV deployments, and in every NFV deployment of OpenStack there ::> is a VNFM or MANO solution that is responsible for the orchestration of ::> instances belonging to various service function chains. I think it is ::> reasonable to expect the MANO system to be responsible for attempting a ::> re-launch of an instance that was set to ERROR due to a last-minute ::> affinity violation. 
::> :: :: ::> **Operators, do you agree with the above?** ::> :: ::I do not. My affinity and anti-affinity use cases reflect the need to build ::large applications across failure domains in a datacenter. :: ::Anti-affinity: Most anti-affinity use cases relate to the ability to ::guarantee that instances are scheduled across failure domains, others ::relate to security compliance. :: ::Affinity: Hadoop/Big data deployments have affinity use cases, where nodes ::processing data need to be in the same rack as the nodes which house the ::data. This is a common setup for large hadoop deployers. : :James describes my use case as well. : :I would also rather see a reschedule, if we're having a really bad day :and reach max retries then see ERR : :-Jon -- ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
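The oversubscription case in this message comes down to two different checks: an accounting check against capacity times the allocation ratio, and a physical check against memory actually in use. The sketch below uses the 1.5x ratio from the message; the host size, claim totals, and function names are made-up illustrations, and the physical check is a simplification of whatever the hypervisor really does.

```python
def claim_fits(total_mb, ratio, claimed_mb, request_mb):
    """Accounting check: does the request fit under total * ratio?"""
    return claimed_mb + request_mb <= total_mb * ratio


def really_fits(total_mb, used_mb, request_mb):
    """Physical check: is that much memory actually free right now?"""
    return used_mb + request_mb <= total_mb


TOTAL_MB = 100_000    # assumed physical RAM on the host
RATIO = 1.5           # RAM oversubscription from the message
CLAIMED_MB = 120_000  # assumed sum of existing claims
REQUEST_MB = 16_000   # assumed size of the new instance

# The claim passes on paper in both cases below.
print(claim_fits(TOTAL_MB, RATIO, CLAIMED_MB, REQUEST_MB))

# Typical host: guests use 10% of what they claimed, so the boot succeeds.
print(really_fits(TOTAL_MB, CLAIMED_MB // 10, REQUEST_MB))

# Hotspot: guests use 75% of what they claimed, so the boot fails.
print(really_fits(TOTAL_MB, CLAIMED_MB * 3 // 4, REQUEST_MB))
```

The hotspot is the case where a claim succeeds but the spawn still fails, which is why moving the claim into the scheduler may not remove every reschedule of this kind.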
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On 05/22/2017 02:45 PM, James Penick wrote: > During the summit the agreement was, if I recall, that reschedules would > happen within a cell, and not between the parent and cell. That was > completely acceptable to me. Follow on question (just because the right folks are in this thread, and it could impact paths forward). I know that some of the inability to have upcalls in the system is based around firewalling that both Yahoo and RAX did blocking the compute workers from communicating out. If the compute worker or cell conductor wanted to make an HTTP call back to nova-api (through the public interface), with the user context, is that a network path that would or could be accessible in your case? -Sean -- Sean Dague http://dague.net ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On 05/22/2017 02:45 PM, James Penick wrote: > > > I recognize that large Ironic users expressed their concerns about > IPMI/BMC communication being unreliable and not wanting to have > users manually retry a baremetal instance launch. But, on this > particular point, I'm of the opinion that Nova just do one thing and > do it well. Nova isn't an orchestrator, nor is it intending to be a > "just continually try to get me to this eventual state" system like > Kubernetes. > > > Kubernetes is a larger orchestration platform that provides autoscale. I > don't expect Nova to provide autoscale, but > > I agree that Nova should do one thing and do it really well, and in my > mind that thing is reliable provisioning of compute resources. > Kubernetes does autoscale among other things. I'm not asking for Nova to > provide Autoscale, I -AM- asking OpenStack's compute platform to > provision a discrete compute resource reliably. This means overcoming > common and simple error cases. As a deployer of OpenStack I'm trying to > build a cloud that wraps the chaos of infrastructure, and present a > reliable facade. When my users issue a boot request, I want to see it > fulfilled. I don't expect it to be a 100% guarantee across any possible > failure, but I expect (and my users demand) that my "Infrastructure as a > service" API make reasonable accommodation to overcome common failures. Right, I think this hits my major queasiness with throwing the baby out with the bathwater here. I feel like Nova's job is to give me a compute when asked for computes. Yes, like malloc, things could fail. But honestly if Nova can recover from that scenario, it should try to. The baremetal and affinity cases are pretty good instances where Nova can catch and recover, and not just export that complexity up. It would make me sad to just export that complexity to users, and instead of handling those cases internally make every SDK, App, and simple script build their own retry loop. 
-Sean -- Sean Dague http://dague.net
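For a sense of what "their own retry loop" would mean in practice, here is a minimal client-side sketch. `BootError`, `boot_with_retry`, and `make_flaky_boot` are hypothetical names; the `boot` callable stands in for whatever SDK call a client would actually make, and no real OpenStack API is used.

```python
import time


class BootError(Exception):
    """Hypothetical error raised when an instance lands in ERROR state."""


def boot_with_retry(boot, max_attempts=3, delay_s=0.0, sleep=time.sleep):
    """Call boot() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return boot()
        except BootError as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(delay_s)  # back off before the next attempt
    raise last_error


def make_flaky_boot(failures):
    """Build a fake boot call that fails a fixed number of times first."""
    state = {"calls": 0}

    def boot():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise BootError(f"attempt {state['calls']} failed")
        return f"instance booted on attempt {state['calls']}"

    return boot


print(boot_with_retry(make_flaky_boot(2)))
```

Even this small loop needs decisions about attempt limits, backoff, and which errors are retryable, and each client would have to make them separately.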
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
> To be clear, they are able to communicate, and do, as long as you > configure them to be able to do so. The long-term goal is that you don't > have to configure them to be able to do so, so we're trying to design > and work in that mode toward that goal. No, the cell conductor doesn't have a way to communicate with the scheduler. It's more than just a "it's not configured to" thing. If you have multiple cells, then your conductors within a cell point to the cell MQ as the default transport for all kinds of stuff. If they call to compute to do a thing, they don't (can't, since it doesn't have the ability to lookup the cell mapping) target, they just ask on their default bus. So, unless scheduler and compute are on the same bus, conductor *can't* talk to both at the same time (for non-super conductor operations like build that expect to target, but then they can't do the non-targeted operations). If you do that, then you're not doing cellsv2. >> [1] This really does not occur with any frequency for hypervisor virt >> drivers, since the exceptions those hypervisors throw are caught by >> the nova-compute worker and handled without raising a Reschedule. > > Are you sure about that? > > https://github.com/openstack/nova/blob/931c3f48188e57e71aa6518d5253e1a5bd9a27c0/nova/compute/manager.py#L2041-L2049 Sure, the diaper exception is rescheduled currently. That should basically be things like misconfiguration type things. Rescheduling papers over those issues, which I don't like, but in the room it surely seemed like operators thought that they still needed to be handled. --Dan ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On Mon, May 22, 2017 at 11:45:33AM -0700, James Penick wrote: :On Mon, May 22, 2017 at 10:54 AM, Jay Pipes wrote: : :> Hi Ops, :> :> Hi! : : :> :> For class b) causes, we should be able to solve this issue when the :> placement service understands affinity/anti-affinity (maybe Queens/Rocky). :> Until then, we propose that instead of raising a Reschedule when an :> affinity constraint was last-minute violated due to a racing scheduler :> decision, that we simply set the instance to an ERROR state. :> :> Personally, I have only ever seen anti-affinity/affinity use cases in :> relation to NFV deployments, and in every NFV deployment of OpenStack there :> is a VNFM or MANO solution that is responsible for the orchestration of :> instances belonging to various service function chains. I think it is :> reasonable to expect the MANO system to be responsible for attempting a :> re-launch of an instance that was set to ERROR due to a last-minute :> affinity violation. :> : : :> **Operators, do you agree with the above?** :> : :I do not. My affinity and anti-affinity use cases reflect the need to build :large applications across failure domains in a datacenter. : :Anti-affinity: Most anti-affinity use cases relate to the ability to :guarantee that instances are scheduled across failure domains, others :relate to security compliance. : :Affinity: Hadoop/Big data deployments have affinity use cases, where nodes :processing data need to be in the same rack as the nodes which house the :data. This is a common setup for large hadoop deployers. James describes my use case as well. I would also rather see a reschedule, if we're having a really bad day and reach max retries then see ERR -Jon ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On 5/22/2017 12:54 PM, Jay Pipes wrote: Hi Ops, I need your feedback on a very important direction we would like to pursue. I realize that there were Forum sessions about this topic at the summit in Boston and that there were some decisions that were reached. I'd like to revisit that decision and explain why I'd like your support for getting rid of the automatic reschedule behaviour entirely in Nova for Pike. == The current situation and why it sucks == Nova currently attempts to "reschedule" instances when any of the following events occur: a) the "claim resources" process that occurs on the nova-compute worker results in the chosen compute node exceeding its own capacity b) in between the time a compute node was chosen by the scheduler, another process launched an instance that would violate an affinity constraint c) an "unknown" exception occurs during the spawn process. In practice, this really only is seen when the Ironic baremetal node that was chosen by the scheduler turns out to be unreliable (IPMI issues, BMC failures, etc) and wasn't able to launch the instance. [1] The logic for handling these reschedules makes the Nova conductor, scheduler and compute worker code very complex. With the new cellsv2 architecture in Nova, child cells are not able to communicate with the Nova scheduler (and thus "ask for a reschedule"). To be clear, they are able to communicate, and do, as long as you configure them to be able to do so. The long-term goal is that you don't have to configure them to be able to do so, so we're trying to design and work in that mode toward that goal. We (the Nova team) would like to get rid of the automated rescheduling behaviour that Nova currently exposes because we could eliminate a large amount of complexity (which leads to bugs) from the already-complicated dance of communication that occurs between internal Nova components. 
== What we would like to do == With the move of the resource claim to the Nova scheduler [2], we can entirely eliminate the a) class of Reschedule causes. This leaves class b) and c) causes of Rescheduling. For class b) causes, we should be able to solve this issue when the placement service understands affinity/anti-affinity (maybe Queens/Rocky). Until then, we propose that instead of raising a Reschedule when an affinity constraint was last-minute violated due to a racing scheduler decision, that we simply set the instance to an ERROR state. Personally, I have only ever seen anti-affinity/affinity use cases in relation to NFV deployments, and in every NFV deployment of OpenStack there is a VNFM or MANO solution that is responsible for the orchestration of instances belonging to various service function chains. I think it is reasonable to expect the MANO system to be responsible for attempting a re-launch of an instance that was set to ERROR due to a last-minute affinity violation. **Operators, do you agree with the above?** Finally, for class c) Reschedule causes, I do not believe that we should be attempting automated rescheduling when "unknown" errors occur. I just don't believe this is something Nova should be doing. I recognize that large Ironic users expressed their concerns about IPMI/BMC communication being unreliable and not wanting to have users manually retry a baremetal instance launch. But, on this particular point, I'm of the opinion that Nova just do one thing and do it well. Nova isn't an orchestrator, nor is it intending to be a "just continually try to get me to this eventual state" system like Kubernetes. If we removed Reschedule for class c) failures entirely, large Ironic deployers would have to train users to manually retry a failed launch or would need to write a simple retry mechanism into whatever client/UI that they expose to their users. 
**Ironic operators, would the above decision force you to abandon Nova as the multi-tenant BMaaS facility?** Thanks in advance for your consideration and feedback. Best, -jay [1] This really does not occur with any frequency for hypervisor virt drivers, since the exceptions those hypervisors throw are caught by the nova-compute worker and handled without raising a Reschedule. Are you sure about that? https://github.com/openstack/nova/blob/931c3f48188e57e71aa6518d5253e1a5bd9a27c0/nova/compute/manager.py#L2041-L2049 The compute manager handles anything non-specific that leaks up from the virt driver.spawn() method and reschedules it. Think ProcessExecutionError when vif plugging fails in the libvirt driver because the command blew up for some reason (sudo on the host is wrong?). I'm not saying it should, as I'm guessing most of these types of failures are due to misconfiguration, but it is how things currently work today. [2] http://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-claims.html ___ OpenStack-operators mailing list O
Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
On Mon, May 22, 2017 at 10:54 AM, Jay Pipes wrote: > Hi Ops, > > Hi! > > For class b) causes, we should be able to solve this issue when the > placement service understands affinity/anti-affinity (maybe Queens/Rocky). > Until then, we propose that instead of raising a Reschedule when an > affinity constraint was last-minute violated due to a racing scheduler > decision, that we simply set the instance to an ERROR state. > > Personally, I have only ever seen anti-affinity/affinity use cases in > relation to NFV deployments, and in every NFV deployment of OpenStack there > is a VNFM or MANO solution that is responsible for the orchestration of > instances belonging to various service function chains. I think it is > reasonable to expect the MANO system to be responsible for attempting a > re-launch of an instance that was set to ERROR due to a last-minute > affinity violation. > > **Operators, do you agree with the above?** > I do not. My affinity and anti-affinity use cases reflect the need to build large applications across failure domains in a datacenter. Anti-affinity: Most anti-affinity use cases relate to the ability to guarantee that instances are scheduled across failure domains, others relate to security compliance. Affinity: Hadoop/Big data deployments have affinity use cases, where nodes processing data need to be in the same rack as the nodes which house the data. This is a common setup for large hadoop deployers. > I recognize that large Ironic users expressed their concerns about > IPMI/BMC communication being unreliable and not wanting to have users > manually retry a baremetal instance launch. But, on this particular point, > I'm of the opinion that Nova just do one thing and do it well. Nova isn't > an orchestrator, nor is it intending to be a "just continually try to get > me to this eventual state" system like Kubernetes. > Kubernetes is a larger orchestration platform that provides autoscale. 
I don't expect Nova to provide autoscale, but I agree that Nova should do one thing and do it really well, and in my mind that thing is reliable provisioning of compute resources. Kubernetes does autoscale among other things. I'm not asking for Nova to provide Autoscale, I -AM- asking OpenStack's compute platform to provision a discrete compute resource reliably. This means overcoming common and simple error cases. As a deployer of OpenStack I'm trying to build a cloud that wraps the chaos of infrastructure, and present a reliable facade. When my users issue a boot request, I want to see it fulfilled. I don't expect it to be a 100% guarantee across any possible failure, but I expect (and my users demand) that my "Infrastructure as a service" API make reasonable accommodation to overcome common failures. > If we removed Reschedule for class c) failures entirely, large Ironic > deployers would have to train users to manually retry a failed launch or > would need to write a simple retry mechanism into whatever client/UI that > they expose to their users. > > **Ironic operators, would the above decision force you to abandon Nova as > the multi-tenant BMaaS facility?** > > I just glanced at one of my production clusters and found there are around 7K users defined, many of whom use OpenStack on a daily basis. When they issue a boot call, they expect that request to be honored. From their perspective, if they call AWS, they get what they ask for. If you remove reschedules you're not just breaking the expectation of a single deployer, but for my thousands of engineers who, every day, rely on OpenStack to manage their stack. I don't have a "i'll take my football and go home" mentality. But if you remove the ability for the compute provisioning API to present a reliable facade over infrastructure, I have to go write something else, or patch it back in. Now it's even harder for me to get and stay current with OpenStack. 
During the summit the agreement was, if I recall, that reschedules would happen within a cell, and not between the parent and cell. That was completely acceptable to me. -James
[Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
Hi Ops, I need your feedback on a very important direction we would like to pursue. I realize that there were Forum sessions about this topic at the summit in Boston and that there were some decisions that were reached. I'd like to revisit that decision and explain why I'd like your support for getting rid of the automatic reschedule behaviour entirely in Nova for Pike. == The current situation and why it sucks == Nova currently attempts to "reschedule" instances when any of the following events occur: a) the "claim resources" process that occurs on the nova-compute worker results in the chosen compute node exceeding its own capacity b) in between the time a compute node was chosen by the scheduler, another process launched an instance that would violate an affinity constraint c) an "unknown" exception occurs during the spawn process. In practice, this really only is seen when the Ironic baremetal node that was chosen by the scheduler turns out to be unreliable (IPMI issues, BMC failures, etc) and wasn't able to launch the instance. [1] The logic for handling these reschedules makes the Nova conductor, scheduler and compute worker code very complex. With the new cellsv2 architecture in Nova, child cells are not able to communicate with the Nova scheduler (and thus "ask for a reschedule"). We (the Nova team) would like to get rid of the automated rescheduling behaviour that Nova currently exposes because we could eliminate a large amount of complexity (which leads to bugs) from the already-complicated dance of communication that occurs between internal Nova components. == What we would like to do == With the move of the resource claim to the Nova scheduler [2], we can entirely eliminate the a) class of Reschedule causes. This leaves class b) and c) causes of Rescheduling. For class b) causes, we should be able to solve this issue when the placement service understands affinity/anti-affinity (maybe Queens/Rocky). 
Until then, we propose that, instead of raising a Reschedule when an affinity constraint is violated at the last minute due to a racing scheduler decision, we simply set the instance to an ERROR state.

Personally, I have only ever seen anti-affinity/affinity use cases in relation to NFV deployments, and in every NFV deployment of OpenStack there is a VNFM or MANO solution that is responsible for the orchestration of instances belonging to various service function chains. I think it is reasonable to expect the MANO system to be responsible for attempting a re-launch of an instance that was set to ERROR due to a last-minute affinity violation.

**Operators, do you agree with the above?**

Finally, for class c) Reschedule causes, I do not believe that we should be attempting automated rescheduling when "unknown" errors occur. I just don't believe this is something Nova should be doing.

I recognize that large Ironic users expressed their concerns about IPMI/BMC communication being unreliable and about not wanting to have users manually retry a baremetal instance launch. But, on this particular point, I'm of the opinion that Nova should just do one thing and do it well. Nova isn't an orchestrator, nor is it intended to be a "just continually try to get me to this eventual state" system like Kubernetes.

If we removed Reschedule for class c) failures entirely, large Ironic deployers would have to train users to manually retry a failed launch, or would need to write a simple retry mechanism into whatever client/UI they expose to their users.

**Ironic operators, would the above decision force you to abandon Nova as the multi-tenant BMaaS facility?**

Thanks in advance for your consideration and feedback.

Best,
-jay

[1] This really does not occur with any frequency for hypervisor virt drivers, since the exceptions those hypervisors throw are caught by the nova-compute worker and handled without raising a Reschedule.
[2] http://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-claims.html
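
For class c) failures, the "simple retry mechanism" a deployer might add to a client or UI could look roughly like the Python sketch below. This is an illustration under stated assumptions, not code from Nova or Ironic: launch_with_retry, LaunchFailed and the launch callable are hypothetical names, and launch stands in for whatever the client does to request an instance, wait for it to reach ACTIVE or ERROR, and clean up a failed attempt.

```python
import time
from typing import Callable, Optional


class LaunchFailed(Exception):
    """Hypothetical error: the launch attempt ended in an ERROR state."""


def launch_with_retry(
    launch: Callable[[], str],
    max_attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call launch() until it succeeds or max_attempts is exhausted.

    launch is a placeholder supplied by the caller. It is assumed to
    return an instance identifier on success and to raise LaunchFailed
    when the instance ends up in ERROR.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Optional[LaunchFailed] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return launch()
        except LaunchFailed as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts:
                sleep(delay_seconds)

    # Every attempt failed: surface one error, chained to the last cause.
    raise LaunchFailed(f"gave up after {max_attempts} attempts") from last_error
```

The attempt count is bounded on purpose, so a persistently broken baremetal node produces a visible failure rather than an endless loop. The same wrapper would serve a MANO system re-launching an instance after a last-minute affinity violation.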