Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-24 Thread Chris Friesen

On 05/22/2017 01:55 PM, Jay Pipes wrote:

On 05/22/2017 03:53 PM, Jonathan Proulx wrote:

To be clear on my view of the whole proposal

Most of the Rescheduling that I've seen and want is of type "A", where the claim exceeds resources.  At least I think they are type "A" and not "C" unknown.

The exact case is that I oversubscribe RAM (1.5x); my users typically over-claim, so this is OK (my worst case is a hypervisor using only 10% of claimed RAM).  But there are some hotspots where proportional utilization is high, so libvirt won't start more VMs because it really doesn't have the memory.

If that's solved (or will be at the time reschedule goes away), the cases I've actually experienced would be solved.

The anti-affinity use cases are currently the most important to me of the affinity scheduling cases, and I haven't (to my knowledge) seen collisions in that direction.  So I could live with that race because for me it is uncommon (though I imagine for others where positive affinity is important the race may get lost more frequently)


Thanks for the feedback, Jon.

For the record, affinity really doesn't have much of a race condition at all.
It's really only anti-affinity that has much of a chance of last-minute 
violation.


Don't they have the same race on instance boot?  Two instances being started in 
the same (initially empty) affinity group could be scheduled in parallel and end 
up on different compute nodes.


Chris




Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-24 Thread Edward Leafe
On May 23, 2017, at 3:15 PM, melanie witt  wrote:
> 
> Removing the small VM driver from Nova would allow people to keep using what 
> they know (Nova API) but would keep a lot of cruft with it. So I would tend 
> to favor a new porcelain API.


This.

-- Ed Leafe







Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Jay Pipes

Thanks for the feedback, Curtis, appreciated!

On 05/23/2017 04:09 PM, Curtis wrote:

On Tue, May 23, 2017 at 1:20 PM, Edward Leafe  wrote:

On May 23, 2017, at 1:27 PM, James Penick  wrote:


  Perhaps this is a place where the TC and Foundation should step in and foster 
the existence of a porcelain API. Either by constructing something new, or by 
growing Nova into that thing.



Oh please, not Nova. The last word that comes to mind when thinking about Nova 
code is “porcelain”.



I keep seeing the word "porcelain", but I'm not sure what it means in
this context. Could someone help me out here and explain what that is?
:)


Here's where the term porcelain comes from:

https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain

Best,
-jay



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread melanie witt

On Tue, 23 May 2017 20:58:20 +0100 (IST), Chris Dent wrote:


If we're talking big crazy changes: Why not take the "small VM
driver" (presumably nova-compute) out of Nova? What stays behind is
_already_ orchestration but missing some features and having a fair
few bugs.


I've suggested this a couple of times on the dev ML in replies to other 
threads in the past. We could either build a new porcelain API fresh and 
then whittle Nova down into a small VM driver or we could take the small 
VM driver out of Nova and mold Nova into the porcelain API.


New porcelain API would be a fresh start to "do it right" and would 
involve having people switch over to it. I think there would be 
sufficient motivation for operators to take on the effort of deploying 
it, considering there would be a lot of features their end users would 
want to get.


Removing the small VM driver from Nova would allow people to keep using 
what they know (Nova API) but would keep a lot of cruft with it. So I 
would tend to favor a new porcelain API.


We really need one, like yesterday IMHO.

-melanie



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Curtis
On Tue, May 23, 2017 at 1:20 PM, Edward Leafe  wrote:
> On May 23, 2017, at 1:27 PM, James Penick  wrote:
>
>>  Perhaps this is a place where the TC and Foundation should step in and 
>> foster the existence of a porcelain API. Either by constructing something 
>> new, or by growing Nova into that thing.
>
>
> Oh please, not Nova. The last word that comes to mind when thinking about 
> Nova code is “porcelain”.
>

I keep seeing the word "porcelain", but I'm not sure what it means in
this context. Could someone help me out here and explain what that is?
:)

For my $0.02 as an operator, most of the time when I see retries they are all failures, but I haven't run clouds as big as a lot of the people on this list. I have certainly seen IPMI fail intermittently (I have a script that logs in to a bunch of service processors and restarts them) and would very much like to use Ironic to manage large pools of baremetal nodes, so I could see that being an issue.

As a user of cloud resources, though, I always use some kind of automation tooling with some form of looping for retries, but it's not always easy to get customers/users to use that kind of tooling. For NFV workloads/clouds there will almost always be some kind of higher-level abstraction (e.g. the MANO systems mentioned earlier) managing the resources, and it can retry (though not all of them actually have that functionality... yet).

So, as an operator and a user, I would personally be OK with Nova dropping retries if they significantly add to the complexity of Nova. I certainly would not abandon Ironic if Nova didn't retry. I do wonder what custom code might be required in, say, a public cloud providing Ironic nodes though.
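
For illustration, a minimal client-side retry loop of the kind described above could look like the sketch below. It assumes the openstacksdk library and a clouds.yaml entry named "mycloud"; the image, flavor, and network IDs are placeholders rather than anything from a real deployment.

```python
import openstack

# Placeholder IDs -- substitute values from your own cloud.
IMAGE_ID = "IMAGE-UUID"
FLAVOR_ID = "FLAVOR-ID"
NETWORK_ID = "NETWORK-UUID"


def boot_with_retries(conn, name, attempts=3):
    """Create a server, deleting and retrying if it ends up in ERROR."""
    for attempt in range(1, attempts + 1):
        server = conn.compute.create_server(
            name=name,
            image_id=IMAGE_ID,
            flavor_id=FLAVOR_ID,
            networks=[{"uuid": NETWORK_ID}],
        )
        try:
            # Blocks until ACTIVE; raises if the server goes to ERROR
            # or the timeout expires.
            return conn.compute.wait_for_server(server, wait=600)
        except openstack.exceptions.SDKException:
            # Clean up the failed instance and try again.
            conn.compute.delete_server(server, ignore_missing=True)
    raise RuntimeError("gave up after %d attempts" % attempts)


conn = openstack.connect(cloud="mycloud")  # assumes a clouds.yaml entry
boot_with_retries(conn, "retry-demo")
```

An external orchestrator (or a MANO system) would do essentially the same thing, just with more policy around when to give up.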

Thanks,
Curtis.

>
> -- Ed Leafe


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Chris Dent

On Tue, 23 May 2017, James Penick wrote:


I agree that a single entry point to OpenStack would be fantastic. If it
existed, scheduling, quota, etc would have moved out of Nova a long time
ago, and Nova at this point would be just a small VM driver. Unfortunately
such a thing does not yet exist, and Nova has the momentum and mind share
as -The- entry point for all things Compute in OpenStack.


[snip some reality]


Perhaps this is a place where the TC and Foundation should step in and
foster the existence of a porcelain API. Either by constructing something
new, or by growing Nova into that thing.


If we're talking big crazy changes: Why not take the "small VM
driver" (presumably nova-compute) out of Nova? What stays behind is
_already_ orchestration but missing some features and having a fair
few bugs.

Way back in April[1] ttx asserted:

One insight which I think we could take from this is that when a
smaller group of people "owns" a set of files, we raise quality
(compared to everyone owning everything). So the more we can
split the code along areas of expertise and smaller review
teams, the better. But I think that is also something we
intuitively knew.

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-April/115061.html

--
Chris Dent  ┬──┬◡ノ(° -°ノ)   https://anticdent.org/
freenode: cdent tw: @anticdent


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread James Penick
On Tue, May 23, 2017 at 12:20 PM, Edward Leafe  wrote:

>
>
> Oh please, not Nova. The last word that comes to mind when thinking about
> Nova code is “porcelain”.
>

Oh I dunno, porcelain is usually associated with so many everyday objects.

If we really push, we could see a movement in the right direction. Better to use what we have than wipe it all and flush so much hard work.


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Edward Leafe
On May 23, 2017, at 1:27 PM, James Penick  wrote:

>  Perhaps this is a place where the TC and Foundation should step in and 
> foster the existence of a porcelain API. Either by constructing something 
> new, or by growing Nova into that thing.


Oh please, not Nova. The last word that comes to mind when thinking about Nova 
code is “porcelain”.


-- Ed Leafe








Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread James Penick
On Tue, May 23, 2017 at 8:52 AM, Jay Pipes  wrote:
>
> If Heat was more widely deployed, would you feel this way? Would you
> reconsider having Heat as one of those "basic compute services" in
> OpenStack, then?
>
>
 (Caveat: I haven't looked at Heat in at least a year.) I haven't deployed Heat in my environment yet, because as a template-based orchestration system it requires that you pass the correct template to construct or tear down a stack. If you were to come along and remove part of that stack in the interim, you throw everything into disarray, which then requires cleanup.

 Also, I'm pretty sure my users would mostly hate needing to pass a file to boot a single instance.
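
To make that complaint concrete, booting even a single instance through Heat means carrying a file along the lines of the sketch below (a minimal HOT template with placeholder image, flavor, and network names).

```yaml
# Roughly the smallest Heat (HOT) template that boots one server -- the
# kind of file users would rather not have to carry around just to get
# a single instance. Image, flavor, and network names are placeholders.
heat_template_version: 2016-10-14

resources:
  single_server:
    type: OS::Nova::Server
    properties:
      image: ubuntu-16.04       # placeholder
      flavor: m1.small          # placeholder
      networks:
        - network: private-net  # placeholder
```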

 As an example: in my environment I allow users to request a custom disk layout for baremetal hosts by passing a YAML file as metadata (yeah, yeah, I know). The result? They hate that they have to pass a file. To them disk layout should be a first-class object, similar to flavors. I've pushed back hard against this: it's not clean, disk profiles should be the exception to the norm, just keep the profile in a code repo. But the truth is I'm coming around to their way of thinking.

 I'm forced to choose between Architectural Purity[1] and what my customers actually need. In the end, the people who actually use my product define it as much as I do. At some point I'll probably give in and implement the thing they want, because from a broad perspective it makes sense to me, even though it doesn't align with the state of Nova right now.

This is, unfortunately, one of the main problems stemming from OpenStack
> not having a *single* public API, with projects implementing parts of that
> single public API. You know, the thing I started arguing for about 6 years
> ago.
>
> If we had one single public porcelain API, we wouldn't even need to have
> this conversation. People wouldn't even know we'd changed implementation
> details behind the scenes and were doing retries at a slightly higher level
> than before. Oh well... we live and learn (maybe).
>
>
 I agree that a single entry point to OpenStack would be fantastic. If it
existed, scheduling, quota, etc would have moved out of Nova a long time
ago, and Nova at this point would be just a small VM driver. Unfortunately
such a thing does not yet exist, and Nova has the momentum and mind share
as -The- entry point for all things Compute in OpenStack.

 If the community aligns behind a new porcelain API, great! But until it's
ready, deployers, operators, and users need to run their businesses.
Removing functionality that impedes our ability to provide a stable IaaS
experience isn't acceptable to us. If the expectation is that deployers
will hack around this, then that's putting us in the position of struggling
even more to keep up with, or move to a current version of OpenStack.
Worse, that's anathema to cloud interop.

 Perhaps this is a place where the TC and Foundation should step in and
foster the existence of a porcelain API. Either by constructing something
new, or by growing Nova into that thing.

-James
[1] Insert choir of angels sound here


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Jay Pipes

On 05/23/2017 12:34 PM, Marc Heckmann wrote:

On Tue, 2017-05-23 at 11:44 -0400, Jay Pipes wrote:

On 05/23/2017 09:48 AM, Marc Heckmann wrote:

For the anti-affinity use case, it's really useful for smaller or
medium
size operators who want to provide some form of failure domains to
users
but do not have the resources to create AZ's at DC or even at rack
or
row scale. Don't forget that as soon as you introduce AZs, you need
to
grow those AZs at the same rate and have the same flavor offerings
across those AZs.

For the retry thing, I think enough people have chimed in to echo
the
general sentiment.


The purpose of my ML post was around getting rid of retries, not the
usefulness of affinity groups. That seems to have been missed,
however.

Do you or David have any data on how often you've actually seen
retries
due to the last-minute affinity constraint violation in real world
production?


No, I don't have any data unfortunately, mostly because we haven't advertised the feature to end users yet. We are only now in a position to do so because previously there was a bug causing nova-scheduler to grow in RAM usage if the required config flag to enable the feature was on.


k.


I have, however, seen retries triggered on hypervisors for other reasons.
I can try to dig up why specifically if that would be useful. I will
add that we do not use Ironic at all.


Yeah, any data you can get about real-world retry causes would be 
awesome. Note that all "resource over-consumption" causes of retries 
will be going away once we do claims in the scheduler. So, really, we're 
looking for data on the *other* causes of retries.


Thanks much in advance!

-jay



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Marc Heckmann
On Tue, 2017-05-23 at 11:44 -0400, Jay Pipes wrote:
> On 05/23/2017 09:48 AM, Marc Heckmann wrote:
> > For the anti-affinity use case, it's really useful for smaller or
> > medium 
> > size operators who want to provide some form of failure domains to
> > users 
> > but do not have the resources to create AZ's at DC or even at rack
> > or 
> > row scale. Don't forget that as soon as you introduce AZs, you need
> > to 
> > grow those AZs at the same rate and have the same flavor offerings 
> > across those AZs.
> > 
> > For the retry thing, I think enough people have chimed in to echo
> > the 
> > general sentiment.
> 
> The purpose of my ML post was around getting rid of retries, not the 
> usefulness of affinity groups. That seems to have been missed,
> however.
> 
> Do you or David have any data on how often you've actually seen
> retries 
> due to the last-minute affinity constraint violation in real world 
> production?

No, I don't have any data unfortunately, mostly because we haven't
advertised the feature to end users yet. We are only now in a position
to do so because previously there was a bug causing nova-scheduler to
grow in RAM usage if the required config flag to enable the feature was on.

I have, however, seen retries triggered on hypervisors for other reasons.
I can try to dig up why specifically if that would be useful. I will
add that we do not use Ironic at all.

-m



> 
> Thanks,
> -jay
> 


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Jay Pipes

On 05/22/2017 03:36 PM, Sean Dague wrote:

On 05/22/2017 02:45 PM, James Penick wrote:



 I recognize that large Ironic users expressed their concerns about
 IPMI/BMC communication being unreliable and not wanting to have
 users manually retry a baremetal instance launch. But, on this
 particular point, I'm of the opinion that Nova just do one thing and
 do it well. Nova isn't an orchestrator, nor is it intending to be a
 "just continually try to get me to this eventual state" system like
 Kubernetes.

Kubernetes is a larger orchestration platform that provides autoscale. I
don't expect Nova to provide autoscale, but

I agree that Nova should do one thing and do it really well, and in my
mind that thing is reliable provisioning of compute resources.
Kubernetes does autoscale among other things. I'm not asking for Nova to
provide Autoscale, I -AM- asking OpenStack's compute platform to
provision a discrete compute resource reliably. This means overcoming
common and simple error cases. As a deployer of OpenStack I'm trying to
build a cloud that wraps the chaos of infrastructure, and present a
reliable facade. When my users issue a boot request, I want to see it fulfilled. I don't expect it to be a 100% guarantee across any possible
failure, but I expect (and my users demand) that my "Infrastructure as a
service" API make reasonable accommodation to overcome common failures.


Right, I think this hits my major queasiness with throwing the baby out with
the bathwater here. I feel like Nova's job is to give me a compute when
asked for computes. Yes, like malloc, things could fail. But honestly if
Nova can recover from that scenario, it should try to. The baremetal and
affinity cases are pretty good instances where Nova can catch and
recover, and not just export that complexity up.

It would make me sad to just export that complexity to users, and
instead of handling those cases internally, make every SDK, App, and
simple script build their own retry loop.


If Heat was more widely deployed, would you feel this way? Would you 
reconsider having Heat as one of those "basic compute services" in 
OpenStack, then?


This is, unfortunately, one of the main problems stemming from OpenStack 
not having a *single* public API, with projects implementing parts of 
that single public API. You know, the thing I started arguing for about 
6 years ago.


If we had one single public porcelain API, we wouldn't even need to have 
this conversation. People wouldn't even know we'd changed implementation 
details behind the scenes and were doing retries at a slightly higher 
level than before. Oh well... we live and learn (maybe).


Best,
-jay



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Jay Pipes

On 05/23/2017 09:48 AM, Marc Heckmann wrote:
For the anti-affinity use case, it's really useful for smaller or medium 
size operators who want to provide some form of failure domains to users 
but do not have the resources to create AZ's at DC or even at rack or 
row scale. Don't forget that as soon as you introduce AZs, you need to 
grow those AZs at the same rate and have the same flavor offerings 
across those AZs.


For the retry thing, I think enough people have chimed in to echo the 
general sentiment.


The purpose of my ML post was around getting rid of retries, not the 
usefulness of affinity groups. That seems to have been missed, however.


Do you or David have any data on how often you've actually seen retries 
due to the last-minute affinity constraint violation in real world 
production?


Thanks,
-jay



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Marc Heckmann
For the anti-affinity use case, it's really useful for smaller or medium-sized operators who want to provide some form of failure domains to users but do not have the resources to create AZs at DC or even at rack or row scale. Don't forget that as soon as you introduce AZs, you need to grow those AZs at the same rate and have the same flavor offerings across those AZs.
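
For readers unfamiliar with the feature under discussion, the sketch below creates an anti-affinity server group and boots two members into it. It assumes the openstacksdk library, that the ServerGroupAntiAffinityFilter is enabled in the scheduler, and placeholder IDs throughout; the exact SDK keyword arguments should be treated as assumptions (the raw API form is a POST to /os-server-groups followed by boots carrying an os:scheduler_hints "group" entry).

```python
import openstack

conn = openstack.connect(cloud="mycloud")  # assumes a clouds.yaml entry

# Create a server group whose members must land on different hosts.
group = conn.compute.create_server_group(
    name="web-tier", policies=["anti-affinity"])

# Boot two members; with the anti-affinity filter enabled, the scheduler
# keeps them on separate compute nodes -- the failure-domain separation
# described above.
for name in ("web-1", "web-2"):
    conn.compute.create_server(
        name=name,
        image_id="IMAGE-UUID",            # placeholder
        flavor_id="FLAVOR-ID",            # placeholder
        networks=[{"uuid": "NET-UUID"}],  # placeholder
        scheduler_hints={"group": group.id},
    )
```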

For the retry thing, I think enough people have chimed in to echo the general 
sentiment.

-m


On Mon, 2017-05-22 at 16:30 -0600, David Medberry wrote:
I have to agree with James

My affinity and anti-affinity rules have nothing to do with NFV. a-a is almost 
always a failure domain solution. I'm not sure we have users actually choosing 
affinity (though it would likely be for network speed issues and/or some sort 
of badly architected need or perceived need for coupling.)

On Mon, May 22, 2017 at 12:45 PM, James Penick 
mailto:jpen...@gmail.com>> wrote:


On Mon, May 22, 2017 at 10:54 AM, Jay Pipes 
mailto:jaypi...@gmail.com>> wrote:
Hi Ops,

Hi!


For class b) causes, we should be able to solve this issue when the placement 
service understands affinity/anti-affinity (maybe Queens/Rocky). Until then, we 
propose that instead of raising a Reschedule when an affinity constraint was 
last-minute violated due to a racing scheduler decision, that we simply set the 
instance to an ERROR state.

Personally, I have only ever seen anti-affinity/affinity use cases in relation 
to NFV deployments, and in every NFV deployment of OpenStack there is a VNFM or 
MANO solution that is responsible for the orchestration of instances belonging 
to various service function chains. I think it is reasonable to expect the MANO 
system to be responsible for attempting a re-launch of an instance that was set 
to ERROR due to a last-minute affinity violation.

**Operators, do you agree with the above?**

I do not. My affinity and anti-affinity use cases reflect the need to build 
large applications across failure domains in a datacenter.

Anti-affinity: Most anti-affinity use cases relate to the ability to guarantee 
that instances are scheduled across failure domains, others relate to security 
compliance.

Affinity: Hadoop/Big data deployments have affinity use cases, where nodes 
processing data need to be in the same rack as the nodes which house the data. 
This is a common setup for large hadoop deployers.

I recognize that large Ironic users expressed their concerns about IPMI/BMC 
communication being unreliable and not wanting to have users manually retry a 
baremetal instance launch. But, on this particular point, I'm of the opinion 
that Nova just do one thing and do it well. Nova isn't an orchestrator, nor is 
it intending to be a "just continually try to get me to this eventual state" 
system like Kubernetes.

Kubernetes is a larger orchestration platform that provides autoscale. I don't 
expect Nova to provide autoscale, but

I agree that Nova should do one thing and do it really well, and in my mind 
that thing is reliable provisioning of compute resources. Kubernetes does 
autoscale among other things. I'm not asking for Nova to provide Autoscale, I 
-AM- asking OpenStack's compute platform to provision a discrete compute 
resource reliably. This means overcoming common and simple error cases. As a 
deployer of OpenStack I'm trying to build a cloud that wraps the chaos of 
infrastructure, and present a reliable facade. When my users issue a boot 
request, I want to see it fulfilled. I don't expect it to be a 100% guarantee
across any possible failure, but I expect (and my users demand) that my 
"Infrastructure as a service" API make reasonable accommodation to overcome 
common failures.


If we removed Reschedule for class c) failures entirely, large Ironic deployers 
would have to train users to manually retry a failed launch or would need to 
write a simple retry mechanism into whatever client/UI that they expose to 
their users.

**Ironic operators, would the above decision force you to abandon Nova as the 
multi-tenant BMaaS facility?**


 I just glanced at one of my production clusters and found there are around 7K 
users defined, many of whom use OpenStack on a daily basis. When they issue a 
boot call, they expect that request to be honored. From their perspective, if 
they call AWS, they get what they ask for. If you remove reschedules you're not 
just breaking the expectation of a single deployer, but for my thousands of 
engineers who, every day, rely on OpenStack to manage their stack.

I don't have a "i'll take my football and go home" mentality. But if you remove 
the ability for the compute provisioning API to present a reliable facade over 
infrastructure, I have to go write something else, or patch it back in. Now 
it's even harder for me to get and stay current with OpenStack.

During the summit the agreement was, if I recall, that reschedules would happen 
within a cell, and not between the parent and cell. That was completely acceptable to me.

Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Blair Bethwaite
On 23 May 2017 at 05:33, Dan Smith  wrote:
> Sure, the diaper exception is rescheduled currently. That should
> basically be things like misconfiguration type things. Rescheduling
> papers over those issues, which I don't like, but in the room it surely
> seemed like operators thought that they still needed to be handled.

Operators don't want retries to mask configuration issues (appropriate
errors should still be captured in places where operators can process
them on a regular basis) but what they want even less is any further
complexity or "soft" failures exposed to end-users.

-- 
Cheers,
~Blairo



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread David Medberry
I have to agree with James

My affinity and anti-affinity rules have nothing to do with NFV. a-a is
almost always a failure domain solution. I'm not sure we have users
actually choosing affinity (though it would likely be for network speed
issues and/or some sort of badly architected need or perceived need for
coupling.)

On Mon, May 22, 2017 at 12:45 PM, James Penick  wrote:

>
>
> On Mon, May 22, 2017 at 10:54 AM, Jay Pipes  wrote:
>
>> Hi Ops,
>>
>> Hi!
>
>
>>
>> For class b) causes, we should be able to solve this issue when the
>> placement service understands affinity/anti-affinity (maybe Queens/Rocky).
>> Until then, we propose that instead of raising a Reschedule when an
>> affinity constraint was last-minute violated due to a racing scheduler
>> decision, that we simply set the instance to an ERROR state.
>>
>> Personally, I have only ever seen anti-affinity/affinity use cases in
>> relation to NFV deployments, and in every NFV deployment of OpenStack there
>> is a VNFM or MANO solution that is responsible for the orchestration of
>> instances belonging to various service function chains. I think it is
>> reasonable to expect the MANO system to be responsible for attempting a
>> re-launch of an instance that was set to ERROR due to a last-minute
>> affinity violation.
>>
>
>
>> **Operators, do you agree with the above?**
>>
>
> I do not. My affinity and anti-affinity use cases reflect the need to
> build large applications across failure domains in a datacenter.
>
> Anti-affinity: Most anti-affinity use cases relate to the ability to
> guarantee that instances are scheduled across failure domains, others
> relate to security compliance.
>
> Affinity: Hadoop/Big data deployments have affinity use cases, where nodes
> processing data need to be in the same rack as the nodes which house the
> data. This is a common setup for large hadoop deployers.
>
>
>> I recognize that large Ironic users expressed their concerns about
>> IPMI/BMC communication being unreliable and not wanting to have users
>> manually retry a baremetal instance launch. But, on this particular point,
>> I'm of the opinion that Nova just do one thing and do it well. Nova isn't
>> an orchestrator, nor is it intending to be a "just continually try to get
>> me to this eventual state" system like Kubernetes.
>>
>
> Kubernetes is a larger orchestration platform that provides autoscale. I
> don't expect Nova to provide autoscale, but
>
> I agree that Nova should do one thing and do it really well, and in my
> mind that thing is reliable provisioning of compute resources. Kubernetes
> does autoscale among other things. I'm not asking for Nova to provide
> Autoscale, I -AM- asking OpenStack's compute platform to provision a
> discrete compute resource reliably. This means overcoming common and simple
> error cases. As a deployer of OpenStack I'm trying to build a cloud that
> wraps the chaos of infrastructure, and present a reliable facade. When my
> users issue a boot request, I want to see it fulfilled. I don't expect it
> to be a 100% guarantee across any possible failure, but I expect (and my
> users demand) that my "Infrastructure as a service" API make reasonable
> accommodation to overcome common failures.
>
>
>
>> If we removed Reschedule for class c) failures entirely, large Ironic
>> deployers would have to train users to manually retry a failed launch or
>> would need to write a simple retry mechanism into whatever client/UI that
>> they expose to their users.
>>
>> **Ironic operators, would the above decision force you to abandon Nova as
>> the multi-tenant BMaaS facility?**
>>
>>
>  I just glanced at one of my production clusters and found there are
> around 7K users defined, many of whom use OpenStack on a daily basis. When
> they issue a boot call, they expect that request to be honored. From their
> perspective, if they call AWS, they get what they ask for. If you remove
> reschedules you're not just breaking the expectation of a single deployer,
> but for my thousands of engineers who, every day, rely on OpenStack to
> manage their stack.
>
> I don't have a "i'll take my football and go home" mentality. But if you
> remove the ability for the compute provisioning API to present a reliable
> facade over infrastructure, I have to go write something else, or patch it
> back in. Now it's even harder for me to get and stay current with OpenStack.
>
> During the summit the agreement was, if I recall, that reschedules would
> happen within a cell, and not between the parent and cell. That was
> completely acceptable to me.
>
> -James
>
>

Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Dan Smith
> Whoah, but that's after 10 tries (by default).  And if e.g. it bounced
> because the instance is too big for the host, but other, smaller
> instances come in and succeed in the meantime, that could wind up being
> stretched indefinitely.  Doesn't sound like a complete answer to this issue.

No dude, remember, this is all assuming that claiming with placement
eliminates 100% of the resource races :)

The _only_ things left to reschedule for are (a) straight up 100% fail
compute host misconfigurations and (b) anything that fails some
percentage of the time and will actually be resolved by trying a
different host (i.e. baseline 40% ironic ipmi failbots).

> Today you can limit the set of compute hosts to try by specifying an
> "availability zone".  Perhaps the answer here is to support some kind of
> "exclude these hosts" list to a "fresh" deploy.
> 
> But is the cure worse than the disease?

I (and I think others) would argue that the user needing to know that
they should try a different AZ is not reasonable UX. A rebuild of an
instance that failed to boot can/should exclude the original host on the
rebuild attempt. It does today with reschedules so it's not that hard,
just requires some plumbing.

--Dan



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Eric Fried
Dan, et al-

> Well, (a) today you can't really externally retry a single instance
> build without just creating a new one. The new one could suffer the same
> fate, but that's why we just did the auto-disable feature for nova-compute.

Whoah, but that's after 10 tries (by default).  And if e.g. it bounced
because the instance is too big for the host, but other, smaller
instances come in and succeed in the meantime, that could wind up being
stretched indefinitely.  Doesn't sound like a complete answer to this issue.

> Thing (b) is that if we fix rebuild so it works on a failed
> shell-of-an-instance from a boot operation, we could easily exclude the
> host it failed on, but it'd require some additional logic.

Right, so I think the need for that "additional logic" was my point.

Today you can limit the set of compute hosts to try by specifying an
"availability zone".  Perhaps the answer here is to support some kind of
"exclude these hosts" list to a "fresh" deploy.

But is the cure worse than the disease?
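
For what it's worth, a rough boot-time approximation of an "exclude these hosts" list already exists: the DifferentHostFilter honors a different_host scheduler hint listing instance UUIDs whose hosts should be avoided. A hedged sketch of the request body (shown as a Python dict with placeholder IDs; it only helps when that filter is enabled, and only for a fresh boot rather than a rebuild):

```python
# Request body for POST /v2.1/servers asking the scheduler to avoid the
# host that the failed instance landed on. All IDs are placeholders.
retry_request = {
    "server": {
        "name": "retry-of-failed-boot",
        "imageRef": "IMAGE-UUID",
        "flavorRef": "FLAVOR-ID",
        "networks": [{"uuid": "NET-UUID"}],
    },
    # Only honored when DifferentHostFilter is in the enabled filters.
    "os:scheduler_hints": {
        "different_host": ["UUID-OF-THE-INSTANCE-THAT-FAILED-ON-HOST-1"],
    },
}
```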

-efried
.




Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Dan Smith
> In a no-reschedules-by-nova world, if a deploy fails on host 1, how does
> the orchestrator (whatever that may be) ask nova to deploy in such a way
> that it'll still try to find a good host, but *avoid* host 1?  If host 1
> was an attractive candidate the first time around, wouldn't it be likely
> to remain high on the list the second time?

Well, (a) today you can't really externally retry a single instance
build without just creating a new one. The new one could suffer the same
fate, but that's why we just did the auto-disable feature for nova-compute.

Thing (b) is that if we fix rebuild so it works on a failed
shell-of-an-instance from a boot operation, we could easily exclude the
host it failed on, but it'd require some additional logic.

--Dan



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread James Penick
In the case of baremetal in our environment, when a boot attempt fails we
mark that node as being in maintenance mode, which prevents Nova from
scheduling to it a second time. Then automation comes along and files
repair tickets for the bad hardware. Only when a human or other automation
fixes the node and removes the "maintenance" state, will it be available
for scheduling again.
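
A rough sketch of that kind of automation, assuming the openstacksdk baremetal proxy (node identification and ticket filing are out of scope, and the exact SDK call names should be treated as assumptions):

```python
import openstack

conn = openstack.connect(cloud="mycloud")  # assumes a clouds.yaml entry


def quarantine_node(node_uuid, reason):
    """Put a baremetal node into maintenance so Nova stops scheduling to it."""
    node = conn.baremetal.get_node(node_uuid)
    conn.baremetal.set_node_maintenance(node, reason=reason)
    # A human (or repair automation) later clears it with:
    #   conn.baremetal.unset_node_maintenance(node)


quarantine_node("NODE-UUID", "boot failed: suspected bad BMC, repair ticket filed")
```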

On Mon, May 22, 2017 at 1:25 PM, Eric Fried  wrote:

> Hey folks, sorry if this is a jejune question, but:
>
> In a no-reschedules-by-nova world, if a deploy fails on host 1, how does
> the orchestrator (whatever that may be) ask nova to deploy in such a way
> that it'll still try to find a good host, but *avoid* host 1?  If host 1
> was an attractive candidate the first time around, wouldn't it be likely
> to remain high on the list the second time?
>
> I'd also like to second the thought that the monolithic "instance in
> error state" gives the orchestrator no hint as to whether the deploy
> failed because of something the orchestrator did (remedy may be to
> redrive with different inputs, but no need to exclude the original
> target host) versus because something went wrong on the compute host
> (remedy would be to retry on a different host with the same inputs).
> Kind of analogous to the difference between HTTP 4xx and 5xx error
> classes.  (Perhaps implying a design whereby the nova API responds to
> the deploy request with different error codes accordingly.)
>
> Thanks,
> efried
> .
>


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Eric Fried
Hey folks, sorry if this is a jejune question, but:

In a no-reschedules-by-nova world, if a deploy fails on host 1, how does
the orchestrator (whatever that may be) ask nova to deploy in such a way
that it'll still try to find a good host, but *avoid* host 1?  If host 1
was an attractive candidate the first time around, wouldn't it be likely
to remain high on the list the second time?

I'd also like to second the thought that the monolithic "instance in
error state" gives the orchestrator no hint as to whether the deploy
failed because of something the orchestrator did (remedy may be to
redrive with different inputs, but no need to exclude the original
target host) versus because something went wrong on the compute host
(remedy would be to retry on a different host with the same inputs).
Kind of analogous to the difference between HTTP 4xx and 5xx error
classes.  (Perhaps implying a design whereby the nova API responds to
the deploy request with different error codes accordingly.)
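
Nova does not expose a 4xx/5xx-style classification for asynchronous build failures today, but an orchestrator can at least read the fault recorded on an ERROR instance and apply its own heuristic. A hedged sketch, assuming openstacksdk; the classification rule itself is hypothetical:

```python
import openstack

conn = openstack.connect(cloud="mycloud")  # assumes a clouds.yaml entry


def classify_failure(server_id):
    """Crude stand-in for the 4xx/5xx split described above."""
    server = conn.compute.get_server(server_id)
    if server.status != "ERROR":
        return "no-failure"
    fault = server.fault or {}  # fault details Nova records on ERROR instances
    message = (fault.get("message") or "").lower()
    # Hypothetical heuristic: treat host-side keywords as "retry elsewhere",
    # everything else as "fix the request and redrive".
    host_side_markers = ("hypervisor", "libvirt", "ipmi", "compute host")
    if any(marker in message for marker in host_side_markers):
        return "retry-on-different-host"   # analogous to HTTP 5xx
    return "fix-inputs-and-redrive"        # analogous to HTTP 4xx
```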

Thanks,
efried
.



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread James Penick
That depends..
 I differentiate between a compute worker running on a hypervisor, and one
running as a service in the control plane (like the compute worker in an
Ironic cluster).

 A compute worker that is running on a hypervisor has highly restricted
network access. But if the compute worker is a service in the control
plane, such as it is with my Ironic installations, that's totally ok. It
really comes down to the fact that I don't want any real or logical network
access between an instance and the heart of the control plane.

 I'll allow a child cell control plane to call a parent cell, just not a
hypervisor within the child cell.


On Mon, May 22, 2017 at 12:42 PM, Sean Dague  wrote:

> On 05/22/2017 02:45 PM, James Penick wrote:
> 
> > During the summit the agreement was, if I recall, that reschedules would
> > happen within a cell, and not between the parent and cell. That was
> > completely acceptable to me.
>
> Follow on question (just because the right folks are in this thread, and
> it could impact paths forward). I know that some of the inability to
> have upcalls in the system is based around firewalling that both Yahoo
> and RAX did blocking the compute workers from communicating out.
>
> If the compute worker or cell conductor wanted to make an HTTP call back
> to nova-api (through the public interface), with the user context, is
> that a network path that would or could be accessible in your case?
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Jay Pipes

On 05/22/2017 03:53 PM, Jonathan Proulx wrote:

To be clear on my view of the whole proposal

Most of the Rescheduling that I've seen and want is of type "A", where the claim exceeds resources.  At least I think they are type "A" and not "C" unknown.

The exact case is that I oversubscribe RAM (1.5x); my users typically over-claim, so this is OK (my worst case is a hypervisor using only 10% of claimed RAM).  But there are some hotspots where proportional utilization is high, so libvirt won't start more VMs because it really doesn't have the memory.

If that's solved (or will be at the time reschedule goes away), the cases I've actually experienced would be solved.

The anti-affinity use cases are currently the most important to me of the affinity scheduling cases, and I haven't (to my knowledge) seen collisions in that direction.  So I could live with that race because for me it is uncommon (though I imagine for others where positive affinity is important the race may get lost more frequently)


Thanks for the feedback, Jon.

For the record, affinity really doesn't have much of a race condition at 
all. It's really only anti-affinity that has much of a chance of 
last-minute violation.


Best,
-jay


On Mon, May 22, 2017 at 03:00:09PM -0400, Jonathan Proulx wrote:
:On Mon, May 22, 2017 at 11:45:33AM -0700, James Penick wrote:
::On Mon, May 22, 2017 at 10:54 AM, Jay Pipes  wrote:
::
::> Hi Ops,
::>
::> Hi!
::
::
::>
::> For class b) causes, we should be able to solve this issue when the
::> placement service understands affinity/anti-affinity (maybe Queens/Rocky).
::> Until then, we propose that instead of raising a Reschedule when an
::> affinity constraint was last-minute violated due to a racing scheduler
::> decision, that we simply set the instance to an ERROR state.
::>
::> Personally, I have only ever seen anti-affinity/affinity use cases in
::> relation to NFV deployments, and in every NFV deployment of OpenStack there
::> is a VNFM or MANO solution that is responsible for the orchestration of
::> instances belonging to various service function chains. I think it is
::> reasonable to expect the MANO system to be responsible for attempting a
::> re-launch of an instance that was set to ERROR due to a last-minute
::> affinity violation.
::>
::
::
::> **Operators, do you agree with the above?**
::>
::
::I do not. My affinity and anti-affinity use cases reflect the need to build
::large applications across failure domains in a datacenter.
::
::Anti-affinity: Most anti-affinity use cases relate to the ability to
::guarantee that instances are scheduled across failure domains, others
::relate to security compliance.
::
::Affinity: Hadoop/Big data deployments have affinity use cases, where nodes
::processing data need to be in the same rack as the nodes which house the
::data. This is a common setup for large hadoop deployers.
:
:James describes my use case as well.
:
:I would also rather see a reschedule, if we're having a really bad day
:and reach max retries then see ERR
:
:-Jon





Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Jonathan Proulx

To be clear on my view of the whole proposal

Most of the Rescheduling that I've seen and want is of type "A", where the claim exceeds resources.  At least I think they are type "A" and not "C" unknown.

The exact case is that I oversubscribe RAM (1.5x); my users typically over-claim, so this is OK (my worst case is a hypervisor using only 10% of claimed RAM).  But there are some hotspots where proportional utilization is high, so libvirt won't start more VMs because it really doesn't have the memory.
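
For context, the oversubscription described here is controlled by a standard nova.conf option on the compute nodes; a minimal illustration (1.5 also happens to be the historical default):

```ini
# nova.conf on each compute node
[DEFAULT]
# Allow RAM claims up to 1.5x the physical RAM reported by the hypervisor.
ram_allocation_ratio = 1.5
```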

If that's solved (or will be at the time reschedule goes away), the cases I've actually experienced would be solved.

The anti-affinity use cases are currently the most important to me of the affinity scheduling cases, and I haven't (to my knowledge) seen collisions in that direction.  So I could live with that race because for me it is uncommon (though I imagine for others where positive affinity is important the race may get lost more frequently)

-Jon

On Mon, May 22, 2017 at 03:00:09PM -0400, Jonathan Proulx wrote:
:On Mon, May 22, 2017 at 11:45:33AM -0700, James Penick wrote:
::On Mon, May 22, 2017 at 10:54 AM, Jay Pipes  wrote:
::
::> Hi Ops,
::>
::> Hi!
::
::
::>
::> For class b) causes, we should be able to solve this issue when the
::> placement service understands affinity/anti-affinity (maybe Queens/Rocky).
::> Until then, we propose that instead of raising a Reschedule when an
::> affinity constraint was last-minute violated due to a racing scheduler
::> decision, that we simply set the instance to an ERROR state.
::>
::> Personally, I have only ever seen anti-affinity/affinity use cases in
::> relation to NFV deployments, and in every NFV deployment of OpenStack there
::> is a VNFM or MANO solution that is responsible for the orchestration of
::> instances belonging to various service function chains. I think it is
::> reasonable to expect the MANO system to be responsible for attempting a
::> re-launch of an instance that was set to ERROR due to a last-minute
::> affinity violation.
::>
::
::
::> **Operators, do you agree with the above?**
::>
::
::I do not. My affinity and anti-affinity use cases reflect the need to build
::large applications across failure domains in a datacenter.
::
::Anti-affinity: Most anti-affinity use cases relate to the ability to
::guarantee that instances are scheduled across failure domains, others
::relate to security compliance.
::
::Affinity: Hadoop/Big data deployments have affinity use cases, where nodes
::processing data need to be in the same rack as the nodes which house the
::data. This is a common setup for large hadoop deployers.
:
:James describes my use case as well.
:
:I would also rather see a reschedule, if we're having a really bad day
:and reach max retries then see ERR
:
:-Jon

-- 



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Sean Dague
On 05/22/2017 02:45 PM, James Penick wrote:

> During the summit the agreement was, if I recall, that reschedules would
> happen within a cell, and not between the parent and cell. That was
> completely acceptable to me.

Follow on question (just because the right folks are in this thread, and
it could impact paths forward). I know that some of the inability to
have upcalls in the system is based around firewalling that both Yahoo
and RAX did blocking the compute workers from communicating out.

If the compute worker or cell conductor wanted to make an HTTP call back
to nova-api (through the public interface), with the user context, is
that a network path that would or could be accessible in your case?

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Sean Dague
On 05/22/2017 02:45 PM, James Penick wrote:

>  
> 
> I recognize that large Ironic users expressed their concerns about
> IPMI/BMC communication being unreliable and not wanting to have
> users manually retry a baremetal instance launch. But, on this
> particular point, I'm of the opinion that Nova just do one thing and
> do it well. Nova isn't an orchestrator, nor is it intending to be a
> "just continually try to get me to this eventual state" system like
> Kubernetes.
> 
> 
> Kubernetes is a larger orchestration platform that provides autoscale. I
> don't expect Nova to provide autoscale, but 
> 
> I agree that Nova should do one thing and do it really well, and in my
> mind that thing is reliable provisioning of compute resources.
> Kubernetes does autoscale among other things. I'm not asking for Nova to
> provide Autoscale, I -AM- asking OpenStack's compute platform to
> provision a discrete compute resource reliably. This means overcoming
> common and simple error cases. As a deployer of OpenStack I'm trying to
> build a cloud that wraps the chaos of infrastructure, and present a
> reliable facade. When my users issue a boot request, I want to see it fulfilled. I don't expect it to be a 100% guarantee across any possible
> failure, but I expect (and my users demand) that my "Infrastructure as a
> service" API make reasonable accommodation to overcome common failures. 

Right, I think this hits my major queasiness with throwing the baby out with
the bathwater here. I feel like Nova's job is to give me a compute when
asked for computes. Yes, like malloc, things could fail. But honestly if
Nova can recover from that scenario, it should try to. The baremetal and
affinity cases are pretty good instances where Nova can catch and
recover, and not just export that complexity up.

It would make me sad to just export that complexity to users, and
instead of handling those cases internally, make every SDK, App, and
simple script build their own retry loop.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Dan Smith
> To be clear, they are able to communicate, and do, as long as you
> configure them to be able to do so. The long-term goal is that you don't
> have to configure them to be able to do so, so we're trying to design
> and work in that mode toward that goal.

No, the cell conductor doesn't have a way to communicate with the
scheduler. It's more than just a "it's not configured to" thing.

If you have multiple cells, then your conductors within a cell point to
the cell MQ as the default transport for all kinds of stuff. If they call compute to do a thing, they don't (and can't, since they don't have the ability to look up the cell mapping) target a specific bus; they just ask on their default bus.

So, unless scheduler and compute are on the same bus, conductor *can't*
talk to both at the same time (for non-super conductor operations like
build that expect to target, but then they can't do the non-targeted
operations). If you do that, then you're not doing cellsv2.

>> [1] This really does not occur with any frequency for hypervisor virt
>> drivers, since the exceptions those hypervisors throw are caught by
>> the nova-compute worker and handled without raising a Reschedule.
> 
> Are you sure about that?
> 
> https://github.com/openstack/nova/blob/931c3f48188e57e71aa6518d5253e1a5bd9a27c0/nova/compute/manager.py#L2041-L2049

Sure, the diaper exception is rescheduled currently. That should
basically be things like misconfiguration type things. Rescheduling
papers over those issues, which I don't like, but in the room it surely
seemed like operators thought that they still needed to be handled.

--Dan



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Jonathan Proulx
On Mon, May 22, 2017 at 11:45:33AM -0700, James Penick wrote:
:On Mon, May 22, 2017 at 10:54 AM, Jay Pipes  wrote:
:
:> Hi Ops,
:>
:> Hi!
:
:
:>
:> For class b) causes, we should be able to solve this issue when the
:> placement service understands affinity/anti-affinity (maybe Queens/Rocky).
:> Until then, we propose that instead of raising a Reschedule when an
:> affinity constraint was last-minute violated due to a racing scheduler
:> decision, that we simply set the instance to an ERROR state.
:>
:> Personally, I have only ever seen anti-affinity/affinity use cases in
:> relation to NFV deployments, and in every NFV deployment of OpenStack there
:> is a VNFM or MANO solution that is responsible for the orchestration of
:> instances belonging to various service function chains. I think it is
:> reasonable to expect the MANO system to be responsible for attempting a
:> re-launch of an instance that was set to ERROR due to a last-minute
:> affinity violation.
:>
:
:
:> **Operators, do you agree with the above?**
:>
:
:I do not. My affinity and anti-affinity use cases reflect the need to build
:large applications across failure domains in a datacenter.
:
:Anti-affinity: Most anti-affinity use cases relate to the ability to
:guarantee that instances are scheduled across failure domains, others
:relate to security compliance.
:
:Affinity: Hadoop/Big data deployments have affinity use cases, where nodes
:processing data need to be in the same rack as the nodes which house the
:data. This is a common setup for large hadoop deployers.

James describes my use case as well.

I would also rather see a reschedule, if we're having a really bad day
and reach max retries then see ERR

-Jon



Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Matt Riedemann

On 5/22/2017 12:54 PM, Jay Pipes wrote:

Hi Ops,

I need your feedback on a very important direction we would like to 
pursue. I realize that there were Forum sessions about this topic at the 
summit in Boston and that there were some decisions that were reached.


I'd like to revisit that decision and explain why I'd like your support 
for getting rid of the automatic reschedule behaviour entirely in Nova 
for Pike.


== The current situation and why it sucks ==

Nova currently attempts to "reschedule" instances when any of the 
following events occur:


a) the "claim resources" process that occurs on the nova-compute worker 
results in the chosen compute node exceeding its own capacity


b) in between the time a compute node was chosen by the scheduler, 
another process launched an instance that would violate an affinity 
constraint


c) an "unknown" exception occurs during the spawn process. In practice, 
this really only is seen when the Ironic baremetal node that was chosen 
by the scheduler turns out to be unreliable (IPMI issues, BMC failures, 
etc) and wasn't able to launch the instance. [1]


The logic for handling these reschedules makes the Nova conductor, 
scheduler and compute worker code very complex. With the new cellsv2 
architecture in Nova, child cells are not able to communicate with the 
Nova scheduler (and thus "ask for a reschedule").


To be clear, they are able to communicate, and do, as long as you
configure them to be able to do so. The long-term goal is that you won't
have to configure that communication at all, so we're trying to design
for, and work in, that mode as we move toward that goal.




We (the Nova team) would like to get rid of the automated rescheduling 
behaviour that Nova currently exposes because we could eliminate a large 
amount of complexity (which leads to bugs) from the already-complicated 
dance of communication that occurs between internal Nova components.


== What we would like to do ==

With the move of the resource claim to the Nova scheduler [2], we can 
entirely eliminate the class a) Reschedule causes.


This leaves class b) and c) causes of Rescheduling.

For class b) causes, we should be able to solve this issue when the 
placement service understands affinity/anti-affinity (maybe 
Queens/Rocky). Until then, we propose that instead of raising a 
Reschedule when an affinity constraint was violated at the last minute
due to a racing scheduler decision, we simply set the instance to an
ERROR state.


Personally, I have only ever seen anti-affinity/affinity use cases in 
relation to NFV deployments, and in every NFV deployment of OpenStack 
there is a VNFM or MANO solution that is responsible for the 
orchestration of instances belonging to various service function chains. 
I think it is reasonable to expect the MANO system to be responsible for 
attempting a re-launch of an instance that was set to ERROR due to a 
last-minute affinity violation.


**Operators, do you agree with the above?**

Finally, for class c) Reschedule causes, I do not believe that we should 
be attempting automated rescheduling when "unknown" errors occur. I just 
don't believe this is something Nova should be doing.


I recognize that large Ironic users expressed their concerns about 
IPMI/BMC communication being unreliable and not wanting to have users 
manually retry a baremetal instance launch. But, on this particular 
point, I'm of the opinion that Nova should just do one thing and do it well.
Nova isn't an orchestrator, nor is it intending to be a "just 
continually try to get me to this eventual state" system like Kubernetes.


If we removed Reschedule for class c) failures entirely, large Ironic 
deployers would have to train users to manually retry a failed launch or 
would need to write a simple retry mechanism into whatever client/UI 
that they expose to their users.


**Ironic operators, would the above decision force you to abandon Nova 
as the multi-tenant BMaaS facility?**


Thanks in advance for your consideration and feedback.

Best,
-jay

[1] This really does not occur with any frequency for hypervisor virt 
drivers, since the exceptions those hypervisors throw are caught by the 
nova-compute worker and handled without raising a Reschedule.


Are you sure about that?

https://github.com/openstack/nova/blob/931c3f48188e57e71aa6518d5253e1a5bd9a27c0/nova/compute/manager.py#L2041-L2049

The compute manager handles anything non-specific that leaks up from the 
virt driver.spawn() method and reschedules it. Think 
ProcessExecutionError when vif plugging fails in the libvirt driver 
because the command blew up for some reason (sudo on the host is 
wrong?). I'm not saying it should, as I'm guessing most of these types 
of failures are due to misconfiguration, but it is how things currently 
work today.
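
To make that concrete, here is a toy sketch of the control flow being
described (paraphrased, not the actual manager.py code -- see the link
above for the real thing):

    class RescheduledException(Exception):
        """Raised so the conductor will pick another host."""

    def build_and_run_instance(driver, context, instance):
        try:
            # vif plugging, libvirt calls, etc. all happen in here
            driver.spawn(context, instance)
        except Exception as exc:
            # today, any non-specific failure is converted into a
            # reschedule rather than putting the instance into ERROR
            raise RescheduledException(
                "instance %s: %s" % (instance["uuid"], exc))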




[2] 
http://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-claims.html 



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread James Penick
On Mon, May 22, 2017 at 10:54 AM, Jay Pipes  wrote:

> Hi Ops,
>
> Hi!


>
> For class b) causes, we should be able to solve this issue when the
> placement service understands affinity/anti-affinity (maybe Queens/Rocky).
> Until then, we propose that instead of raising a Reschedule when an
> affinity constraint was violated at the last minute due to a racing
> scheduler decision, we simply set the instance to an ERROR state.
>
> Personally, I have only ever seen anti-affinity/affinity use cases in
> relation to NFV deployments, and in every NFV deployment of OpenStack there
> is a VNFM or MANO solution that is responsible for the orchestration of
> instances belonging to various service function chains. I think it is
> reasonable to expect the MANO system to be responsible for attempting a
> re-launch of an instance that was set to ERROR due to a last-minute
> affinity violation.
>


> **Operators, do you agree with the above?**
>

I do not. My affinity and anti-affinity use cases reflect the need to build
large applications across failure domains in a datacenter.

Anti-affinity: Most anti-affinity use cases relate to the ability to
guarantee that instances are scheduled across failure domains; others
relate to security compliance.

Affinity: Hadoop/Big data deployments have affinity use cases, where nodes
processing data need to be in the same rack as the nodes which house the
data. This is a common setup for large Hadoop deployers.
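
To make the use case concrete, this is roughly how it is expressed today.
A sketch assuming python-novaclient (with 'sess', 'image' and 'flavor'
already defined); double-check names and fields against your client
version:

    from novaclient import client

    nova = client.Client('2', session=sess)

    # members of this group must land on different hosts
    group = nova.server_groups.create(name='web-tier',
                                      policies=['anti-affinity'])

    # this is exactly the constraint a racing scheduler decision can
    # violate at the last minute (Jay's class b)
    server = nova.servers.create(name='web-01', image=image, flavor=flavor,
                                 scheduler_hints={'group': group.id})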


> I recognize that large Ironic users expressed their concerns about
> IPMI/BMC communication being unreliable and not wanting to have users
> manually retry a baremetal instance launch. But, on this particular point,
> I'm of the opinion that Nova should just do one thing and do it well. Nova isn't
> an orchestrator, nor is it intending to be a "just continually try to get
> me to this eventual state" system like Kubernetes.
>

Kubernetes is a larger orchestration platform that provides autoscale; I
don't expect Nova to provide autoscale.

I agree that Nova should do one thing and do it really well, and in my mind
that thing is reliable provisioning of compute resources. Kubernetes does
autoscale among other things. I'm not asking Nova to provide autoscale,
I -AM- asking OpenStack's compute platform to provision a discrete compute
resource reliably. This means overcoming common and simple error cases. As
a deployer of OpenStack I'm trying to build a cloud that wraps the chaos of
infrastructure and presents a reliable facade. When my users issue a boot
request, I want to see it fulfilled. I don't expect a 100% guarantee across
every possible failure, but I expect (and my users demand) that my
"Infrastructure as a Service" API make reasonable accommodations to
overcome common failures.



> If we removed Reschedule for class c) failures entirely, large Ironic
> deployers would have to train users to manually retry a failed launch or
> would need to write a simple retry mechanism into whatever client/UI that
> they expose to their users.
>
> **Ironic operators, would the above decision force you to abandon Nova as
> the multi-tenant BMaaS facility?**
>
>
I just glanced at one of my production clusters and found there are around
7K users defined, many of whom use OpenStack on a daily basis. When they
issue a boot call, they expect that request to be honored. From their
perspective, if they call AWS, they get what they ask for. If you remove
reschedules you're not just breaking the expectations of a single deployer,
but those of my thousands of engineers who, every day, rely on OpenStack to
manage their stacks.

I don't have an "I'll take my football and go home" mentality. But if you
remove the ability for the compute provisioning API to present a reliable
facade over infrastructure, I have to go write something else, or patch it
back in. Now it's even harder for me to get and stay current with OpenStack.

During the summit the agreement was, if I recall, that reschedules would
happen within a cell, and not between the parent and cell. That was
completely acceptable to me.

-James
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Jay Pipes

Hi Ops,

I need your feedback on a very important direction we would like to 
pursue. I realize that there were Forum sessions about this topic at the 
summit in Boston and that some decisions were reached.


I'd like to revisit that decision and explain why I'd like your support 
for getting rid of the automatic reschedule behaviour entirely in Nova 
for Pike.


== The current situation and why it sucks ==

Nova currently attempts to "reschedule" instances when any of the 
following events occur:


a) the "claim resources" process that occurs on the nova-compute worker 
results in the chosen compute node exceeding its own capacity


b) in the window between the scheduler choosing a compute node and the
instance spawning there, another process launched an instance that would
violate an affinity constraint


c) an "unknown" exception occurs during the spawn process. In practice, 
this is really only seen when the Ironic baremetal node that was chosen
by the scheduler turns out to be unreliable (IPMI issues, BMC failures, 
etc) and wasn't able to launch the instance. [1]


The logic for handling these reschedules makes the Nova conductor, 
scheduler and compute worker code very complex. With the new cellsv2 
architecture in Nova, child cells are not able to communicate with the 
Nova scheduler (and thus "ask for a reschedule").


We (the Nova team) would like to get rid of the automated rescheduling 
behaviour that Nova currently exposes because we could eliminate a large 
amount of complexity (which leads to bugs) from the already-complicated 
dance of communication that occurs between internal Nova components.


== What we would like to do ==

With the move of the resource claim to the Nova scheduler [2], we can 
entirely eliminate the class a) Reschedule causes.
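
Conceptually -- this is a sketch of the idea, not the actual design in
[2] -- the claim is attempted against each candidate host before the boot
request ever reaches a compute node, so "the node turned out to be full"
can no longer surface on the compute side at all:

    class NoValidHost(Exception):
        pass

    def schedule_and_claim(placement, candidates, consumer_uuid, resources):
        # 'placement.claim()' is a hypothetical helper wrapping the
        # placement allocation call: placement rejects an allocation that
        # would exceed inventory, and on such a conflict we simply move on
        # to the next candidate.
        for host in candidates:
            if placement.claim(host.resource_provider_uuid,
                               consumer_uuid, resources):
                return host
        raise NoValidHost()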


This leaves class b) and c) causes of Rescheduling.

For class b) causes, we should be able to solve this issue when the 
placement service understands affinity/anti-affinity (maybe 
Queens/Rocky). Until then, we propose that instead of raising a 
Reschedule when an affinity constraint was violated at the last minute
due to a racing scheduler decision, we simply set the instance to an
ERROR state.


Personally, I have only ever seen anti-affinity/affinity use cases in 
relation to NFV deployments, and in every NFV deployment of OpenStack 
there is a VNFM or MANO solution that is responsible for the 
orchestration of instances belonging to various service function chains. 
I think it is reasonable to expect the MANO system to be responsible for 
attempting a re-launch of an instance that was set to ERROR due to a 
last-minute affinity violation.
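
From the MANO/VNFM side, "set the instance to ERROR" would look roughly
like the sketch below ('nova' is a python-novaclient client and
'instance_uuid' the booted instance's id; the 'fault' field is only
populated for errored instances and its exact contents vary by release, so
treat the details as assumptions):

    server = nova.servers.get(instance_uuid)
    if server.status == 'ERROR':
        fault = getattr(server, 'fault', {}) or {}
        # fault usually carries a short message about why the boot
        # failed, e.g. an affinity policy violation
        print('boot of %s failed: %s' % (server.id,
                                         fault.get('message', 'unknown')))
        # the orchestrator, not Nova, decides whether and where to retry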


**Operators, do you agree with the above?**

Finally, for class c) Reschedule causes, I do not believe that we should 
be attempting automated rescheduling when "unknown" errors occur. I just 
don't believe this is something Nova should be doing.


I recognize that large Ironic users expressed their concerns about 
IPMI/BMC communication being unreliable and not wanting to have users 
manually retry a baremetal instance launch. But, on this particular 
point, I'm of the opinion that Nova should just do one thing and do it well.
Nova isn't an orchestrator, nor is it intending to be a "just 
continually try to get me to this eventual state" system like Kubernetes.


If we removed Reschedule for class c) failures entirely, large Ironic 
deployers would have to train users to manually retry a failed launch or 
would need to write a simple retry mechanism into whatever client/UI 
that they expose to their users.
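
A minimal sketch of what such a client-side retry could look like, assuming
python-novaclient; the attempt count, poll interval and cleanup policy are
illustrative choices, not recommendations:

    import time

    def boot_with_retry(nova, name, image, flavor, attempts=3, poll=15):
        for _ in range(attempts):
            server = nova.servers.create(name=name, image=image,
                                         flavor=flavor)
            while server.status == 'BUILD':
                time.sleep(poll)
                server = nova.servers.get(server.id)
            if server.status == 'ACTIVE':
                return server
            # e.g. the Ironic node had an IPMI/BMC hiccup during spawn
            nova.servers.delete(server)
            time.sleep(poll)
        raise RuntimeError('%s failed after %d attempts' % (name, attempts))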


**Ironic operators, would the above decision force you to abandon Nova 
as the multi-tenant BMaaS facility?**


Thanks in advance for your consideration and feedback.

Best,
-jay

[1] This really does not occur with any frequency for hypervisor virt 
drivers, since the exceptions those hypervisors throw are caught by the 
nova-compute worker and handled without raising a Reschedule.


[2] 
http://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-claims.html


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators