On 01/18/2018 03:06 PM, Logan V. wrote:
We have used aggregate based scheduler filters since deploying our
cloud in Kilo. This explains the unpredictable scheduling we have seen
since upgrading to Ocata. Before this post, was there some indication
I missed that these filters can no longer be used? Even now reading
the Ocata release notes[1] or checking the filter scheduler docs[2] I
cannot find any indication that AggregateCoreFilter,
AggregateRamFilter, and AggregateDiskFilter are useless in Ocata+. If
I missed something I'd like to know where it is so I can avoid that
mistake again!

We failed to provide a release note about it. :( That's our fault and I apologize.

Just to make sure I understand correctly, given this list of filters
we used in Newton:
AggregateInstanceExtraSpecsFilter,AggregateNumInstancesFilter,AggregateCoreFilter,AggregateRamFilter,RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

I should remove AggregateCoreFilter, AggregateRamFilter, and RamFilter
from the list because they are no longer useful, and replace them with
the appropriate nova.conf settings instead, correct?

Yes, correct.
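
For what it's worth, here is a rough sketch of how that could look (the
ratio values are placeholders you'd pick per host, and I'm using the Ocata
option names, where [filter_scheduler]/enabled_filters replaces the old
scheduler_default_filters):

  # scheduler nova.conf
  [filter_scheduler]
  enabled_filters = AggregateInstanceExtraSpecsFilter,AggregateNumInstancesFilter,RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

  # each compute node's nova.conf (set via your config management tool)
  [DEFAULT]
  cpu_allocation_ratio = 16.0
  ram_allocation_ratio = 1.5
  disk_allocation_ratio = 1.0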

What about AggregateInstanceExtraSpecsFilter and
AggregateNumInstancesFilter? Do these still work?

Yes.

Best,
-jay

Thanks
Logan

[1] https://docs.openstack.org/releasenotes/nova/ocata.html
[2] https://docs.openstack.org/ocata/config-reference/compute/schedulers.html

On Wed, Jan 17, 2018 at 7:57 AM, Sylvain Bauza <sba...@redhat.com> wrote:


On Wed, Jan 17, 2018 at 2:22 PM, Jay Pipes <jaypi...@gmail.com> wrote:

On 01/16/2018 08:19 PM, Zhenyu Zheng wrote:

Thanks for the info. So it seems we are not going to implement aggregate-based
overcommit ratios in placement, at least not in the near future?


As @edleafe alluded to, we will not be adding functionality to the
placement service to associate an overcommit ratio with an aggregate. This
was/is buggy functionality that we do not wish to bring forward into the
placement modeling system.

Reasons the current functionality is poorly architected and buggy
(mentioned in @melwitt's footnote):

1) If a nova-compute service's CONF.cpu_allocation_ratio differs from the host
aggregate's cpu_allocation_ratio metadata value, which value should the
AggregateCoreFilter consider?

2) If a nova-compute service is associated with multiple host aggregates,
and those aggregates contain different values for their cpu_allocation_ratio
metadata value, which one should be used by the AggregateCoreFilter?
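
To make that concrete, here's an illustrative sketch (aggregate names and
ratios are made up) of how a single host ends up with three competing
answers:

  openstack aggregate set --property cpu_allocation_ratio=4.0 agg-a
  openstack aggregate set --property cpu_allocation_ratio=16.0 agg-b
  # ...while the host, a member of both aggregates, has
  # cpu_allocation_ratio = 2.0 in its own nova.conf

Whichever precedence rule the filter applies, it will surprise somebody.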

The bottom line for me is that the AggregateCoreFilter has been used as a
crutch to solve a **configuration management problem**.

Instead of having the configuration management system (Puppet, etc.) set each
nova-compute service's CONF.cpu_allocation_ratio *correctly*, the admin sets
the HostAggregate metadata cpu_allocation_ratio value, which is error-prone
for the reasons listed above.


Well, the main reason people started to use AggregateCoreFilter and the other
aggregate filters is that pre-Newton it was literally impossible to assign
different allocation ratios to different computes unless you grouped them into
aggregates and used those filters.
Now that ratios are per-compute, there is no need to keep those filters,
unless you leave the computes' nova.conf untouched so that the ratios fall
back to the scheduler-level defaults. The edge case would be "I have 1000+
computes and I just want to apply specific ratios to one or two of them", but
then I'd second Jay and say "config management is the solution to your
problem".



Incidentally, this same design flaw is the reason that availability zones
are so poorly defined in Nova. There is actually no such thing as an
availability zone in Nova. Instead, an AZ is merely a metadata tag (or a
CONF option! :( ) that may or may not exist against a host aggregate.
There's lots of spaghetti in Nova due to the decision to use host aggregate
metadata for availability zone information, which should have always been
the domain of a **configuration management system** to set. [*]
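
Concretely (an illustrative sketch, the names are made up), this is about
all an "availability zone" amounts to today:

  # an AZ is just the availability_zone metadata key on a host aggregate...
  openstack aggregate create --zone az-east agg-rack-east
  # ...and hosts not in any such aggregate fall back to the
  # [DEFAULT]/default_availability_zone CONF option ("nova" by default)

There is no first-class AZ object anywhere; it's all derived from that
metadata at runtime.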


IMHO, that's not exactly the root cause of the spaghetti code we have for AZs.
I rather like the idea of seeing an availability zone as just a user-visible
aggregate, because it keeps things simple to understand.
What the spaghetti code really comes from is that the transitive relationship
between an aggregate, a compute and an instance is misunderstood, and we
introduced the notion of an "instance AZ", which is a mistake. Instances
shouldn't have a field saying "here is my AZ"; they should rather carry a flag
saying "which AZ did the user request?" (None being a valid choice).


In the Placement service, we have the concept of aggregates, too. However,
in Placement, an aggregate (note: not "host aggregate") is merely a grouping
mechanism for resource providers. Placement aggregates do not have any
attributes themselves -- they merely represent the relationship between
resource providers. Placement aggregates suffer from neither of the above
listed design flaws because they are not buckets for metadata.
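
For illustration (UUIDs elided, response shape roughly what the placement
REST API returns), the association is nothing more than a list of aggregate
UUIDs hanging off a resource provider:

  GET /resource_providers/{rp_uuid}/aggregates
  -> {"aggregates": ["<agg-uuid-1>", "<agg-uuid-2>"]}

No metadata, no ratios -- just the grouping relationship itself.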

ok </rant>.

Best,
-jay

[*] Note the assumption on line 97 here:


https://github.com/openstack/nova/blob/master/nova/availability_zones.py#L96-L100

On Wed, Jan 17, 2018 at 5:24 AM, melanie witt <melwi...@gmail.com> wrote:

     Hello Stackers,

     This is a heads up to any of you using the AggregateCoreFilter,
     AggregateRamFilter, and/or AggregateDiskFilter in the filter
     scheduler. These filters have effectively allowed operators to set
     overcommit ratios per aggregate rather than per compute node in <=
     Newton.

     Beginning in Ocata, there is a behavior change where aggregate-based
     overcommit ratios will no longer be honored during scheduling.
     Instead, overcommit values must be set on a per compute node basis
     in nova.conf.

     Details: as of Ocata, instead of considering all compute nodes at
     the start of scheduler filtering, an optimization has been added to
     query resource capacity from placement and prune the compute node
     list with the result *before* any filters are applied. Placement
     tracks resource capacity and usage and does *not* track aggregate
     metadata [1]. Because of this, placement cannot consider
     aggregate-based overcommit and will exclude compute nodes that do
     not have capacity based on per compute node overcommit.
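
     For illustration only (the resource amounts are made-up flavor
     values), the pre-filtering query the scheduler sends to placement
     looks roughly like:

         GET /resource_providers?resources=VCPU:2,MEMORY_MB:4096,DISK_GB:40

     Only the providers returned by that query ever reach the filters, so
     aggregate-based overcommit can no longer rescue a host that placement
     already considers full.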

     How to prepare: if you have been relying on per aggregate
     overcommit, during your upgrade to Ocata, you must change to using
     per compute node overcommit ratios in order for your scheduling
     behavior to stay consistent. Otherwise, you may notice increased
     NoValidHost scheduling failures as the aggregate-based overcommit is
     no longer being considered. You can safely remove the
     AggregateCoreFilter, AggregateRamFilter, and AggregateDiskFilter
     from your enabled_filters and you do not need to replace them with
     any other core/ram/disk filters. The placement query takes care of
     the core/ram/disk filtering instead, so CoreFilter, RamFilter, and
     DiskFilter are redundant.
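
     As a hedged example (the filter names besides the six mentioned above
     are just a typical default list), the change amounts to:

         # before: filters carried over from Newton
         enabled_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,DiskFilter,AggregateRamFilter,AggregateCoreFilter,AggregateDiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter
         # after (Ocata+), with allocation ratios set per compute node
         enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter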

     Thanks,
     -melanie

     [1] Placement has been a clean slate for resource management, and
     prior to placement there were conflicts between the different methods
     for setting overcommit ratios that were never addressed, such as:
     "which value should win if a compute node has overcommit set AND the
     aggregate has it set?" and "if a compute node is in more than one
     aggregate, which overcommit value should be taken?" Those ambiguities
     were not something we wanted to bring forward into placement.

