[Yahoo-eng-team] [Bug 1878719] Re: DHCP Agent's iptables CHECKSUM rule causes skb_warn_bad_offload kernel
[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron Status: Incomplete => Expired

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1878719 Title: DHCP Agent's iptables CHECKSUM rule causes skb_warn_bad_offload kernel Status in neutron: Expired

Bug description:
We are hitting this kernel issue due to a DHCP agent CHECKSUM rule that is probably obsolete/not needed: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840619

Upgrading the kernel is one workaround, but it is more disruptive, especially since we are still using CentOS 7 and the kernel fix only made it into 4.19. We should just remove this rule altogether. As per the kernel issue:

"The changes are limited only to users which have CHECKSUM rules enabled in their iptables configs. Openstack commonly configures such rules on deployment, even though they are not necessary, as almost all packets have their checksum calculated by NICs these days, and CHECKSUM is only around to service old dhcp clients which would discard UDP packets with empty checksums. This commit was selected for upstream -stable 4.18.13, and has made its way into bionic 4.15.0-58.64 by LP #1836426. There have been no reported problems and those kernels would have had sufficient testing with Openstack and its configured iptables rules. If any users are affected by regression, then they can simply delete any CHECKSUM entries in their iptables configs."

I can see the metadata agent's CHECKSUM rule was already removed last year: https://github.com/openstack/neutron/commit/04e995be9898ceaa009344509dc16ca7f589d814

Is there any reason the DHCP agent's was not? Is it safe to just remove this function and where it is invoked from altogether?
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/dhcp.py#L1739
https://github.com/openstack/neutron/blob/cb55643a0695ebc5b41f50f6edb1546bcc676b71/neutron/agent/linux/dhcp.py#L1691

To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1878719/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
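Ahead of any change in neutron itself, the cleanup the kernel bug recommends (deleting CHECKSUM entries) can be illustrated with a minimal, hedged sketch. It is not the neutron fix; it assumes the iptables CLI is available, that the rule sits in the mangle table's POSTROUTING chain, and that it is run inside the relevant qdhcp network namespace (e.g. via ip netns exec) if that is where the DHCP agent installed it:

```python
# Minimal sketch only, not the neutron fix: replay every CHECKSUM rule in the
# mangle/POSTROUTING chain as a delete. Assumes the iptables CLI; run inside
# the relevant qdhcp namespace if that is where the rule lives.
import subprocess


def drop_checksum_rules(table="mangle", chain="POSTROUTING"):
    listing = subprocess.run(
        ["iptables", "-t", table, "-S", chain],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    for rule in listing:
        if "-j CHECKSUM" in rule:
            # "-A CHAIN <spec>" becomes "-D CHAIN <spec>" to delete that rule.
            subprocess.run(
                ["iptables", "-t", table] + rule.replace("-A", "-D", 1).split(),
                check=True)


if __name__ == "__main__":
    drop_checksum_rules()
```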
[Yahoo-eng-team] [Bug 1886537] Re: Missing parameters in python-glanceclient image-import command documentation
Thanks for pointing this out. I targeted it to python-glanceclient instead of the service.

** Project changed: glance => python-glanceclient ** Changed in: python-glanceclient Status: New => Triaged ** Changed in: python-glanceclient Importance: Undecided => Medium ** Tags added: low-hanging-fruit

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1886537 Title: Missing parameters in python-glanceclient image-import command documentation Status in Glance Client: Triaged

Bug description:

Description
-----------
The python-glanceclient documentation [0] for the image-import command shows just one optional parameter:

```
usage: glance image-import [--import-method ]
```

But in the actual CLI there are more parameters available:

```
$ glance help image-import
usage: glance image-import [--import-method ] [--uri ] [--store ]
                           [--stores ] [--all-stores [True|False]]
                           [--allow-failure [True|False]]
```

How to reproduce
----------------
1. Open the python-glanceclient image-import command documentation [0]
2. Check the params

Expected behavior
-----------------
To have all the parameters documented.

Actual behavior
---------------
Parameters are missing, compared to the actual `glance help image-import` command output.

[0] https://docs.openstack.org/python-glanceclient/latest/cli/details.html#glance-image-import

To manage notifications about this bug go to: https://bugs.launchpad.net/python-glanceclient/+bug/1886537/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
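For illustration, a hypothetical invocation assembled only from the flags shown in the help output above; the image ID and URI are placeholders, and the web-download import method is assumed to be enabled in the target deployment:

```python
# Hypothetical example using only the flags shown above; <IMAGE_ID> and the
# URI are placeholders, and web-download is assumed to be an enabled import
# method in the deployment.
import subprocess

subprocess.run([
    "glance", "image-import", "<IMAGE_ID>",
    "--import-method", "web-download",
    "--uri", "https://example.com/images/cirros.qcow2",
    "--allow-failure", "False",
], check=True)
```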
[Yahoo-eng-team] [Bug 1887405] [NEW] Race condition while processing security_groups_member_updated events (ipset)
Public bug reported:

# Summary

Race condition while processing security_groups_member_updated events (ipset)

# Overview

We have a customer that uses heat templates to deploy large environments (e.g. 21 instances) with a significant number of security groups (e.g. 60) that use bi-directional remote group references for both ingress and egress filtering. These heat stacks are deployed using a CI pipeline and intermittently suffer from application layer failures due to broken network connectivity. We found that this was caused by the ipsets used to implement remote_group memberships missing IPs from their member lists. Troubleshooting suggests this is caused by a race condition, which I've attempted to describe in detail below.

Version: `54e1a6b1bc378c0745afc03987d0fea241b826ae` (HEAD of stable/rocky as of Jan 26, 2020), though I suspect this issue persists through master.

I'm working on getting some multi-node environments deployed (I don't think it's possible to reproduce this with a single hypervisor) and hope to provide reproduction steps on Rocky and master soon. I wanted to get this report submitted as-is with the hopes that an experienced Neutron dev might be able to spot possible solutions or provide diagnostic insight that I am not yet able to produce.

I suspect this report may be easier to read with some markdown, so please feel free to read it in a gist: https://gist.github.com/cfarquhar/20fddf2000a83216021bd15b512f772b

Also, this diagram is probably critical to following along: https://user-images.githubusercontent.com/1253665/87317744-0a75b180-c4ed-11ea-9bad-085019c0f954.png

# Race condition symptoms

Given the following security groups/rules:

```
| secgroup name | secgroup id                          | direction | remote group                         | dest port |
|---------------|--------------------------------------|-----------|--------------------------------------|-----------|
| server        | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | ingress   | b52c8c54-b97a-477d-8b68-f4075e7595d9 | 9092      |
| client        | b52c8c54-b97a-477d-8b68-f4075e7595d9 | egress    | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | 9092      |
```

And the following instances:

```
| instance name | hypervisor | ip          | secgroup assignment |
|---------------|------------|-------------|---------------------|
| server01      | compute01  | 192.168.0.1 | server              |
| server02      | compute02  | 192.168.0.2 | server              |
| server03      | compute03  | 192.168.0.3 | server              |
| client01      | compute04  | 192.168.0.4 | client              |
```

We would expect to find the following ipset representing the `server` security group members on `compute04`:

```
# ipset list NIPv4fcd6cf12-2ac9-4704-9208-
Name: NIPv4fcd6cf12-2ac9-4704-9208-
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 536
References: 4
Number of entries: 3
Members:
192.168.0.1
192.168.0.2
192.168.0.3
```

What we actually get when the race condition is triggered is an incomplete list of members in the ipset. The member list could contain anywhere between zero and two of the expected IPs.

# Triggering the race condition

The problem occurs when `security_group_member_updated` events arrive between `port_update` steps 12 and 22 (see diagram and process details below).

- `port_update` step 12 retrieves the remote security groups' member lists, which are not necessarily complete yet.
- `port_update` step 22 adds the port to `IptablesFirewallDriver.ports()`.

This results in `security_group_member_updated` step 3 looking for the port to apply the updated member list to (in `IptablesFirewallDriver.ports()`) BEFORE it has been added by `port_update`'s step 22. This causes the membership update event to effectively be discarded.
We are then left with whatever the remote security group's member list was when the `port_update` process retrieved it at step 12. This state persists until something triggers the port being re-added to the `updated_ports` list (e.g. agent restart, another remote group membership change, local security group addition/removal, etc).

# Race condition details

The race condition occurs in the linuxbridge agent between the two following operations:

1) Processing a `port_update` event when an instance is first created
2) Processing `security_group_member_updated` events for the instance's remote security groups

Either of these operations can result in creating or mutating an ipset from `IpsetManager.set_members()`. The relevant control flow sequence for each operation is listed below. I've left out any branches that did not seem to be relevant to the race condition.

## Processing a `port_update` event:

1) We receive an RPC port_update event via `LinuxBridgeRpcCallbacks.port_update()`, which adds the tap device to the `LinuxBridgeRpcCallbacks.updated_devices` list
2) Sleep until the next `CommonAgentLoop.daemon_loop()`
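A minimal, illustrative sketch of the ordering problem described under "Race condition symptoms" (invented names, not the real agent classes): a member update that arrives before port_update has registered the port is effectively dropped, so the stale member list fetched at port_update step 12 is what the ipset keeps.

```python
# Illustrative sketch only; FirewallSketch is not neutron's firewall driver.
class FirewallSketch:
    def __init__(self):
        self.ports = {}    # port id -> remote group, filled by port_update step 22
        self.ipsets = {}   # remote group id -> set of member IPs

    def security_group_member_updated(self, remote_group, members):
        # Member-update step 3: the refresh only happens for ports that are
        # already registered; otherwise the event is silently dropped.
        if remote_group in self.ports.values():
            self.ipsets[remote_group] = set(members)


fw = FirewallSketch()

# port_update step 12: fetch the (still incomplete) remote group membership.
fw.ipsets["server-sg"] = {"192.168.0.1"}

# The member update races in before port_update reaches step 22 -> dropped.
fw.security_group_member_updated(
    "server-sg", {"192.168.0.1", "192.168.0.2", "192.168.0.3"})

# port_update step 22: the port is finally registered, but too late.
fw.ports["tap-client01"] = "server-sg"

print(fw.ipsets["server-sg"])  # {'192.168.0.1'} -- members are missing
```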
[Yahoo-eng-team] [Bug 1837882] Re: while creating external network subnet range through Horizon UI helper message says give subnet range as comma separated but accept hyphen("-")
** Project changed: horizon-cisco-ui => horizon

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1837882 Title: while creating external network subnet range through Horizon UI helper message says give subnet range as comma separated but accept hyphen("-") Status in OpenStack Dashboard (Horizon): New

Bug description:

Description: When a user tries to create an external network with a subnet range using the hyphen "-" delimiter (20.x.x.10-20.x.x.100), the form is accepted without an error message, even though the helper message clearly tells the user to enter the range as "(start_ip_range,end_ip_range)", i.e. comma separated. Please see the UI attachment for more info.

Also, after giving a hyphenated range for the external subnet, the external network window leads to an unexpected error saying "Specify additional attributes for the subnet" without giving any proper error message.

Pre-condition: create a tenant router r1 with its gateway set to the external network.

Step 1 -> Create the external network using the admin user from Horizon.
Step 2 -> On the create network window: name -> external, project -> admin, provider network type -> external, leave the physical network blank (but make sure the user has a tier 0 gateway set).
Step 3 -> Click next -> subnet name -> subnet1, network address -> try to give an external network range matching that of the gateway IP, Gateway IP -> give the gateway IP of the subnet.
Step 4 -> Under subnet details, uncheck "Enable DHCP". Allocation pool -> give the subnet range as (20.x.x.10-20.x.x.100).
Step 5 -> Once the user clicks "next", the window comes back to Step 2 and further clicking next leads to the message "Specify additional attributes for the subnet".

To manage notifications about this bug go to: https://bugs.launchpad.net/horizon/+bug/1837882/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1837882] [NEW] while creating external network subnet range through Horizon UI helper message says give subnet range as comma separated but accept hyphen("-")
You have been subscribed to a public bug:

Description: When a user tries to create an external network with a subnet range using the hyphen "-" delimiter (20.x.x.10-20.x.x.100), the form is accepted without an error message, even though the helper message clearly tells the user to enter the range as "(start_ip_range,end_ip_range)", i.e. comma separated. Please see the UI attachment for more info.

Also, after giving a hyphenated range for the external subnet, the external network window leads to an unexpected error saying "Specify additional attributes for the subnet" without giving any proper error message.

Pre-condition: create a tenant router r1 with its gateway set to the external network.

Step 1 -> Create the external network using the admin user from Horizon.
Step 2 -> On the create network window: name -> external, project -> admin, provider network type -> external, leave the physical network blank (but make sure the user has a tier 0 gateway set).
Step 3 -> Click next -> subnet name -> subnet1, network address -> try to give an external network range matching that of the gateway IP, Gateway IP -> give the gateway IP of the subnet.
Step 4 -> Under subnet details, uncheck "Enable DHCP". Allocation pool -> give the subnet range as (20.x.x.10-20.x.x.100).
Step 5 -> Once the user clicks "next", the window comes back to Step 2 and further clicking next leads to the message "Specify additional attributes for the subnet".

** Affects: horizon Importance: Undecided Status: New

-- while creating external network subnet range through Horizon UI helper message says give subnet range as comma separated but accept hyphen("-") https://bugs.launchpad.net/bugs/1837882 You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).

-- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
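For reference, a hedged CLI-equivalent of the allocation pool the helper text is asking for; the network name, CIDR and addresses are placeholders. The API expresses an allocation pool as explicit start and end values, which is what the "start_ip_range,end_ip_range" helper text maps to, rather than a hyphenated range:

```python
# Hedged illustration only; network name, CIDR and addresses are placeholders.
import subprocess

subprocess.run([
    "openstack", "subnet", "create", "subnet1",
    "--network", "external",
    "--subnet-range", "20.0.0.0/24",
    "--gateway", "20.0.0.1",
    "--no-dhcp",
    "--allocation-pool", "start=20.0.0.10,end=20.0.0.100",
], check=True)
```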
[Yahoo-eng-team] [Bug 1688673] Re: cpu_realtime_mask handling is not intuitive
Reviewed: https://review.opendev.org/461456
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9fc63c764429c10f9041e6b53659e0cbd595bf6b
Submitter: Zuul
Branch: master

commit 9fc63c764429c10f9041e6b53659e0cbd595bf6b
Author: Chris Friesen
Date: Mon May 1 11:24:06 2017 -0600

    hardware: Tweak the 'cpu_realtime_mask' handling slightly

    If the end-user specifies a cpu_realtime_mask that does not begin with a caret (i.e. it is not a purely-exclusion mask) it's likely that they're expecting us to use the exact mask that they have specified, not realizing that we default to all-vCPUs-are-RT. Let's make nova's behaviour a bit more friendly by correctly handling this scenario.

    Note that the end-user impact of this is minimal/non-existent. As discussed in bug #1884231, the only way a user could have used this before would be if they'd configured an emulator thread and purposefully set an invalid 'hw:cpu_realtime_mask'. In fact, they wouldn't have been able to use this value at all if they used API microversion 2.86 (extra spec validation).

    Part of blueprint use-pcpu-and-vcpu-in-one-instance

    Change-Id: Id81859186de6fb6b728ad566a532244008fe77d0
    Closes-Bug: #1688673

** Changed in: nova Status: In Progress => Fix Released

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1688673 Title: cpu_realtime_mask handling is not intuitive Status in OpenStack Compute (nova): Fix Released

Bug description:
The nova code implicitly assumes that all vCPUs are realtime in nova.virt.hardware.vcpus_realtime_topology(), and then it appends the user-specified mask. This only makes sense if the user-specified cpu_realtime_mask is an exclusion mask, but this isn't documented anywhere.

It would make more sense to simply use the mask as passed in from the end-user. In order to preserve the current behaviour we should probably special-case the scenario where the passed-in cpu_realtime_mask starts with a "^" (indicating an exclusion).

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1688673/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
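As an illustration of the two mask styles discussed above, a hedged sketch of setting the relevant flavor extra specs; the flavor name and vCPU layout are placeholders, and realtime vCPUs additionally require pinned CPUs (hw:cpu_policy=dedicated):

```python
# Illustrative only; the flavor name is a placeholder. A leading "^" makes the
# mask an exclusion mask (all pinned vCPUs are realtime except those listed);
# without "^", the mask names the realtime vCPUs directly, which is the case
# the fix above handles explicitly.
import subprocess

subprocess.run([
    "openstack", "flavor", "set", "rt-flavor",
    "--property", "hw:cpu_policy=dedicated",
    "--property", "hw:cpu_realtime=yes",
    "--property", "hw:cpu_realtime_mask=^0",   # vCPU 0 stays non-realtime
], check=True)
```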
[Yahoo-eng-team] [Bug 1887385] [NEW] String to byte conversion should provide the encoding type
Public bug reported: In [1], in case self.port is a string, the encoding method should be provided. [1]https://github.com/openstack/neutron/blob/73557abefcba1c6ce0cef709d1082674c0217485/neutron/tests/functional/test_server.py#L231 ** Affects: neutron Importance: Undecided Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1887385 Title: String to byte conversion should provide the encoding type Status in neutron: In Progress Bug description: In [1], in case self.port is a string, the encoding method should be provided. [1]https://github.com/openstack/neutron/blob/73557abefcba1c6ce0cef709d1082674c0217485/neutron/tests/functional/test_server.py#L231 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1887385/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
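A minimal sketch of the change being suggested; the variable is illustrative, not the actual test attribute:

```python
# Minimal sketch: make the str-to-bytes conversion explicit about its
# encoding instead of relying on an implicit default.
port = "8080"  # may already be bytes on some code paths
port_bytes = port.encode("utf-8") if isinstance(port, str) else port
```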
[Yahoo-eng-team] [Bug 1887380] [NEW] Attaching virtual GPU devices to guests in nova
Public bug reported:

This bug tracker is for errors with the documentation, use the following as a template and remove or add fields as you see fit. Convert [ ] into [x] to check boxes:

- [X] This is a doc addition request.

Hi, a problem came up when we were using nova (Queens) configured with the vGPU feature to create several instances. It seems multiple instances preempt the same vGPU resource; in our case, an instance was given a vGPU that had already been acquired by another instance. Here is the error reported in the log:

"libvirt.libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/xxx is in use by driver QEMU, domain xxx"

Apparently, nova is trying to allocate a vGPU resource that is already being used by another instance. Also, we ruled out the possibility that there are not enough vGPU resources on the host: in our case, 25% of the instances went into an error state during creation, even though the instances being created needed only 50% of all vGPU resources.

From our perspective, the problem is with the nova-scheduler. Any idea how to work this out?

Thanks
Ruien Zhang
zhangru...@bytedance.com

---
Release: 21.1.0.dev214 on 2020-04-28 20:09:00 SHA: d19f1ac47b0a5fe1dd80b7187087e5810501f16c
Source: https://opendev.org/openstack/nova/src/doc/source/admin/virtual-gpu.rst
URL: https://docs.openstack.org/nova/latest/admin/virtual-gpu.html

** Affects: nova Importance: Undecided Status: New ** Tags: doc

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1887380 Title: Attaching virtual GPU devices to guests in nova Status in OpenStack Compute (nova): New

Bug description:
This bug tracker is for errors with the documentation, use the following as a template and remove or add fields as you see fit. Convert [ ] into [x] to check boxes:

- [X] This is a doc addition request.

Hi, a problem came up when we were using nova (Queens) configured with the vGPU feature to create several instances. It seems multiple instances preempt the same vGPU resource; in our case, an instance was given a vGPU that had already been acquired by another instance. Here is the error reported in the log:

"libvirt.libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/xxx is in use by driver QEMU, domain xxx"

Apparently, nova is trying to allocate a vGPU resource that is already being used by another instance. Also, we ruled out the possibility that there are not enough vGPU resources on the host: in our case, 25% of the instances went into an error state during creation, even though the instances being created needed only 50% of all vGPU resources.

From our perspective, the problem is with the nova-scheduler. Any idea how to work this out?

Thanks
Ruien Zhang
zhangru...@bytedance.com

---
Release: 21.1.0.dev214 on 2020-04-28 20:09:00 SHA: d19f1ac47b0a5fe1dd80b7187087e5810501f16c
Source: https://opendev.org/openstack/nova/src/doc/source/admin/virtual-gpu.rst
URL: https://docs.openstack.org/nova/latest/admin/virtual-gpu.html

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1887380/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
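A hedged diagnostic sketch, assuming the standard mdev sysfs layout on the compute host: list the mediated devices and their types so a double assignment like the libvirt error above can be cross-checked against the mdev UUIDs referenced in each guest's domain XML.

```python
# Diagnostic sketch only; assumes the standard mdev sysfs layout.
import os

MDEV_ROOT = "/sys/bus/mdev/devices"
for uuid in sorted(os.listdir(MDEV_ROOT)):
    type_link = os.path.join(MDEV_ROOT, uuid, "mdev_type")
    mdev_type = os.path.basename(os.path.realpath(type_link))
    print(f"{uuid} -> {mdev_type}")
```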
[Yahoo-eng-team] [Bug 1872671] Re: [Focal] Policy files are missing
This bug was fixed in the package horizon - 3:18.4.2~git2020070209.392bc2482-0ubuntu1~cloud0

---
horizon (3:18.4.2~git2020070209.392bc2482-0ubuntu1~cloud0) focal-victoria; urgency=medium
.
  * New upstream release for the Ubuntu Cloud Archive.
.
horizon (3:18.4.2~git2020070209.392bc2482-0ubuntu1) groovy; urgency=medium
.
  * New upstream snapshot for OpenStack Victoria.
  * d/control: Align (Build-)Depends with upstream.
  * d/p/fix-skipped-config-files.patch: Dropped. Fixed upstream.
  * d/control: Update Standards-Version to 4.5.0.
.
horizon (3:18.3.2-0ubuntu2) groovy; urgency=medium
.
  * d/p/fix-skipped-config-files.patch: Ensure that config files are included in the package (LP: #1872671).
.
horizon (3:18.3.2-0ubuntu1) groovy; urgency=medium
.
  * New upstream release for OpenStack Ussuri (LP: #1877642).

** Changed in: cloud-archive Status: Fix Committed => Fix Released

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1872671 Title: [Focal] Policy files are missing Status in Ubuntu Cloud Archive: Fix Released Status in Ubuntu Cloud Archive ussuri series: Triaged Status in OpenStack Dashboard (Horizon): Fix Released Status in horizon package in Ubuntu: Fix Released Status in horizon source package in Focal: Triaged

Bug description:

[Impact]

[Test Case]

python3-django-horizon: Installed: 3:18.2.1~git2020032709.2c4470272-0ubuntu1

After a fresh install of openstack dashboard on focal, apache2 error.log contains hundreds of error messages about missing policy files:

[Tue Apr 14 09:12:34.558183 2020] [wsgi:error] [pid 3062:tid 140253993006848] [remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for service 'identity' in /usr/lib/python3/dist-packages/openstack_dashboard/conf/keystone_policy.json
[Tue Apr 14 09:12:34.559486 2020] [wsgi:error] [pid 3062:tid 140253993006848] [remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for service 'compute' in /usr/lib/python3/dist-packages/openstack_dashboard/conf/nova_policy.json and files under ['/usr/lib/python3/dist-packages/openstack_dashboard/conf/nova_policy.d']
[Tue Apr 14 09:12:34.560622 2020] [wsgi:error] [pid 3062:tid 140253993006848] [remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for service 'volume' in /usr/lib/python3/dist-packages/openstack_dashboard/conf/cinder_policy.json and files under ['/usr/lib/python3/dist-packages/openstack_dashboard/conf/cinder_policy.d']
[Tue Apr 14 09:12:34.561703 2020] [wsgi:error] [pid 3062:tid 140253993006848] [remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for service 'image' in /usr/lib/python3/dist-packages/openstack_dashboard/conf/glance_policy.json
[Tue Apr 14 09:12:34.562703 2020] [wsgi:error] [pid 3062:tid 140253993006848] [remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for service 'network' in /usr/lib/python3/dist-packages/openstack_dashboard/conf/neutron_policy.json

The policy files are indeed missing from the package:

dpkg -L python3-django-horizon | grep json$
/usr/lib/python3/dist-packages/horizon/xstatic/pkg/angular/data/errors.json
/usr/lib/python3/dist-packages/horizon/xstatic/pkg/angular/data/version.json
/usr/lib/python3/dist-packages/horizon-18.2.1.dev1.egg-info/pbr.json

Logging in with a normal user account (without admin role) still shows the admin panel and buttons a normal user cannot use, like identity/users "create user".
Trying to use these either doesn't work or throws errors like "Unable to retrieve xxx". Copying the policy files from the source package solves the problem.

[Regression Potential]

Very low, this patch is just adding necessary policy files back into the package that already existed in prior releases.

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1872671/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
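A hedged sketch of the workaround noted above ("copying the policy files from the source package"); the source path is a placeholder for an unpacked horizon source tree, and the destination matches the paths shown in the apache2 warnings:

```python
# Hedged workaround sketch only; SRC is a placeholder. The *_policy.d
# directories referenced in the warnings may need the same treatment.
import glob
import shutil

SRC = "/tmp/horizon-source/openstack_dashboard/conf"            # placeholder
DST = "/usr/lib/python3/dist-packages/openstack_dashboard/conf"

for path in glob.glob(f"{SRC}/*_policy.json"):
    shutil.copy(path, DST)
```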
[Yahoo-eng-team] [Bug 1887377] [NEW] nova does not load balance assignment of resources on a host based on availability of PCI devices, hugepages or PCPUs
Public bug reported:

Nova has supported hugepages, CPU pinning and PCI NUMA affinity for a very long time. Since their introduction the advice has always been to create a flavor that mimics your typical hardware topology, i.e. if all your compute hosts have 2 NUMA nodes then you should create flavors that request 2 NUMA nodes.

For a long time operators have ignored this advice and continued to create single-NUMA-node flavors, citing that after 5+ years of hardware vendors working with VNF vendors to make their products NUMA aware, VNFs often still do not optimize properly for a multi-NUMA environment. As a result many operators still deploy single-NUMA VMs, although that is becoming less common over time.

When you deploy a VM with a single NUMA node today, we more or less iterate over the host NUMA nodes in order and assign the VM to the first NUMA node where it fits. On a host without any PCI devices whitelisted for OpenStack management, this behaviour results in NUMA nodes being filled linearly from NUMA 0 to NUMA n. That means if a host had 100G of hugepages on both NUMA node 0 and NUMA node 1 and you scheduled 101 single-NUMA 1G VMs to the host, 100 VMs would spawn on NUMA 0 and 1 VM would spawn on NUMA node 1. The first 100 VMs would then all contend for CPU resources on the first NUMA node while the last VM had all of the second NUMA node to itself.

The correct behaviour would be for nova to assign the VMs round-robin, attempting to keep the resource availability balanced. This would maximise performance for individual VMs while pessimising the scheduling of large VMs on a host. To this end, a new NUMA balancing config option (unset, pack or spread) should be added, and we should sort NUMA nodes in descending (spread) or ascending (pack) order based on pMEM, pCPUs, mempages and PCI devices, in that sequence. In a future release, when NUMA is in placement, this sorting will need to be done in a weigher that sorts the allocation candidates based on the same pack/spread criteria.

I am filing this as a bug, not a feature, as this will have a significant impact for existing deployments that either expected https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/reserve-numa-with-pci.html to implement this logic already, or do not follow our existing guidance on creating flavors that align to the host topology.

** Affects: nova Importance: Undecided Assignee: sean mooney (sean-k-mooney) Status: New ** Tags: numa

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1887377 Title: nova does not load balance assignment of resources on a host based on availability of PCI devices, hugepages or PCPUs Status in OpenStack Compute (nova): New

Bug description:
Nova has supported hugepages, CPU pinning and PCI NUMA affinity for a very long time. Since their introduction the advice has always been to create a flavor that mimics your typical hardware topology, i.e. if all your compute hosts have 2 NUMA nodes then you should create flavors that request 2 NUMA nodes.

For a long time operators have ignored this advice and continued to create single-NUMA-node flavors, citing that after 5+ years of hardware vendors working with VNF vendors to make their products NUMA aware, VNFs often still do not optimize properly for a multi-NUMA environment. As a result many operators still deploy single-NUMA VMs, although that is becoming less common over time.
When you deploy a VM with a single NUMA node today, we more or less iterate over the host NUMA nodes in order and assign the VM to the first NUMA node where it fits. On a host without any PCI devices whitelisted for OpenStack management, this behaviour results in NUMA nodes being filled linearly from NUMA 0 to NUMA n. That means if a host had 100G of hugepages on both NUMA node 0 and NUMA node 1 and you scheduled 101 single-NUMA 1G VMs to the host, 100 VMs would spawn on NUMA 0 and 1 VM would spawn on NUMA node 1. The first 100 VMs would then all contend for CPU resources on the first NUMA node while the last VM had all of the second NUMA node to itself.

The correct behaviour would be for nova to assign the VMs round-robin, attempting to keep the resource availability balanced. This would maximise performance for individual VMs while pessimising the scheduling of large VMs on a host. To this end, a new NUMA balancing config option (unset, pack or spread) should be added, and we should sort NUMA nodes in descending (spread) or ascending (pack) order based on pMEM, pCPUs, mempages and PCI devices, in that sequence. In a future release, when NUMA is in placement, this sorting will need to be done in a weigher that sorts the allocation candidates based on the same pack/spread criteria.

I am filing this as a bug, not a feature, as this will have a significant impact for existing deployments that either
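An illustrative sketch of the pack/spread ordering proposed above; the dict fields are invented for the example and are not nova's real NUMA cell objects. Nodes are ordered by free pMEM, pCPUs, mempages and PCI devices, in that sequence, before trying to fit a single-NUMA instance.

```python
# Illustrative sketch only; field names are invented for the example.
def order_host_cells(cells, policy="spread"):
    def key(cell):
        return (cell["free_pmem"], cell["free_pcpus"],
                cell["free_hugepages"], cell["free_pci_devs"])
    # spread = descending (emptiest node first), pack = ascending.
    return sorted(cells, key=key, reverse=(policy == "spread"))


cells = [
    {"id": 0, "free_pmem": 0, "free_pcpus": 2,  "free_hugepages": 10, "free_pci_devs": 0},
    {"id": 1, "free_pmem": 0, "free_pcpus": 14, "free_hugepages": 90, "free_pci_devs": 0},
]
print([c["id"] for c in order_host_cells(cells, "spread")])  # -> [1, 0]
print([c["id"] for c in order_host_cells(cells, "pack")])    # -> [0, 1]
```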
[Yahoo-eng-team] [Bug 1887363] [NEW] [ovn-octavia-provider] Functional tests job fails
Public bug reported:

Functional tests job fails on:

2020-07-13 08:22:50.145117 | controller | + /home/zuul/src/opendev.org/openstack/neutron/tools/configure_for_func_testing.sh:_install_base_deps:113 : source /home/zuul/src/opendev.org/openstack/ovn-octavia-provider/devstack/lib/ovs
2020-07-13 08:22:50.145252 | controller | /home/zuul/src/opendev.org/openstack/neutron/tools/configure_for_func_testing.sh: line 113: /home/zuul/src/opendev.org/openstack/ovn-octavia-provider/devstack/lib/ovs: No such file or directory

https://9ce43a75e3387ceb8909-2b4f2fa211fea8445ec0f4a568f6056b.ssl.cf2.rackcdn.com/740625/1/check/ovn-octavia-provider-functional/714ba02/job-output.txt

** Affects: neutron Importance: Undecided Assignee: Maciej Jozefczyk (maciej.jozefczyk) Status: New ** Tags: ovn-octavia-provider

** Changed in: neutron Assignee: (unassigned) => Maciej Jozefczyk (maciej.jozefczyk)

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1887363 Title: [ovn-octavia-provider] Functional tests job fails Status in neutron: New

Bug description:
Functional tests job fails on:

2020-07-13 08:22:50.145117 | controller | + /home/zuul/src/opendev.org/openstack/neutron/tools/configure_for_func_testing.sh:_install_base_deps:113 : source /home/zuul/src/opendev.org/openstack/ovn-octavia-provider/devstack/lib/ovs
2020-07-13 08:22:50.145252 | controller | /home/zuul/src/opendev.org/openstack/neutron/tools/configure_for_func_testing.sh: line 113: /home/zuul/src/opendev.org/openstack/ovn-octavia-provider/devstack/lib/ovs: No such file or directory

https://9ce43a75e3387ceb8909-2b4f2fa211fea8445ec0f4a568f6056b.ssl.cf2.rackcdn.com/740625/1/check/ovn-octavia-provider-functional/714ba02/job-output.txt

To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1887363/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp