Re: [ovs-discuss] Timeout waiting for network-vif-plugged for instance with vm_state building and task_state spawning
I've been hitting this issue from time to time. There are existing discussions and bug reports about it:

http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025941.html
http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025892.html
https://bugs.launchpad.net/heat/+bug/1694371
https://bugs.launchpad.net/nova/+bug/1944779

I don't think it's an OVN/OVS problem, because the port is up in the OVN NB database, and restarting Neutron makes it work again. I suspect Neutron either doesn't pick the event up or doesn't send it out to nova-compute. When I hit this issue, I enable debug logging for Neutron and restart it, but the restart makes the problem go away. If I keep logging at debug level until the next occurrence, it dumps far too much output. It would be good if Neutron supported enabling debug logging on the fly. If it happens to you more frequently, you could try enabling debug logging and see whether it captures anything helpful.

Tony

From: discuss on behalf of Christian Stelter
Sent: December 29, 2021 09:51 AM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] Timeout waiting for network-vif-plugged for instance with vm_state building and task_state spawning

Hi!

We observe strange timeouts while creating new instances in our well-aged OpenStack installation (Victoria release, deployed with kolla-ansible). It seems that the feedback from the OVN controller(?) doesn't reach the nova-compute/libvirt process within 300 s. This does not happen every time, but roughly 1 in 3 to 1 in 5 launches has the problem.
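On the point about enabling debug logging on the fly: recent oslo.log releases mark the `debug` option as mutable, so if the deployed Neutron release wires up oslo.config's mutable-option reload, it may be possible to flip it without a full restart by editing neutron.conf and sending SIGHUP to neutron-server. This is a sketch, not a verified procedure; the container name assumes a kolla-ansible deployment:

```
# In neutron.conf (inside the neutron_server container under kolla):
[DEFAULT]
debug = True

# Then signal the service to re-read its config instead of restarting it:
#   docker kill --signal=HUP neutron_server
```

Whether SIGHUP reloads config in place or restarts the workers depends on the release, so it is worth testing on one controller first.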
[ovs-discuss] Timeout waiting for network-vif-plugged for instance with vm_state building and task_state spawning
Hi!

We observe strange timeouts while creating new instances in our well-aged OpenStack installation (Victoria release, deployed with kolla-ansible). It seems that the feedback from the OVN controller(?) doesn't reach the nova-compute/libvirt process within 300 s. This does not happen every time, but roughly 1 in 3 to 1 in 5 launches has the problem.

This is what we see in the logs:

--- nova-compute ---
2021-12-29 07:27:06.972 7 INFO nova.compute.claims [req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Claim successful on node compute07
2021-12-29 07:27:07.757 7 INFO nova.virt.libvirt.driver [req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Creating image
2021-12-29 07:27:12.514 7 INFO nova.compute.manager [req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Started (Lifecycle Event)
2021-12-29 07:27:12.559 7 INFO nova.compute.manager [req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Paused (Lifecycle Event)
2021-12-29 07:27:12.649 7 INFO nova.compute.manager [req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] During sync_power_state the instance has a pending task (spawning). Skip.
2021-12-29 07:32:12.518 7 WARNING nova.virt.libvirt.driver [req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Timeout waiting for [('network-vif-plugged', 'c641bae9-edcc-4e64-93ec-2dcbf5010009')] for instance with vm_state building and task_state spawning: eventlet.timeout.Timeout: 300 seconds
2021-12-29 07:32:12.522 7 INFO nova.compute.manager [req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Resumed (Lifecycle Event)
2021-12-29 07:32:12.526 7 INFO nova.virt.libvirt.driver [-] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Instance spawned successfully.
2021-12-29 07:32:12.527 7 INFO nova.compute.manager [req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Took 304.77 seconds to spawn the instance on the hypervisor.
2021-12-29 07:32:12.613 7 INFO nova.compute.manager [req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] During sync_power_state the instance has a pending task (spawning). Skip.
2021-12-29 07:32:12.632 7 INFO nova.compute.manager [req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Took 305.70 seconds to build instance.
--- nova-compute ---

--- neutron-server ---
2021-12-29 07:27:09.115 23 INFO neutron.db.ovn_revision_numbers_db [req-cb418ddf-a068-4511-9fed-f0914945c76d 0b16e2645828414d86f094c2a7a693eb 66fcb198228e4d9d97f09dc7ce1d5130 - default default] Successfully bumped revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type: ports) to 1
2021-12-29 07:27:10.054 23 INFO neutron.db.ovn_revision_numbers_db [req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840 8775aa7d41654c84b8b17b3a570adf4d - default default] Successfully bumped revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type: ports) to 2
2021-12-29 07:27:10.393 23 INFO neutron.db.ovn_revision_numbers_db [req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840 8775aa7d41654c84b8b17b3a570adf4d - default default] Successfully bumped revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type: ports) to 2
2021-12-29 07:27:10.415 23 INFO neutron.wsgi [req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840 8775aa7d41654c84b8b17b3a570adf4d - default default] 172.16.4.75,172.16.4.74 "PUT /v2.0/ports/c641bae9-edcc-4e64-93ec-2dcbf5010009 HTTP/1.1" status: 200 len: 1258 time: 1.2485251
2021-12-29 07:27:10.556 23 INFO neutron.notifiers.nova [-] Nova event response: {'name': 'network-changed', 'server_uuid': 'aef0a1b3-99d6-43e2-89db-b7bd4e0516d9', 'tag': 'c641bae9-edcc-4e64-93ec-2dcbf5010009', 'status': 'completed', 'code': 200}
--- neutron-server ---

--- ovn-controller ---
2021-12-29T07:27:11.448Z|00363|binding|INFO|Claiming lport c641bae9-edcc-4e64-93ec-2dcbf5010009 for this chassis.
2021-12-29T07:27:11.448Z|00364|binding|INFO|c641bae9-edcc-4e64-93ec-2dcbf5010009: Claiming fa:16:3e:62:2d:8a 192.168.0.237
--- ovn-controller ---

In one case we found an "OVN reports status up for port" message in the log more than 5 hours after the initial creation request.

Does anyone have an idea what could be
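For what it's worth, when the timeout strikes, the port's state can be cross-checked on the OVN side with a couple of read-only commands (run wherever ovn-nbctl/ovn-sbctl are available, e.g. the OVN database containers in a kolla deployment; the port UUID below is the one from the logs above):

```
# Is the logical switch port marked up in the OVN northbound DB?
ovn-nbctl --columns=name,up list Logical_Switch_Port c641bae9-edcc-4e64-93ec-2dcbf5010009

# Has the port binding been claimed by a chassis in the southbound DB?
ovn-sbctl find Port_Binding logical_port=c641bae9-edcc-4e64-93ec-2dcbf5010009
```

If `up` is true in the NB database and the SB Port_Binding shows a chassis, OVN has done its part and the missing link is the Neutron-to-Nova network-vif-plugged notification, which matches Tony's suspicion above.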
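The 300-second wait is nova-compute's `vif_plugging_timeout` option (default 300); with the default `vif_plugging_is_fatal = True` a timeout fails the build, so the fact that the instance above still resumed and spawned after the timeout suggests `vif_plugging_is_fatal` is disabled in this deployment. The underlying mechanism is a prepare/deliver/wait handshake on the network-vif-plugged external event. A minimal sketch of that pattern, illustrative only and not Nova's actual code (class and method names are invented here):

```python
import threading

class ExternalEventWaiter:
    """Sketch of the wait-for-external-event pattern: the compute side
    registers interest in an event before starting the VM, a separate
    API thread delivers it when Neutron's notification arrives, and the
    spawning thread blocks with a timeout."""

    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()

    def prepare(self, tag):
        # Register interest *before* the action that triggers the event,
        # so a fast notification cannot be lost.
        with self._lock:
            self._events[tag] = threading.Event()

    def deliver(self, tag):
        # Called when the notification (e.g. Neutron's
        # os-server-external-events POST) arrives.
        with self._lock:
            ev = self._events.get(tag)
        if ev is not None:
            ev.set()

    def wait(self, tag, timeout=300):
        # True if the event arrived in time; False is the
        # "Timeout waiting for network-vif-plugged" case.
        with self._lock:
            ev = self._events[tag]
        return ev.wait(timeout)
```

The key design point is the ordering: if the waiter only started listening after booting the VM, a notification sent in between would be dropped, which is exactly the kind of race being discussed in this thread.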