I've been hitting this issue from time to time. There are existing discussions
and bug reports about it:

http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025941.html
http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025892.html
https://bugs.launchpad.net/heat/+bug/1694371
https://bugs.launchpad.net/nova/+bug/1944779

I don't think it's an OVN/OVS problem, because the port shows as up in the OVN
NB database, and restarting Neutron makes it work again. I suspect Neutron
either doesn't pick the port-up event up or doesn't send it on to Nova compute.
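
For anyone triaging the same thing, this is roughly what I check. It is only a
sketch: the column selections and the openstack CLI fields below are from
memory, so adjust them as needed.

  # Is the port up from OVN's point of view, and which chassis claimed it?
  ovn-nbctl --columns=name,up find Logical_Switch_Port name=<port-uuid>
  ovn-sbctl --columns=logical_port,chassis find Port_Binding logical_port=<port-uuid>

  # What does Neutron think about the same port?
  openstack port show <port-uuid> -c status -c binding_host_id

If OVN shows the port as up but Neutron still reports it as DOWN, that points
at the Neutron side rather than OVN/OVS.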

When I hit this issue, I enable debug logging for Neutron and restart it, but
the restart itself makes the problem go away. If I keep debug logging on until
the next occurrence, it dumps far too much output. It would be good if Neutron
supported enabling debug logging on the fly.

If it happens to you more frequently, you could try enabling debug logging and
see whether it captures anything helpful.
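
To be concrete about what I mean by enabling debug (a minimal sketch; the path
and container name below assume a kolla-ansible deployment like yours and may
differ elsewhere):

  # /etc/kolla/neutron-server/neutron.conf
  [DEFAULT]
  debug = True

  # then restart neutron-server, e.g. on a kolla host:
  docker restart neutron_server

The downside, as noted above, is that the restart alone tends to clear the
condition, so the debug output only helps for the next occurrence.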


Tony

________________________________________
From: discuss <ovs-discuss-boun...@openvswitch.org> on behalf of Christian 
Stelter <refu...@last-refuge.net>
Sent: December 29, 2021 09:51 AM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] Timeout waiting for network-vif-plugged for instance 
with vm_state building and task_state spawning

Hi!

We observe strange timeouts while creating new instances in our well-aged
OpenStack installation (Victoria release, deployed with kolla-ansible). It
seems that the feedback from the OVN controller(?) doesn't reach the
nova-compute/libvirt process within 300 seconds.

This does not happen all the time, but 1/3 to 1/5 of the launches have problems.
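
As far as we understand, the 300s is nova-compute waiting for Neutron's
network-vif-plugged external event; the wait is governed by these nova.conf
options on the compute node (the values shown are, to our knowledge, the
upstream defaults):

  [DEFAULT]
  # how long nova-compute waits for the network-vif-plugged event
  vif_plugging_timeout = 300
  # whether a timeout aborts the boot or only logs a warning
  vif_plugging_is_fatal = True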

This is what we see in the logs:

---  nova-compute ---
2021-12-29 07:27:06.972 7 INFO nova.compute.claims 
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Claim successful on node compute07
2021-12-29 07:27:07.757 7 INFO nova.virt.libvirt.driver 
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Creating image
2021-12-29 07:27:12.514 7 INFO nova.compute.manager 
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Started (Lifecycle Event)
2021-12-29 07:27:12.559 7 INFO nova.compute.manager 
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Paused (Lifecycle Event)
2021-12-29 07:27:12.649 7 INFO nova.compute.manager 
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] During sync_power_state the instance has 
a pending task (spawning). Skip.
2021-12-29 07:32:12.518 7 WARNING nova.virt.libvirt.driver 
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Timeout waiting for 
[('network-vif-plugged', 'c641bae9-edcc-4e64-93ec-2dcbf5010009')] for instance 
with vm_state building and task_state spawning: eventlet.timeout.Timeout: 300 
seconds
2021-12-29 07:32:12.522 7 INFO nova.compute.manager 
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Resumed (Lifecycle Event)
2021-12-29 07:32:12.526 7 INFO nova.virt.libvirt.driver [-] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Instance spawned successfully.
2021-12-29 07:32:12.527 7 INFO nova.compute.manager 
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Took 304.77 seconds to spawn the instance 
on the hypervisor.
2021-12-29 07:32:12.613 7 INFO nova.compute.manager 
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] During sync_power_state the instance has 
a pending task (spawning). Skip.
2021-12-29 07:32:12.632 7 INFO nova.compute.manager 
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb 
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance: 
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Took 305.70 seconds to build instance.
---  nova-compute ---

--- neutron-server ---
2021-12-29 07:27:09.115 23 INFO neutron.db.ovn_revision_numbers_db 
[req-cb418ddf-a068-4511-9fed-f0914945c76d 0b16e2645828414d86f094c2a7a693eb 
66fcb198228e4d9d97f09dc7ce1d5130 - default default] Successfully bumped 
revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type: ports) 
to 1
2021-12-29 07:27:10.054 23 INFO neutron.db.ovn_revision_numbers_db 
[req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840 
8775aa7d41654c84b8b17b3a570adf4d - default default] Successfully bumped 
revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type: ports) 
to 2
2021-12-29 07:27:10.393 23 INFO neutron.db.ovn_revision_numbers_db 
[req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840 
8775aa7d41654c84b8b17b3a570adf4d - default default] Successfully bumped 
revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type: ports) 
to 2
2021-12-29 07:27:10.415 23 INFO neutron.wsgi 
[req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840 
8775aa7d41654c84b8b17b3a570adf4d - default default] 172.16.4.75,172.16.4.74 
"PUT /v2.0/ports/c641bae9-edcc-4e64-93ec-2dcbf5010009 HTTP/1.1" status: 200  
len: 1258 time: 1.2485251
2021-12-29 07:27:10.556 23 INFO neutron.notifiers.nova [-] Nova event response: 
{'name': 'network-changed', 'server_uuid': 
'aef0a1b3-99d6-43e2-89db-b7bd4e0516d9', 'tag': 
'c641bae9-edcc-4e64-93ec-2dcbf5010009', 'status': 'completed', 'code': 200}
--- neutron-server ---

--- ovn-controller ---
2021-12-29T07:27:11.448Z|00363|binding|INFO|Claiming lport 
c641bae9-edcc-4e64-93ec-2dcbf5010009 for this chassis.
2021-12-29T07:27:11.448Z|00364|binding|INFO|c641bae9-edcc-4e64-93ec-2dcbf5010009:
 Claiming fa:16:3e:62:2d:8a 192.168.0.237
--- ovn-controller ---

In one case we found an "OVN reports status up for port" message in the log
more than 5 hours after the initial creation request.
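
For reference, the delay shows up when comparing timestamps on both sides,
e.g. with something like the following (the log paths are guesses for a
kolla-ansible host and may differ in other deployments):

  # Neutron: when did ML2/OVN see the port come up and notify Nova?
  grep -E "OVN reports status up for port|network-vif-plugged" \
      /var/log/kolla/neutron/neutron-server.log

  # ovn-controller: when was the port actually claimed on the chassis?
  grep "Claiming lport c641bae9-edcc-4e64-93ec-2dcbf5010009" \
      /var/log/kolla/openvswitch/ovn-controller.log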

Does anyone have an idea what could be causing this? We are not sure whether
OVN is the culprit or something in between. I didn't find any useful hints in
the all-knowing trash heap, but maybe I just used the wrong keywords.

Versions used:

ovn-controller 20.03.2
Open vSwitch Library 2.13.3
OpenFlow versions 0x4:0x4

OpenStack Nova 22.2.1

Thanks in advance!

Kind regards,

Christian Stelter
