Re: [ovs-discuss] Timeout waiting for network-vif-plugged for instance with vm_state building and task_state spawning

2021-12-29 Thread Tony Liu
I've been hitting this issue from time to time. There are discussions and bugs 
for this.

http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025941.html
http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025892.html
https://bugs.launchpad.net/heat/+bug/1694371
https://bugs.launchpad.net/nova/+bug/1944779

I don't think it's an OVN/OVS problem, because the port is up in OVN NB,
and restarting Neutron makes it work again. I suspect Neutron either
doesn't pick up the event or doesn't send it out to Nova compute.
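
When it happens, you can sanity-check OVN directly for the port ID from
the timeout message. A sketch, assuming the commands run somewhere that
can reach the OVN NB/SB databases (e.g. on a kolla-ansible controller):

  # prints "up" once OVN has bound the port and set Logical_Switch_Port.up
  ovn-nbctl lsp-get-up c641bae9-edcc-4e64-93ec-2dcbf5010009

  # shows which chassis claimed the port in the southbound DB
  ovn-sbctl find Port_Binding logical_port=c641bae9-edcc-4e64-93ec-2dcbf5010009

If the port is up there but Nova never receives network-vif-plugged, the
gap is on the Neutron side, somewhere between its OVN event handling and
the Nova external events API.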

When I hit this issue, I enable debug logging for Neutron and restart it,
but the restart itself makes the problem go away. If I keep logging at
debug level until the next occurrence, it dumps far too much output. It
would be good for Neutron to support enabling debug logging on the fly.
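
For what it's worth, this is roughly the manual dance today (paths assume
a kolla-ansible layout; crudini is just one convenient way to edit the
ini file):

  crudini --set /etc/kolla/neutron-server/neutron.conf DEFAULT debug true
  docker restart neutron_server

On releases where oslo marks the debug option as mutable, a SIGHUP to the
service is supposed to reload it without a full restart, but I haven't
verified that this works for Neutron on Victoria.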

If it happens to you more frequently, you can try enabling debug logging
and see if that captures anything helpful.


Tony


From: discuss  on behalf of Christian 
Stelter 
Sent: December 29, 2021 09:51 AM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] Timeout waiting for network-vif-plugged for instance 
with vm_state building and task_state spawning

[ovs-discuss] Timeout waiting for network-vif-plugged for instance with vm_state building and task_state spawning

2021-12-29 Thread Christian Stelter
Hi!

We observe strange timeouts while creating new instances in our well-aged
OpenStack installation (kolla-ansible deployed, Victoria release). It
seems that the feedback from the ovn-controller(?) doesn't reach the nova
compute/libvirt process within 300s.
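
For context, the 300s is nova's default wait for the network-vif-plugged
event. These are the nova.conf options that govern it, shown read-only
here (paths assume a kolla-ansible compute node):

  # default 300; the warning below fires when this timer expires
  crudini --get /etc/kolla/nova-compute/nova.conf DEFAULT vif_plugging_timeout

  # if false, nova logs the warning and continues the boot, which would
  # match the "Instance spawned successfully" right after the timeout
  crudini --get /etc/kolla/nova-compute/nova.conf DEFAULT vif_plugging_is_fatal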

This does not happen all the time, but 1/3 to 1/5 of the launches have
problems.

This is what we see in the logs:

---  nova-compute ---
2021-12-29 07:27:06.972 7 INFO nova.compute.claims
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Claim successful on node compute07
2021-12-29 07:27:07.757 7 INFO nova.virt.libvirt.driver
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Creating image
2021-12-29 07:27:12.514 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Started (Lifecycle Event)
2021-12-29 07:27:12.559 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Paused (Lifecycle Event)
2021-12-29 07:27:12.649 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] During sync_power_state the instance
has a pending task (spawning). Skip.
2021-12-29 07:32:12.518 7 WARNING nova.virt.libvirt.driver
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Timeout waiting for
[('network-vif-plugged', 'c641bae9-edcc-4e64-93ec-2dcbf5010009')] for
instance with vm_state building and task_state spawning:
eventlet.timeout.Timeout: 300 seconds
2021-12-29 07:32:12.522 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] VM Resumed (Lifecycle Event)
2021-12-29 07:32:12.526 7 INFO nova.virt.libvirt.driver [-] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Instance spawned successfully.
2021-12-29 07:32:12.527 7 INFO nova.compute.manager
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Took 304.77 seconds to spawn the
instance on the hypervisor.
2021-12-29 07:32:12.613 7 INFO nova.compute.manager
[req-01948206-8ff3-404c-b71f-86ed7d1791d4 - - - - -] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] During sync_power_state the instance
has a pending task (spawning). Skip.
2021-12-29 07:32:12.632 7 INFO nova.compute.manager
[req-57321447-272f-4820-813d-925a43d69462 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] [instance:
aef0a1b3-99d6-43e2-89db-b7bd4e0516d9] Took 305.70 seconds to build instance.
---  nova-compute ---

--- neutron-server ---
2021-12-29 07:27:09.115 23 INFO neutron.db.ovn_revision_numbers_db
[req-cb418ddf-a068-4511-9fed-f0914945c76d 0b16e2645828414d86f094c2a7a693eb
66fcb198228e4d9d97f09dc7ce1d5130 - default default] Successfully bumped
revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type:
ports) to 1
2021-12-29 07:27:10.054 23 INFO neutron.db.ovn_revision_numbers_db
[req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840
8775aa7d41654c84b8b17b3a570adf4d - default default] Successfully bumped
revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type:
ports) to 2
2021-12-29 07:27:10.393 23 INFO neutron.db.ovn_revision_numbers_db
[req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840
8775aa7d41654c84b8b17b3a570adf4d - default default] Successfully bumped
revision number for resource c641bae9-edcc-4e64-93ec-2dcbf5010009 (type:
ports) to 2
2021-12-29 07:27:10.415 23 INFO neutron.wsgi
[req-8e3f0f54-8164-4d6d-acab-3be912b0afb6 217d8068ede5469ebfbebaf3e2242840
8775aa7d41654c84b8b17b3a570adf4d - default default] 172.16.4.75,172.16.4.74
"PUT /v2.0/ports/c641bae9-edcc-4e64-93ec-2dcbf5010009 HTTP/1.1" status:
200  len: 1258 time: 1.2485251
2021-12-29 07:27:10.556 23 INFO neutron.notifiers.nova [-] Nova event
response: {'name': 'network-changed', 'server_uuid':
'aef0a1b3-99d6-43e2-89db-b7bd4e0516d9', 'tag':
'c641bae9-edcc-4e64-93ec-2dcbf5010009', 'status': 'completed', 'code': 200}
--- neutron-server ---

--- ovn-controller ---
2021-12-29T07:27:11.448Z|00363|binding|INFO|Claiming lport
c641bae9-edcc-4e64-93ec-2dcbf5010009 for this chassis.
2021-12-29T07:27:11.448Z|00364|binding|INFO|c641bae9-edcc-4e64-93ec-2dcbf5010009:
Claiming fa:16:3e:62:2d:8a 192.168.0.237
--- ovn-controller ---

In one case we found an "OVN reports status up for port" message with a
delay of more than 5 hours after the initial creation request in the log.
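
For a given port, this is roughly how we line up the two sides (log paths
assume kolla-ansible and may differ in other deployments):

  PORT=c641bae9-edcc-4e64-93ec-2dcbf5010009

  # when ovn-controller on the compute node claimed the port
  grep "Claiming lport $PORT" /var/log/kolla/openvswitch/ovn-controller.log

  # when neutron-server saw the port come up and notified nova
  grep "OVN reports status up for port" /var/log/kolla/neutron/neutron-server.log | grep "$PORT"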

Does anyone have an idea what could be causing this?