Thanks all for your pointers.

Will try to attach a debugger and report back.

I have tried 3.5.0 and it hasn't helped.
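
For the debugger part, probably something along these lines to grab all the
thread backtraces once it hangs (just a sketch, assuming the process is
reachable as ovs-vswitchd from wherever I run it, e.g. inside the
openvswitch_vswitchd container):

  gdb -p $(pidof ovs-vswitchd) -batch -ex 'thread apply all bt'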
________________________________
From: Ilya Maximets <[email protected]>
Sent: 02 May 2025 1:45 PM
To: Brian Haley <[email protected]>; Daniel Niasoff <[email protected]>; 
[email protected] <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [ovs-discuss] Open vSwitch (Version 3.3.0) goes into deadlocked 
state

On 4/30/25 10:07 PM, Brian Haley via discuss wrote:
> Hi,
>
> On 4/28/25 5:57 AM, Daniel Niasoff via discuss wrote:
>>
>> Hi,
>>
>>    We are deploying OpenStack 2024.2 using kolla on Ubuntu Noble, using
>>    OVN as the network overlay.
>>
>>    We have an issue where, when we enable QoS on routers and networks,
>>    the openvswitch_vswitchd processes start hanging. We haven't tried
>>    with just one or the other, but it shouldn't be possible to bring
>>    down a whole cluster with a bit of config.
>>
>>    This occurred with OpenStack 2023.2 on Jammy in the past as well, so
>>    that would have been an older version of Open vSwitch, and I have
>>    even tried with Open vSwitch 3.5.0.
>>
>>    We are just using simple ingress/egress limits of 1000/1000 for a
>>    single network and 500/500 for a single router.
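>>
>>    Roughly along these lines (a sketch rather than the exact commands;
>>    the policy name and units are placeholders):
>>
>>    openstack network qos policy create net-limit-1000
>>    openstack network qos rule create --type bandwidth-limit --max-kbps 1000 --ingress net-limit-1000
>>    openstack network qos rule create --type bandwidth-limit --max-kbps 1000 --egress net-limit-1000
>>    openstack network set --qos-policy net-limit-1000 <network>
>>
>>    plus a similar 500/500 policy attached to the router.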
>
> Just FYI, this (above) was reported at
> https://bugs.launchpad.net/neutron/+bug/2103641
>
> But I now see someone else has reported the same issue - Ubuntu 24.04,
> OVN 24.03, OVS 3.3.0 (distro packages),
> https://bugs.launchpad.net/neutron/+bug/2109676 . One commonality seems
> to be that both have installed QoS rules for the instances.
>
> I have not tried to reproduce it myself yet, just throwing it out there
> in case it might ring any bells.

It doesn't ring any bells for me.  The fact that the handler thread
is stuck somewhere makes me think that the problem is somewhere in
the kernel and not in userspace.  The fact that it requires qdisc to
be configured suggests some problem in the generic tc code in the
kernel.  So, potentially not related to OVS.

So, as Eelco suggested, if you have a reproducer, attach gdb to see
where this handler thread is, or which syscall it is stuck in, and
then look further from there.  It's hard to guess what's going on
without this information.  You may also try an upstream kernel instead
of a distro kernel.  Trying upstream OVS v3.3.4 may also be a good idea,
as I'm not sure how far behind that OVS 3.3.0 is, though I don't
remember any patches fixing anything similar.
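
For example, something along these lines should show where it is stuck
(procfs paths on a typical Linux system; the thread name is taken from
the log above, and <pid>/<tid> are placeholders):

  # find the TID of the stuck handler thread
  grep -lx handler1 /proc/$(pidof ovs-vswitchd)/task/*/comm
  # then inspect that thread (as root)
  cat /proc/<pid>/task/<tid>/syscall   # syscall number and arguments
  cat /proc/<pid>/task/<tid>/stack     # kernel-side stack of the thread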

Best regards, Ilya Maximets.

>
> Thanks,
>
> -Brian
>
>
>>    Here are the logs
>>
>>    2025-03-19T09:37:24.752Z|409501|connmgr|INFO|br-int<->unix#1: 8 flow_mods 43 s ago (6 adds, 2 deletes)
>>    2025-03-19T09:38:39.945Z|410047|connmgr|INFO|br-int<->unix#1: 10 flow_mods in the 2 s starting 10 s ago (2 adds, 8 deletes)
>>    2025-03-19T09:44:19.786Z|412166|connmgr|INFO|br-int<->unix#1: 4 flow_mods 10 s ago (2 adds, 2 deletes)
>>    2025-03-19T09:45:19.786Z|412576|connmgr|INFO|br-int<->unix#1: 8 flow_mods in the 6 s starting 33 s ago (6 adds, 2 deletes)
>>    2025-03-19T09:54:07.996Z|415871|connmgr|INFO|br-int<->unix#1: 8 flow_mods in the 1 s starting 10 s ago (2 adds, 6 deletes)
>>    2025-03-19T09:54:52.517Z|416385|bridge|INFO|bridge br-int: deleted interface tap66d9c2a6-95 on port 101
>>    2025-03-19T09:55:07.996Z|416743|connmgr|INFO|br-int<->unix#1: 331 flow_mods in the 8 s starting 23 s ago (21 adds, 310 deletes)
>>    2025-03-19T09:56:07.996Z|417114|connmgr|INFO|br-int<->unix#1: 1 flow_mods 56 s ago (1 adds)
>>    2025-03-19T09:56:54.831Z|417448|bridge|INFO|bridge br-int: added interface tapc19e70a1-68 on port 102
>>    2025-03-19T09:56:54.860Z|417540|netdev_linux|WARN|tapc19e70a1-68: removing policing failed: No such device
>>    2025-03-19T09:57:07.996Z|417902|connmgr|INFO|br-int<->unix#1: 207 flow_mods in the 1 s starting 13 s ago (197 adds, 10 deletes)
>>    2025-03-19T10:00:12.730Z|419178|connmgr|INFO|br-int<->unix#1: 94 flow_mods 10 s ago (85 adds, 9 deletes)
>>    2025-03-19T10:01:12.730Z|419549|connmgr|INFO|br-int<->unix#1: 6 flow_mods 37 s ago (4 adds, 2 deletes)
>>    2025-03-19T10:05:54.525Z|421308|connmgr|INFO|br-int<->unix#1: 1 flow_mods 10 s ago (1 adds)
>>    2025-03-19T10:06:54.526Z|421710|connmgr|INFO|br-int<->unix#1: 1 flow_mods 52 s ago (1 deletes)
>>    2025-03-19T10:08:52.756Z|422418|connmgr|INFO|br-int<->unix#1: 1 flow_mods 10 s ago (1 adds)
>>    2025-03-19T11:18:15.953Z|448775|connmgr|INFO|br-int<->unix#1: 176 flow_mods in the 8 s starting 10 s ago (31 adds, 145 deletes)
>>    2025-03-19T11:31:30.570Z|453640|connmgr|INFO|br-int<->unix#1: 1 flow_mods 10 s ago (1 adds)
>>    2025-03-19T11:32:30.570Z|454015|connmgr|INFO|br-int<->unix#1: 1 flow_mods 58 s ago (1 adds)
>>    2025-03-19T11:35:09.140Z|539360|ovs_rcu(urcu9)|WARN|blocked 1000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:09.140Z|455059|ovs_rcu|WARN|blocked 1000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:10.140Z|539409|ovs_rcu(urcu9)|WARN|blocked 2000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:10.141Z|455106|ovs_rcu|WARN|blocked 2000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:12.140Z|539497|ovs_rcu(urcu9)|WARN|blocked 4001 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:12.141Z|455192|ovs_rcu|WARN|blocked 4000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:16.140Z|539687|ovs_rcu(urcu9)|WARN|blocked 8000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:16.141Z|455387|ovs_rcu|WARN|blocked 8000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:24.139Z|540106|ovs_rcu(urcu9)|WARN|blocked 16000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:24.140Z|455837|ovs_rcu|WARN|blocked 16000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:40.139Z|541019|ovs_rcu(urcu9)|WARN|blocked 32000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:35:40.140Z|456773|ovs_rcu|WARN|blocked 32000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:36:12.139Z|542611|ovs_rcu(urcu9)|WARN|blocked 64000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:36:12.140Z|458417|ovs_rcu|WARN|blocked 64000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:37:16.140Z|545667|ovs_rcu(urcu9)|WARN|blocked 128000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:37:16.141Z|461499|ovs_rcu|WARN|blocked 128000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:39:24.139Z|551954|ovs_rcu(urcu9)|WARN|blocked 256000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:39:24.140Z|467913|ovs_rcu|WARN|blocked 256000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:43:40.140Z|564156|ovs_rcu(urcu9)|WARN|blocked 512000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:43:40.141Z|480412|ovs_rcu|WARN|blocked 512000 ms waiting for handler1 to quiesce
>>    2025-03-19T11:50:04.648Z|00001|vlog|INFO|opened log file /var/log/kolla/openvswitch/ovs-vswitchd.log
>>
>>    Any ideas?
>>
>>    Thanks
>>
>>    Daniel

_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss