Hi,

On Tue, 9 Dec 2025 at 11:08 Cristian Contescu <[email protected]> wrote:

> Hi Tiago,
>
> Thank you for your quick reply. We have actually just tested this in one
> of our clusters and it seems that the load issue is gone.
> Are you aware of any possible drawbacks of setting this?
>

Indeed, if you rely on GARPs from an upstream
router for some kind of failover mechanism, this would no longer work.
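
If you ever need that GARP-driven behavior back, setting the option to true
on the same logical switch should restore the default flooding behavior
(the switch ID below is a placeholder, as in the example further down):

ovn-nbctl --no-leader-only set logical_switch <ID-Logical-Switch>
other_config:broadcast-arps-to-all-routers=true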


Tiago Pires

>
> Thank you again,
> Cristi
>
> On Tue, Dec 9, 2025 at 1:07 PM Tiago Pires <[email protected]>
> wrote:
>
>> Hi Cristian,
>>
>> On Tue, Dec 9, 2025 at 6:42 AM Cristian Contescu via discuss
>> <[email protected]> wrote:
>> >
>> > Hello everyone
>> >
>> > We wanted to check with the community about a strange issue we saw
>> happening in the following scenario.
>> >
>> > In order to scale out one of our environments, we decided to increase
>> the IPv4 provider network from a /22 to a /20 in OpenStack. After doing so,
>> > we noticed that OVS started using 100% CPU (also visible in the logs):
>> >
>> >
>> > 2025-12-04T14:18:07Z|26750|poll_loop|INFO|wakeup due to [POLLIN] on fd
>> 425 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (101%
>> CPU usage)
>> >
>> >
>> >
>> > When the CPU spiked to 100% (the ovs-vswitchd main process; the handler
>> and revalidator threads were not as heavily used), we started seeing packet
>> loss regardless of traffic type (IPv4 / IPv6).
>> >
>> > After that we reverted the change, and saw that the same issue happens
>> when we gradually increase the number of virtual routers in another
>> environment (with less traffic) that also has a /20 provider network.
>> >
>> > Do you know of any recent fixes related to the following, or has anyone
>> experienced a similar issue and can point us to some options to
>> evaluate?
>> >
>> >
>> > Our setup:
>> >
>> > dual-stack external network (VLAN type) with a /20 IPv4 subnet (increased
>> from /22 before) and a /64 IPv6 subnet
>> > virtual routers are connected to the external network
>> >
>> > dual-stack tenant networks are possible
>> >
>> > for IPv4 we use distributed floating IPs and SNAT(+DNAT)
>> > for IPv6, tenant networks are public and advertised via the
>> ovn-bgp-agent to the physical routers, with the next-hop being on the
>> external network
>> >
>> > our OpenStack setup is based on openstack-helm deployed on physical
>> nodes
>> >
>> >
>> > So, as of now, our findings are:
>> > - Correlation between number of virtual routers and CPU usage increase
>> > - Potential correlation between provider network being /20 instead of
>> /22 (increase in broadcast domain / traffic)
>> >
>> Did you set broadcast-arps-to-all-routers=false on the provider
>> network's logical switch?
>> Ex: ovn-nbctl --no-leader-only set logical_switch <ID-Logical-Switch>
>> other_config:broadcast-arps-to-all-routers=false
>>
>> Looking at your scenario, I think it will probably reduce this load
>> spike in ovs-vswitchd.
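>>
>> For reference, the value currently applied can be checked with something
>> like the following (same placeholder switch ID as above):
>> ovn-nbctl --no-leader-only get logical_switch <ID-Logical-Switch>
>> other_config:broadcast-arps-to-all-routers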
>>
>> > - Potential IPv6 RA / multicast flood when the issue happens, based on
>> investigating tcpdumps
>> >
>> > What you did that made the problem appear.
>> > In order to replicate this issue, we increased the provider network from
>> /22 to /20 in OpenStack and on the physical routers connecting the external
>> network.
>> >
>> > Another way to replicate the issue is to just increase the number of
>> virtual routers on an existing /20 provider network in a different
>> OpenStack environment.
>> >
>> > What you expected to happen.
>> > - OVS has the same load as before and doesn't reach 100% CPU usage, and
>> thus no packet loss
>> > - OVS is able to sustain the /20 provider network from OpenStack
>> >
>> > What actually happened.
>> > - As soon as the OVS main process reached 100% CPU usage (per the log
>> lines), we started detecting packet loss
>> >
>> > - Other errors detected are "|WARN|over 4096 resubmit actions on bridge"
>> >
>> > neutron openvswitch-zg86k openvswitch-vswitchd
>> 2025-12-08T14:33:28Z|00016|ofproto_dpif_xlate(handler35)|WARN|over 4096
>> resubmit actions on bridge br-int while processing
>> icmp6,in_port=1,dl_vlan=101,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:00:5e:00:02:65,dl_dst=33:33:00:00:00:01,ipv6_src=fe80::200:5eff:fe00:265,ipv6_dst=ff02::1,ipv6_label=0x00000,nw_tos=224,nw_ecn=0,nw_ttl=255,nw_frag=no,icmp_type=134,icmp_code=0
>> >
>> >
>> > Versions of various components:
>> >  ovs-vswitchd --version
>> > ovs-vswitchd (Open vSwitch) 3.3.4
>> >
>> >  ovn-controller --version
>> > ovn-controller 24.03.6
>> > Open vSwitch Library 3.3.4
>> > OpenFlow versions 0x6:0x6
>> > SB DB Schema 20.33.0
>> >
>> > No local patches
>> >
>> > Kernel:
>> > # cat /proc/version
>> > Linux version 6.8.0-52-generic (buildd@lcy02-amd64-046)
>> (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU
>> Binutils for Ubuntu) 2.42) #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan 11
>> 00:06:25 UTC 2025
>> >
>> > # ovs-dpctl show
>> > system@ovs-system:
>> >   lookups: hit:25791010986 missed:954723789 lost:3319115
>> >   flows: 1185
>> >   masks: hit:96181163550 total:35 hit/pkt:3.60
>> >   cache: hit:19162151183 hit-rate:71.65%
>> >   caches:
>> >     masks-cache: size:256
>> >   port 0: ovs-system (internal)
>> >   port 1: br-ex (internal)
>> >   port 2: bond0
>> >   port 3: br-int (internal)
>> >   port 4: genev_sys_6081 (geneve: packet_type=ptap)
>> >   port 5: tap196a9595-b2
>> >   port 6: tap72f307a7-37
>> >   ..
>> >   ..
>> >
>> > The only workaround seems to be decreasing the number of virtual routers,
>> which is not sustainable in an environment that is in use.
>> >
>> > We checked the flows, and it seems they grow from ~70k up to ~120-130k
>> when the issue happens
>> > # ovs-ofctl dump-flows br-int | wc -l
>> > 78413
>> >
>> >
>> > Thank you for your help,
>> >
>> > Cristi
>> >
>> Regards,
>>
>> Tiago Pires
>> >


_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
