Hi Tiago,

Many thanks again for the prompt reply. We would have to check whether any
use case might be affected by the missing GARPs. If that turns out to be the
case (thinking of scenarios like allowed_address_pairs and keepalived running
on VMs attached to external/provider networks), would you be aware of a
possible workaround?
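In case one of those use cases does turn out to depend on the GARPs, what we
had in mind so far (untested, and assuming the Neutron-created logical switch
keeps the usual neutron-<network-id> name) is simply to keep track of where
the option is set and be able to revert it quickly on that one switch,
roughly:

  # find the logical switch backing the provider network
  ovn-nbctl --no-leader-only find logical_switch name=neutron-<provider-network-id>

  # check whether the option is currently set on it
  ovn-nbctl --no-leader-only get logical_switch <ID-Logical-Switch> other_config:broadcast-arps-to-all-routers

  # revert by dropping the key again if a GARP-dependent workload is affected
  ovn-nbctl --no-leader-only remove logical_switch <ID-Logical-Switch> other_config broadcast-arps-to-all-routers

If there is a better way to keep the GARP-dependent workloads working while
still avoiding the flooding, we would be happy to hear it.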
Best regards,
Cristi

On Tue, 9 Dec 2025, 17:23 Tiago Pires <[email protected]> wrote:

> Hi,
>
> On Tue, 9 Dec 2025 at 11:08 Cristian Contescu <[email protected]> wrote:
>
>> Hi Tiago,
>>
>> Thank you for your quick reply. We have actually just tested this in one
>> of our clusters and it seems that the load issue is gone.
>> Are you aware of any possible drawbacks of setting this?
>>
>
> Indeed, if you rely on GARPs from an upstream router for some kind of
> failover mechanism, this would no longer work.
>
> Tiago Pires
>
>> Thank you again,
>> Cristi
>>
>> On Tue, Dec 9, 2025 at 1:07 PM Tiago Pires <[email protected]>
>> wrote:
>>
>>> Hi Cristian,
>>>
>>> On Tue, Dec 9, 2025 at 6:42 AM Cristian Contescu via discuss
>>> <[email protected]> wrote:
>>> >
>>> > Hello everyone,
>>> >
>>> > We wanted to check with the community a strange issue which we saw
>>> > happening in the following scenario.
>>> >
>>> > In order to scale out one of our environments we decided to increase
>>> > the IPv4 provider network from a /22 to a /20 on OpenStack. After
>>> > doing so, we noticed that OVS started using 100% of CPU (also visible
>>> > in the logs):
>>> >
>>> > 2025-12-04T14:18:07Z|26750|poll_loop|INFO|wakeup due to [POLLIN] on fd
>>> > 425 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157
>>> > (101% CPU usage)
>>> >
>>> > When the CPU spiked to 100% (the ovs-vswitchd main process, while the
>>> > handler and revalidator threads were not as busy) we started seeing
>>> > packet loss regardless of traffic type (IPv4 / IPv6).
>>> >
>>> > After that we reverted the change and saw that the same issue happens
>>> > when we gradually increase the number of virtual routers on another
>>> > environment (with less traffic) with another /20 provider network.
>>> >
>>> > Do you know of any recent fixes related to the following, or has
>>> > anyone experienced a similar issue and can point us to some options
>>> > to evaluate?
>>> >
>>> > Our setup:
>>> >
>>> > - dual-stack external network (VLAN type) with a /20 (increased from
>>> >   /22 before) IPv4 subnet and a /64 IPv6 subnet
>>> > - virtual routers are connected to the external network
>>> > - dual-stack tenant networks are possible
>>> > - for IPv4 we use distributed floating IPs and SNAT (+DNAT)
>>> > - for IPv6, tenant networks are public and advertised via the
>>> >   ovn-bgp-agent to the physical routers, with the next hop being on
>>> >   the external network
>>> > - our OpenStack setup is based on openstack-helm deployed on physical
>>> >   nodes
>>> >
>>> > So as of now our current findings are:
>>> > - Correlation between the number of virtual routers and the CPU usage
>>> >   increase
>>> > - Potential correlation with the provider network being a /20 instead
>>> >   of a /22 (increase in broadcast domain / traffic)
>>> >
>>> Did you set broadcast-arps-to-all-routers=false in the provider
>>> network's logical switch?
>>> Ex: ovn-nbctl --no-leader-only set logical_switch <ID-Logical-Switch>
>>> other_config:broadcast-arps-to-all-routers=false
>>>
>>> Looking at your scenario, I think it will probably decrease this load
>>> spike in ovs-vswitchd.
>>>
>>> > - Potential RA IPv6 / multicast flood when the issue happens, based
>>> >   on tcpdump captures
>>> >
>>> > What you did that made the problem appear:
>>> > In order to replicate this issue we increased the provider network
>>> > from /22 to /20 in OpenStack and on the physical routers connecting
>>> > the external network.
>>> >
>>> > Another way to replicate the issue is to just increase the number of
>>> > virtual routers on an existing /20 provider network in a different
>>> > OpenStack environment.
>>> >
>>> > What you expected to happen:
>>> > - OVS has the same load as before and doesn't reach 100% CPU usage,
>>> >   thus no packet loss
>>> > - OVS is able to sustain the /20 provider network from OpenStack
>>> >
>>> > What actually happened:
>>> > - As soon as the OVS main process reached 100% CPU usage in the log
>>> >   lines, we started detecting packet loss
>>> > - Other errors detected are "|WARN|over 4096 resubmit actions on
>>> >   bridge":
>>> >
>>> > neutron openvswitch-zg86k openvswitch-vswitchd
>>> > 2025-12-08T14:33:28Z|00016|ofproto_dpif_xlate(handler35)|WARN|over 4096
>>> > resubmit actions on bridge br-int while processing
>>> > icmp6,in_port=1,dl_vlan=101,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:00:5e:00:02:65,dl_dst=33:33:00:00:00:01,ipv6_src=fe80::200:5eff:fe00:265,ipv6_dst=ff02::1,ipv6_label=0x00000,nw_tos=224,nw_ecn=0,nw_ttl=255,nw_frag=no,icmp_type=134,icmp_code=0
>>> >
>>> > Versions of various components:
>>> > ovs-vswitchd --version
>>> > ovs-vswitchd (Open vSwitch) 3.3.4
>>> >
>>> > ovn-controller --version
>>> > ovn-controller 24.03.6
>>> > Open vSwitch Library 3.3.4
>>> > OpenFlow versions 0x6:0x6
>>> > SB DB Schema 20.33.0
>>> >
>>> > No local patches
>>> >
>>> > Kernel:
>>> > # cat /proc/version
>>> > Linux version 6.8.0-52-generic (buildd@lcy02-amd64-046)
>>> > (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld
>>> > (GNU Binutils for Ubuntu) 2.42) #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan
>>> > 11 00:06:25 UTC 2025
>>> >
>>> > # ovs-dpctl show
>>> > system@ovs-system:
>>> >   lookups: hit:25791010986 missed:954723789 lost:3319115
>>> >   flows: 1185
>>> >   masks: hit:96181163550 total:35 hit/pkt:3.60
>>> >   cache: hit:19162151183 hit-rate:71.65%
>>> >   caches:
>>> >     masks-cache: size:256
>>> >   port 0: ovs-system (internal)
>>> >   port 1: br-ex (internal)
>>> >   port 2: bond0
>>> >   port 3: br-int (internal)
>>> >   port 4: genev_sys_6081 (geneve: packet_type=ptap)
>>> >   port 5: tap196a9595-b2
>>> >   port 6: tap72f307a7-37
>>> >   ..
>>> >   ..
>>> >
>>> > The only workaround seems to be decreasing the number of virtual
>>> > routers, which is not sustainable in an environment that is already
>>> > in use.
>>> >
>>> > We checked the flows and they seem to grow from ~70k up to ~120-130k
>>> > when the issue happens:
>>> > # ovs-ofctl dump-flows br-int | wc -l
>>> > 78413
>>> >
>>> > Thank you for your help,
>>> >
>>> > Cristi
>>> >
>>> Regards,
>>>
>>> Tiago Pires
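P.S. For anyone following this thread with the same symptoms: the rough
sequence we used when trying the suggestion on one of our clusters was along
these lines (the logical switch ID below is a placeholder, and the top
command is just one way to see which ovs-vswitchd thread is busy):

  # confirm which ovs-vswitchd thread is at ~100% CPU
  top -H -p $(pidof ovs-vswitchd)

  # record the OpenFlow table size on br-int before and after the change
  ovs-ofctl dump-flows br-int | wc -l

  # apply the suggested option on the provider network's logical switch
  ovn-nbctl --no-leader-only set logical_switch <ID-Logical-Switch> \
      other_config:broadcast-arps-to-all-routers=false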
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
