On Mon, Sep 22, 2025 at 09:18:28AM +0700, Shawn Ming wrote:
> Hi Felix, all,

Hi Shawn,

> 
> Thanks Felix for your detailed notes and suggestions. I did some
> additional investigation as you recommended, and here’s what I found:
> 
> 1. Using tcpdump, I observed that roughly 2,000 ARP requests are
> being sent from the router to all physical nodes, including those
> without any VMs in the /21 network.
> Most ARP requests are for IPs that appear to be invalid (either
> unallocated or currently down), while valid IPs do not generate nearly
> as many requests.

I would guess that this /21 network is a publicly routable one that is
internet accessible? If so, this is in my experience quite normal.
Random scanners on the internet will scan your /21 range and send
requests there. Upstream routers do not seem to cache ARP misses and
will therefore send an ARP request for each packet they receive for an
unused IP.
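
If you want to confirm that, something along these lines should show
which target IPs attract the most requests (using bond1 as the provider
uplink is just an assumption based on the port list further down):

  tcpdump -lni bond1 -c 2000 'arp[6:2] = 1' \
    | grep -o 'who-has [0-9.]*' | sort | uniq -c | sort -n | tail

If the top entries are unallocated IPs spread across the /21, that
points to scanner traffic hitting the range.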

> 
> 2. Interestingly, only the compute nodes that host VMs attached to the
> /21 CIDR show latency spikes and high CPU usage.
> Other nodes receive the same ARP flood but don’t seem to be affected
> in the same way.

Could you try setting other_config:broadcast-arps-to-all-routers=false
on the Logical_Switch that represents this /21 network (if the
implications are acceptable)?

If a Logical_Switch gets an ARP request for an IP it does not
know, it will flood the request to all attached Logical_Switch_Ports
and potentially to routers. If you have a lot of such requests, that
can be quite inefficient.

We built the setting above for exactly that purpose.
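
For example, assuming the ml2/ovn naming convention where the switch is
called neutron-<network-uuid> (please double-check the name with
"ovn-nbctl ls-list" first):

  ovn-nbctl set Logical_Switch neutron-<network-uuid> \
      other_config:broadcast-arps-to-all-routers=false

and to revert it again:

  ovn-nbctl remove Logical_Switch neutron-<network-uuid> \
      other_config broadcast-arps-to-all-routers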

Please note that if you set this, only ARP requests with a known
destination IP will be processed. If you rely on GARPs from your
upstream router for some kind of failover mechanism, that would no
longer work.

Thanks a lot,
Felix

> 
> 3. From what I’ve gathered, this behavior might be linked somehow to
> how megaflow handling works in OVS/OVN. I’m still digging into the
> details.
> 
> I’d appreciate any further insights from you and everyone in the community!
> 
> Best regards,
> Shawn
> 
> On Tue, Sep 16, 2025 at 9:19 PM Felix Huettner
> <[email protected]> wrote:
> >
> > On Mon, Sep 15, 2025 at 05:16:03PM +0700, Shawn Ming via discuss wrote:
> > > Hello all,
> > >
> > > I am running OpenStack (deployed via Kolla-Ansible) with Neutron using
> > > OVN as the networking backend. The `distributed_floating_ip` option is
> > > not enabled.
> > > I have encountered an issue related to large provider networks (CIDR
> > > /21) and would like to seek advice from the community.
> >
> > Hi Shawn,
> >
> > I'll note below what I saw; maybe it is helpful to you.
> >
> > >
> > > I./ Environment / Steps to reproduce:
> > > - OpenStack Caracal (2024.1) deployed with Kolla-Ansible.
> > > - Neutron backend: OVN version 24.03.2 (not setting 
> > > distributed_floating_ip).
> > > - Create a provider network with CIDR /21.
> > > - Deploy some VMs directly attached to this network.
> > > - Observe traffic and system behavior.
> > >
> > > II./ Observed behavior:
> > > Note: the actual gateway IP address in the logs has been replaced
> > > for privacy reasons.
> > > 1. VMs attached to the /21 network frequently experience latency
> > > spikes and packet loss:
> > > root@vm4:~# ping 192.168.1.254
> > > PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
> > > 64 bytes from 192.168.1.254: icmp_seq=4 ttl=64 time=6.86 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=5 ttl=64 time=49.1 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=6 ttl=64 time=7.74 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=7 ttl=64 time=7.68 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=9 ttl=64 time=0.850 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=10 ttl=64 time=1.40 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=8 ttl=64 time=2317 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=11 ttl=64 time=5.31 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=13 ttl=64 time=0.749 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=14 ttl=64 time=4.06 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=15 ttl=64 time=1.67 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=16 ttl=64 time=8.24 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=17 ttl=64 time=9.61 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=18 ttl=64 time=5.71 ms
> > > ^C
> > > --- 192.168.1.254 ping statistics ---
> > > 18 packets transmitted, 14 received, 22.2222% packet loss, time 17148ms
> > > rtt min/avg/max/mdev = 0.749/173.252/2316.610/594.574 ms, pipe 3
> >
> > Not only are the latency spikes strange, but so is the packet reordering.
> >
> > >
> > > Meanwhile, VMs attached to a /23 network do not:
> > > root@vm5:~# ping 192.168.2.254
> > > PING 192.168.2.254 (192.168.2.254) 56(84) bytes of data.
> > > 64 bytes from 192.168.2.254: icmp_seq=1 ttl=64 time=1.04 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=2 ttl=64 time=25.9 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=3 ttl=64 time=5.05 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=4 ttl=64 time=2.05 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=5 ttl=64 time=0.523 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=6 ttl=64 time=4.16 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=7 ttl=64 time=0.798 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=8 ttl=64 time=70.9 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=9 ttl=64 time=1.54 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=10 ttl=64 time=4.14 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=11 ttl=64 time=6.88 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=12 ttl=64 time=0.733 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=13 ttl=64 time=1.01 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=14 ttl=64 time=2.70 ms
> > > 64 bytes from 192.168.2.254: icmp_seq=15 ttl=64 time=26.5 ms
> > > ^C
> > > --- 192.168.2.254 ping statistics ---
> > > 15 packets transmitted, 15 received, 0% packet loss, time 14056ms
> > > rtt min/avg/max/mdev = 0.523/10.263/70.898/18.157 ms
> >
> > While the latency spikes are smaller, this is still a lot of
> > variation. That still does not feel healthy.
> >
> > >
> > > 2. For comparison, we dedicated compute nodes hosting only one VM
> > > each (one with a VM in the /21 network, one in the /23 network):
> > > 2.1. Compute node with a VM in the /21 network
> > > - OVS shows high CPU usage
> > > CONTAINER ID   NAME                   CPU %     MEM %   NET I/O   BLOCK I/O    PIDS
> > > d28dc099cc43   openvswitch_vswitchd   210.81%   0.12%   0B / 0B   0B / 401kB   126
> > >
> > > - OVS shows ARP flow records changing quickly and frequently
> > > (in second n)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.1.254" | wc -l
> > > 1184
> > > (in second n+1)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.1.254" | wc -l
> > > 628
> > > (in second n+2)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.1.254" | wc -l
> > > 1256
> > > (in second n+3)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.1.254" | wc -l
> > > 962
> >
> > I just compared this to what we see on our nodes.
> > There we mostly have one flow per newly sent ARP request. Since the
> > peer sending the ARP request should cache the response, there should
> > be no reason to send them regularly.
> >
> > I would propose you look deeper into these different flows for a
> > single IP. It would probably be interesting to see what the
> > differences between them are.
> > Maybe you will also see something interesting if you run a tcpdump
> > that filters on ARP requests and that IP address. If you see a lot
> > of these requests, you can maybe find their cause.
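> >
> > Something along these lines might be a starting point (bond1 and the
> > gateway IP are just placeholders taken from your outputs):
> >
> >   ovs-appctl dpctl/dump-flows | grep arp | grep 192.168.1.254 | head -5
> >   tcpdump -lnei bond1 'arp[6:2] = 1 and arp host 192.168.1.254'
> >
> > Comparing a few of the dumped flows side by side should show which
> > match fields actually differ between them.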
> >
> > >
> > > - The number of OVS flows fluctuates, and packet drops occur even when
> > > no traffic is generated by the VM.
> > > (in second n)(openvswitch-vswitchd)[compute-node]# ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:54725926968 missed:525971260 lost:69166
> > >   flows: 1962
> > >   masks: hit:57009090988 total:19 hit/pkt:1.03
> > >   cache: hit:54183648664 hit-rate:98.07%
> > >   caches:
> > >     masks-cache: size:256
> > > (in second n+1)(openvswitch-vswitchd)[compute-node]# ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:54725931065 missed:525972068 lost:69509
> > >   flows: 2474
> > >   masks: hit:57009110492 total:19 hit/pkt:1.03
> > >   cache: hit:54183652139 hit-rate:98.07%
> > >   caches:
> > >     masks-cache: size:256
> > > (in second n+2)(openvswitch-vswitchd)[compute-node]# ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:54725936481 missed:525972862 lost:69509
> > >   flows: 225
> > >   masks: hit:57009126369 total:12 hit/pkt:1.03
> > >   cache: hit:54183657403 hit-rate:98.07%
> > >   caches:
> > >     masks-cache: size:256
> >
> > What might be interesting here would be "ovs-appctl upcall/show".
> > It shows how many flows are installed over time and what the current
> > flow limit and dump duration are.
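> >
> > For example, to sample it once per second for a while (the exact
> > invocation is just a suggestion):
> >   watch -d -n1 'ovs-appctl upcall/show'
> > and watch how the flow count, flow limit and dump duration evolve.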
> >
> > >
> > > 2.2. Compute node with a VM in the /23 network shows better results:
> > > - ARP flow count is stable.
> > > (in second n)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.2.254" | wc -l
> > > 403
> > > (in second n+1)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.2.254" | wc -l
> > > 403
> > > (in second n+2)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.2.254" | wc -l
> > > 402
> > > (in second n+3)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.2.254" | wc -l
> > > 397
> > >
> > > - Flow entries are stable.
> > > (in second n)(openvswitch-vswitchd)[compute-node]# ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:54763442917 missed:539025268 lost:4603675
> > >   flows: 2666
> > >   masks: hit:60577538636 total:30 hit/pkt:1.10
> > >   cache: hit:54123911742 hit-rate:97.87%
> > >   caches:
> > >     masks-cache: size:256
> > > (in second n+1)(openvswitch-vswitchd)[compute-node]# ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:54763450196 missed:539025306 lost:4603675
> > >   flows: 2670
> > >   masks: hit:60577547869 total:31 hit/pkt:1.10
> > >   cache: hit:54123918904 hit-rate:97.87%
> > >   caches:
> > >     masks-cache: size:256
> > > (in second n+2)(openvswitch-vswitchd)[compute-node]# ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:54763458923 missed:539025355 lost:4603675
> > >   flows: 2669
> > >   masks: hit:60577558873 total:31 hit/pkt:1.10
> > >   cache: hit:54123927487 hit-rate:97.87%
> > >   caches:
> > >     masks-cache: size:256
> > >   port 0: ovs-system (internal)
> > >   port 1: br-ex (internal)
> > >   port 2: bond1
> > >   port 3: br-int (internal)
> > >   port 4: genev_sys_6081 (geneve: packet_type=ptap)
> > >   port 5: tap19510eb6-89
> > >   port 6: tap6ff45ca7-64
> > >   port 7: tap6b851650-70
> > >   port 8: tap0d5f11f9-80
> > >
> > > - OVS CPU usage remains normal (almost no spike).
> > > CONTAINER ID   NAME                   CPU %    MEM %   NET I/O   BLOCK I/O     PIDS
> > > 6262f4bc6ab1   openvswitch_vswitchd   11.16%   0.16%   0B / 0B   0B / 45.7MB   127
> > >
> > > 3. Workaround
> > > - As a temporary workaround, increasing `max-idle` to 1000000 and
> > > `max-revalidator` to 10000 appears to reduce the problem for the VM
> > > in the /21 CIDR (defaults per the documentation: `max-idle=10000` ms,
> > > i.e. 10 s, and `max-revalidator=500` ms); example commands follow
> > > below.
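> > > For reference, commands along these lines set those values via the
> > > Open_vSwitch table's other_config (the exact invocation here is
> > > illustrative):
> > >   ovs-vsctl set Open_vSwitch . other_config:max-idle=1000000
> > >   ovs-vsctl set Open_vSwitch . other_config:max-revalidator=10000
> > > and the defaults can be restored with:
> > >   ovs-vsctl remove Open_vSwitch . other_config max-idle max-revalidator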
> > > 3.1. OVS ARP flow count remains stable (no fluctuation).
> > > (in second n)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.1.254" | wc -l
> > > 1902
> > > (in second n+1)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.1.254" | wc -l
> > > 1902
> > > (in second n+2)(openvswitch-vswitchd)[compute-node]# ovs-dpctl
> > > dump-flows | grep arp | grep "192.168.1.254" | wc -l
> > > 1903
> > >
> > > 3.2. Flow entries fluctuate less.
> > > (in second n)(openvswitch-vswitchd)[compute-node]# ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:53647897445 missed:569251988 lost:8108302
> > >   flows: 2697
> > >   masks: hit:56660348845 total:31 hit/pkt:1.05
> > >   cache: hit:52974334763 hit-rate:97.71%
> > >   caches:
> > >     masks-cache: size:256
> > > (in second n+1)(openvswitch-vswitchd)[compute-node]# ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:53647898935 missed:569251992 lost:8108302
> > >   flows: 2701
> > >   masks: hit:56660350555 total:31 hit/pkt:1.05
> > >   cache: hit:52974336237 hit-rate:97.71%
> > >   caches:
> > >     masks-cache: size:256
> > > (in second n+2)(openvswitch-vswitchd)[compute-node]# ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:53647900325 missed:569251995 lost:8108302
> > >   flows: 2704
> > >   masks: hit:56660352110 total:31 hit/pkt:1.05
> > >   cache: hit:52974337617 hit-rate:97.71%
> > >   caches:
> > >     masks-cache: size:256
> > >
> > > 3.3. But OVS CPU usage remains high.
> > > CONTAINER ID   NAME                   CPU %     MEM %   NET I/O   BLOCK I/O   PIDS
> > > 67fea86bbb86   openvswitch_vswitchd   487.15%   0.20%   0B / 0B   0B / 333MB  137
> > >
> > > 3.4. Ping to the gateway (from inside the VM) shows improved results.
> > > root@vm4:~# ping 192.168.1.254
> > > PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
> > > 64 bytes from 192.168.1.254: icmp_seq=1 ttl=64 time=7.33 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=2 ttl=64 time=1.78 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=3 ttl=64 time=0.624 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=4 ttl=64 time=24.2 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=5 ttl=64 time=1.81 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=6 ttl=64 time=4.98 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=7 ttl=64 time=5.89 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=8 ttl=64 time=6.57 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=9 ttl=64 time=5.26 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=10 ttl=64 time=1.07 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=11 ttl=64 time=3.34 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=12 ttl=64 time=2.53 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=13 ttl=64 time=14.8 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=14 ttl=64 time=29.4 ms
> > > 64 bytes from 192.168.1.254: icmp_seq=15 ttl=64 time=2.11 ms
> > > ^C
> > > --- 192.168.1.254 ping statistics ---
> > > 15 packets transmitted, 15 received, 0% packet loss, time 14039ms
> > > rtt min/avg/max/mdev = 0.624/7.442/29.369/8.363 ms
> > >
> > > Has anyone encountered a similar problem before? If so, could you
> > > share what the root cause was in your case, and what you found to be
> > > the most effective solution?
> >
> > The workaround above seems to work mostly by keeping flows installed
> > in the datapath for longer. If it works, that means there seem to be
> > a lot of different clients (probably around those 1902) that regularly
> > send ARP requests, but not often enough for the flows to stay in the
> > datapath.
> >
> > So I would propose investigating where these ARP requests come from.
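> >
> > One way to get an overview of the senders, assuming bond1 is the
> > provider uplink as in your port list (the exact pipeline is only a
> > suggestion):
> >
> >   tcpdump -lni bond1 -c 2000 'arp[6:2] = 1' \
> >     | grep -o 'tell [0-9.]*' | sort | uniq -c | sort -rn | head
> >
> > That counts requests per sender IP and should show whether they all
> > come from the upstream router or from many different hosts.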
> >
> > Hope it helps in some way.
> > Felix
> >
> > > Thanks in advance!
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
