On Mon, Jul 28, 2025 at 8:30 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> OVN routers are configured to drop any traffic with a destination
> being one of the Reserved Multicast Addresses (RFC 4291).
>
> This is done by matching on all the bits of ip6.dst, except for bits
> 112-115 that cover all the addresses. Once installed into OVS, this
> turns into the following match:
>
>   ipv6_dst=ff00::/fff0:ffff:ffff:ffff:ffff:ffff:ffff:ffff
>
> We fixed a large chunk of IPv6 datapath flow explosion issues by
> turning on prefix tracking in the flow classifier in OVS in commit
> 89e43f7528b0 ("controller: Fix IPv6 dp flow explosion by setting flow
> table prefixes."). However, prefix tracking doesn't work for masks
> that are not contiguous. That means that if a packet reaches a
> classifier subtable with a non-contiguous mask, all the bits of that
> mask will be un-wildcarded. It's not a huge problem in the general case,
> because most non-contiguous masks would typically match on just a few
> bits. But ip6.mcast_rsvd is matching on 124 bits, un-wildcarding them
> for most of the IPv6 traffic traversing a router and causing creation
> of a separate exact-match datapath flow per destination IP.
>
> For setups that handle a large amount of traffic from many different
> external addresses this issue makes IPv6 handling significantly harder
> than IPv4, causing much higher load on the datapath with potential
> overflow of datapath flow tables and a subsequent upcall storm.
> Even without the overflow, OVS spends a lot of time revalidating all
> these datapath flows, burning CPU cycles.
>
> In general, since the number of external IP addresses is virtually
> unlimited, there should be no configuration where OVN exact-matches
> them, otherwise it will be a significant datapath scaling issue.
>
> Fix that by replacing a non-contiguous bit-match with a match on an
> address set where all the reserved multicast addresses are just listed
> directly.
> There are only 16 of them, so this should not be a huge
> problem to have an extra 15 OpenFlow rules per router, but it will allow
> OVS to use prefix tracking for these flows and avoid creating a separate
> datapath flow per destination IP.
>
> Also adding a simple lsp-to-external routing test case to make sure
> we don't have exact matches in this simple common use case.
>
> The OVS classifier can likely be improved to handle non-contiguous
> masks better, but it's not how the prefix tracking is designed, so
> it's not a simple task.
>
> Fixes: 677a3ba4d66b ("ovn: Add MLD support.")
> Reported-at: https://issues.redhat.com/browse/FDP-1557
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
> ---
>  lib/logical-fields.c |   8 ++-
>  tests/ovn.at         | 151 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 157 insertions(+), 2 deletions(-)
>
> diff --git a/lib/logical-fields.c b/lib/logical-fields.c
> index e479a78c1..f19eb579b 100644
> --- a/lib/logical-fields.c
> +++ b/lib/logical-fields.c
> @@ -266,8 +266,12 @@ ovn_init_symtab(struct shash *symtab)
>
>      /* Predefined IPv6 multicast groups (RFC 4291, 2.7.1). */
>      expr_symtab_add_predicate(symtab, "ip6.mcast_rsvd",
> -                              "ip6.dst[116..127] == 0xff0 && "
> -                              "ip6.dst[0..111] == 0x0");
> +                              "ip6.dst == { "
> +                              "ff00::0, ff01::0, ff02::0, ff03::0, "
> +                              "ff04::0, ff05::0, ff06::0, ff07::0, "
> +                              "ff08::0, ff09::0, ff0a::0, ff0b::0, "
> +                              "ff0c::0, ff0d::0, ff0e::0, ff0f::0 "
> +                              "}");
>      expr_symtab_add_predicate(symtab, "ip6.mcast_all_nodes",
>                                "ip6.dst == ff01::1 || ip6.dst == ff02::1");
>      expr_symtab_add_predicate(symtab, "ip6.mcast_all_rtrs",
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 0dabec8d9..18ce07e1a 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -41134,6 +41134,157 @@ OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1])
>  AT_CLEANUP
>  ])
>
> +dnl This test checks that the megaflows translated by ovs-vswitchd don't
> +dnl have extensive matches on external IP addresses for simple routing.
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([IPv4/v6 routing to external - megaflow check for src/dst matches])
> +AT_SKIP_IF([test $HAVE_SCAPY = no])
> +ovn_start
> +
> +check ovn-nbctl ls-add sw0
> +
> +check ovn-nbctl lsp-add sw0 vm0
> +check ovn-nbctl lsp-set-addresses vm0 "f0:00:0f:01:02:03 10.0.0.3 1000::3"
> +
> +check ovn-nbctl ls-add sw1
> +
> +check ovn-nbctl lsp-add sw1 ext
> +check ovn-nbctl lsp-set-addresses ext unknown
> +check ovn-nbctl lsp-set-type ext localnet
> +check ovn-nbctl lsp-set-options ext network_name=phys
> +
> +check ovn-nbctl lr-add lr0
> +
> +check ovn-nbctl lrp-add lr0 lr0-sw0 fa:16:3e:00:00:01 10.0.0.250/24 1000::f0/64
> +check ovn-nbctl lsp-add sw0 sw0-lr0
> +check ovn-nbctl lsp-set-type sw0-lr0 router
> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> +
> +check ovn-nbctl lrp-add lr0 lr0-sw1 fa:16:3e:00:00:02 20.0.0.250/24 2000::f0/64
> +check ovn-nbctl lsp-add sw1 sw1-lr0
> +check ovn-nbctl lsp-set-type sw1-lr0 router
> +check ovn-nbctl lsp-set-addresses sw1-lr0 router
> +check ovn-nbctl lsp-set-options sw1-lr0 router-port=lr0-sw1
> +
> +dnl Add default routes for the external gateway.
> +check ovn-nbctl lr-route-add lr0 "0.0.0.0/0" 20.0.0.254 lr0-sw1
> +check ovn-nbctl lr-route-add lr0 "::/0" 2000::fe lr0-sw1
> +
> +net_add n1
> +sim_add hv
> +as hv
> +check ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> +check ovs-vsctl add-port br-int vif1 -- \
> +    set Interface vif1 external-ids:iface-id=vm0 \
> +    options:tx_pcap=hv/vif1-tx.pcap \
> +    options:rxq_pcap=hv/vif1-rx.pcap \
> +    ofport-request=1
> +
> +check ovn-nbctl --wait=sb sync
> +wait_for_ports_up
> +
> +dnl Create MAC binding entries for the external gateway, so OVN doesn't need
> +dnl to ARP/ND for it.
> +lr0_dp=$(fetch_column Datapath_Binding _uuid external_ids:name=lr0)
> +check_uuid ovn-sbctl create mac_binding datapath=$lr0_dp logical_port=lr0-sw1 \
> +    ip=\"2000::fe\" mac=\"f0:00:0f:01:02:fe\"
> +check_uuid ovn-sbctl create mac_binding datapath=$lr0_dp logical_port=lr0-sw1 \
> +    ip=\"20.0.0.254\" mac=\"f0:00:0f:01:02:fe\"
> +check ovn-nbctl --wait=hv sync
> +
> +AS_BOX([IPv6 - from external to vm0])
> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:02', src='f0:00:0f:01:02:fe')/ \
> +                  IPv6(dst='1000::3', src='3000::4', hlim=64)/ \
> +                  UDP(sport=53, dport=4369)")
> +as hv
> +ovs-appctl ofproto/trace br-phys in_port=br-phys_n1 $packet --names > ext_ip6_ofproto_trace.txt
> +check ovs-appctl netdev-dummy/receive br-phys_n1 $packet
> +
> +AT_CAPTURE_FILE([ext_ip6_ofproto_trace.txt])
> +
> +dnl Make sure the datapath flow doesn't match on a full external address.
> +AT_CHECK([grep Megaflow ext_ip6_ofproto_trace.txt], [0], [stdout])
> +AT_CHECK([grep Megaflow ext_ip6_ofproto_trace.txt | grep -q '3000::4'], [1])
> +
> +dnl Make sure that the packet was received by vm0. The L2 addresses and the
> +dnl hop limit will be different since the packet was routed.
> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:03', src='fa:16:3e:00:00:01')/ \
> +                  IPv6(dst='1000::3', src='3000::4', hlim=63)/ \
> +                  UDP(sport=53, dport=4369)")
> +echo $packet >> expected-vif1
> +OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1])
> +
> +AS_BOX([IPv6 - from vm0 to external])
> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:01', src='f0:00:0f:01:02:03')/ \
> +                  IPv6(dst='3000::4', src='1000::3', hlim=64)/ \
> +                  UDP(sport=53, dport=4369)")
> +as hv
> +ovs-appctl ofproto/trace br-int in_port=vif1 $packet --names > vm0_ip6_ofproto_trace.txt
> +check ovs-appctl netdev-dummy/receive vif1 $packet
> +
> +AT_CAPTURE_FILE([vm0_ip6_ofproto_trace.txt])
> +
> +dnl Make sure the datapath flow doesn't match on a full external address.
> +AT_CHECK([grep Megaflow vm0_ip6_ofproto_trace.txt], [0], [stdout])
> +AT_CHECK([grep Megaflow vm0_ip6_ofproto_trace.txt | grep -q '3000::4'], [1])
> +
> +dnl Make sure that the packet was received externally. The L2 addresses and
> +dnl the hop limit will be different since the packet was routed.
> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:fe', src='fa:16:3e:00:00:02')/ \
> +                  IPv6(dst='3000::4', src='1000::3', hlim=63)/ \
> +                  UDP(sport=53, dport=4369)")
> +echo $packet >> expected-ext
> +OVN_CHECK_PACKETS([hv/br-phys_n1-tx.pcap], [expected-ext])
> +
> +AS_BOX([IPv4 - from external to vm0])
> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:02', src='f0:00:0f:01:02:fe')/ \
> +                  IP(dst='10.0.0.3', src='30.0.0.4', ttl=64)/ \
> +                  UDP(sport=53, dport=4369)")
> +as hv
> +ovs-appctl ofproto/trace br-phys in_port=br-phys_n1 $packet --names > ext_ip4_ofproto_trace.txt
> +check ovs-appctl netdev-dummy/receive br-phys_n1 $packet
> +
> +AT_CAPTURE_FILE([ext_ip4_ofproto_trace.txt])
> +
> +dnl Make sure the datapath flow doesn't match on a full external address.
> +AT_CHECK([grep Megaflow ext_ip4_ofproto_trace.txt], [0], [stdout])
> +AT_CHECK([grep Megaflow ext_ip4_ofproto_trace.txt | grep -q '30.0.0.4'], [1])
> +
> +dnl Make sure that the packet was received by vm0. The L2 addresses and the
> +dnl hop limit will be different since the packet was routed.
> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:03', src='fa:16:3e:00:00:01')/ \
> +                  IP(dst='10.0.0.3', src='30.0.0.4', ttl=63)/ \
> +                  UDP(sport=53, dport=4369)")
> +echo $packet >> expected-vif1
> +OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1])
> +
> +AS_BOX([IPv4 - from vm0 to external])
> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:01', src='f0:00:0f:01:02:03')/ \
> +                  IP(dst='30.0.0.4', src='10.0.0.3', ttl=64)/ \
> +                  UDP(sport=53, dport=4369)")
> +as hv
> +ovs-appctl ofproto/trace br-int in_port=vif1 $packet --names > vm0_ip4_ofproto_trace.txt
> +check ovs-appctl netdev-dummy/receive vif1 $packet
> +
> +AT_CAPTURE_FILE([vm0_ip4_ofproto_trace.txt])
> +
> +dnl Make sure the datapath flow doesn't match on a full external address.
> +AT_CHECK([grep Megaflow vm0_ip4_ofproto_trace.txt], [0], [stdout])
> +AT_CHECK([grep Megaflow vm0_ip4_ofproto_trace.txt | grep -q '30.0.0.4'], [1])
> +
> +dnl Make sure that the packet was received externally. The L2 addresses and
> +dnl the hop limit will be different since the packet was routed.
> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:fe', src='fa:16:3e:00:00:02')/ \
> +                  IP(dst='30.0.0.4', src='10.0.0.3', ttl=63)/ \
> +                  UDP(sport=53, dport=4369)")
> +echo $packet >> expected-ext
> +OVN_CHECK_PACKETS([hv/br-phys_n1-tx.pcap], [expected-ext])
> +
> +AT_CLEANUP
> +])
>
>  OVN_FOR_EACH_NORTHD([
>  AT_SETUP([Multichassis port I-P processing])
> --
> 2.50.1
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Thanks, Ilya, for the fix. I applied it to main and backported it down to 24.03.

Regards,
Han
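
For anyone reading the thread later, here is a minimal Python sketch (not part of the patch, and not OVS or OVN code) of the reasoning in the commit message: the old value/mask pair is not a prefix mask, it covers 124 bits, and it selects exactly the 16 addresses the patch now lists explicitly. The names OLD_VALUE, OLD_MASK, RESERVED, is_prefix_mask, old_match and new_match are hypothetical and exist only for this illustration; the value, the mask and the address list are taken from the patch above.

```python
import ipaddress

# Old match as installed into OVS (value/mask quoted in the commit message).
OLD_VALUE = int(ipaddress.IPv6Address("ff00::"))
OLD_MASK = int(ipaddress.IPv6Address("fff0:ffff:ffff:ffff:ffff:ffff:ffff:ffff"))

# New match: the 16 reserved multicast addresses listed in the patch.
RESERVED = [ipaddress.IPv6Address(f"ff0{scope:x}::") for scope in range(16)]


def is_prefix_mask(mask, width=128):
    """Return True if the set bits of 'mask' form one run starting at the MSB."""
    inverted = ~mask & ((1 << width) - 1)
    # An inverted prefix mask looks like 0...01...1, so adding one to it
    # yields a power of two that shares no bits with it.
    return inverted & (inverted + 1) == 0


def old_match(addr):
    """Emulate the old masked match on the destination address."""
    return int(addr) & OLD_MASK == OLD_VALUE


def new_match(addr):
    """Emulate the new match: membership in the explicit address list."""
    return addr in RESERVED


if __name__ == "__main__":
    print("old mask is a prefix mask:", is_prefix_mask(OLD_MASK))
    print("bits matched by old mask:", bin(OLD_MASK).count("1"))
    print("all listed addresses hit the old match:",
          all(old_match(a) for a in RESERVED))
```

Since the old mask leaves only 4 bits free, it can match at most 16 addresses, and all 16 listed addresses satisfy it, so the two forms select the same set. The difference is only in how the classifier sees them: each listed address is a full-length, contiguous match, which is the shape prefix tracking is described as handling in the commit message.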