On 7/28/25 10:19 PM, Han Zhou wrote: > On Mon, Jul 28, 2025 at 8:30 AM Ilya Maximets <i.maxim...@ovn.org> wrote: >>
Hi Ilya, Han, >> OVN routers are configured to drop any traffic with a destination >> being one of the Reserved Multicast Addresses (RFC 4291). >> >> This is done by matching on all the bits of ipv6.dst, except for bits >> 112-116 that cover all the addresses. Once installed into OVS, this >> turns into a following match: >> >> ipv6_dst=ff00::/fff0:ffff:ffff:ffff:ffff:ffff:ffff:ffff >> >> We fixed a large chunk of IPv6 datapath flow explosion issues by >> turning on prefix tacking in the flow classifier in OVS in commit >> 89e43f7528b0 ("controller: Fix IPv6 dp flow explosion by setting flow >> table prefixes."). However, prefix tracking doesn't work for masks >> that are not contiguous. That means that if a packet reaches a >> classifier subtable with non-contiguous mask, all the bits of that >> mask will be un-wildcarded. It's not a huge problem in a general case, >> because most non-contiguous masks would typically match on just a few >> bits. But ip6.mcast_rsvd is matching on 124 bits, un-wildcarding them >> for most of the IPv6 traffic traversing a router and causing creation >> of a separate exact-match datapath flow per destination IP. >> >> For setups that handle large amount of traffic from many different >> external addresses this issue makes IPv6 handling significantly harder >> than IPv4, causing much higher load on the datapath with potential >> overflow of datapath flow tables and a subsequent upcall storm. >> Even without the overflow, OVS spends a lot of time revalidating all >> these datapath flows burning CPU cycles. >> >> In general, since the number of external IP addresses is virtually >> unlimited, there should be no configuration where OVN exact-matches >> them, otherwise it will be a significant datapath scaling issue. >> >> Fix that by replacing a non-contiguous bit-match with a match on an >> address set where all the reserved multicast addresses are just listed >> directly. There are only 16 of them, so this should not be a huge >> problem to have extra 15 OpenFlow rules per router, but it will allow >> OVS to use prefix tracking for these flows and avoid creating separate >> datapath flow per destination IP. >> >> Also adding a simple lsp-to-external routing test case to make sure >> we don't have exact matches in this simple common use case. >> >> The OVS classifier can likely be improved to handle non-contiguous >> masks better, but it's not how the prefix tracking is designed, so >> it's not a simple task. >> >> Fixes: 677a3ba4d66b ("ovn: Add MLD support.") >> Reported-at: https://issues.redhat.com/browse/FDP-1557 >> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> >> --- >> lib/logical-fields.c | 8 ++- >> tests/ovn.at | 151 +++++++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 157 insertions(+), 2 deletions(-) >> >> diff --git a/lib/logical-fields.c b/lib/logical-fields.c >> index e479a78c1..f19eb579b 100644 >> --- a/lib/logical-fields.c >> +++ b/lib/logical-fields.c >> @@ -266,8 +266,12 @@ ovn_init_symtab(struct shash *symtab) >> >> /* Predefined IPv6 multicast groups (RFC 4291, 2.7.1). */ >> expr_symtab_add_predicate(symtab, "ip6.mcast_rsvd", >> - "ip6.dst[116..127] == 0xff0 && " >> - "ip6.dst[0..111] == 0x0"); >> + "ip6.dst == { " >> + "ff00::0, ff01::0, ff02::0, ff03::0, " >> + "ff04::0, ff05::0, ff06::0, ff07::0, " >> + "ff08::0, ff09::0, ff0a::0, ff0b::0, " >> + "ff0c::0, ff0d::0, ff0e::0, ff0f::0 " >> + "}"); These are functionally equivalent and, playing devil's advocate a bit, it's extremely hard to prevent suboptimal matches to be added by OVN developers without having a very deep understanding of how the OVS clasifier works. I'm not sure how we can improve the developer experience, to be honest. Do you have any ideas by any chance? >> expr_symtab_add_predicate(symtab, "ip6.mcast_all_nodes", >> "ip6.dst == ff01::1 || ip6.dst == > ff02::1"); >> expr_symtab_add_predicate(symtab, "ip6.mcast_all_rtrs", >> diff --git a/tests/ovn.at b/tests/ovn.at >> index 0dabec8d9..18ce07e1a 100644 >> --- a/tests/ovn.at >> +++ b/tests/ovn.at >> @@ -41134,6 +41134,157 @@ OVN_CHECK_PACKETS([hv/vif1-tx.pcap], > [expected-vif1]) >> AT_CLEANUP >> ]) >> >> +dnl This test checks that the megaflows translated by ovs-vswitchd don't >> +dnl have extensive matches on external IP addresses for simple routing. >> +OVN_FOR_EACH_NORTHD([ >> +AT_SETUP([IPv4/v6 routing to external - megaflow check for src/dst > matches]) >> +AT_SKIP_IF([test $HAVE_SCAPY = no]) >> +ovn_start >> + >> +check ovn-nbctl ls-add sw0 >> + >> +check ovn-nbctl lsp-add sw0 vm0 >> +check ovn-nbctl lsp-set-addresses vm0 "f0:00:0f:01:02:03 10.0.0.3 > 1000::3" >> + >> +check ovn-nbctl ls-add sw1 >> + >> +check ovn-nbctl lsp-add sw1 ext >> +check ovn-nbctl lsp-set-addresses ext unknown >> +check ovn-nbctl lsp-set-type ext localnet >> +check ovn-nbctl lsp-set-options ext network_name=phys >> + >> +check ovn-nbctl lr-add lr0 >> + >> +check ovn-nbctl lrp-add lr0 lr0-sw0 fa:16:3e:00:00:01 10.0.0.250/24 > 1000::f0/64 >> +check ovn-nbctl lsp-add sw0 sw0-lr0 >> +check ovn-nbctl lsp-set-type sw0-lr0 router >> +check ovn-nbctl lsp-set-addresses sw0-lr0 router >> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0 >> + >> +check ovn-nbctl lrp-add lr0 lr0-sw1 fa:16:3e:00:00:02 20.0.0.250/24 > 2000::f0/64 >> +check ovn-nbctl lsp-add sw1 sw1-lr0 >> +check ovn-nbctl lsp-set-type sw1-lr0 router >> +check ovn-nbctl lsp-set-addresses sw1-lr0 router >> +check ovn-nbctl lsp-set-options sw1-lr0 router-port=lr0-sw1 >> + >> +dnl Add default routes for the external gateway. >> +check ovn-nbctl lr-route-add lr0 "0.0.0.0/0" 20.0.0.254 lr0-sw1 >> +check ovn-nbctl lr-route-add lr0 "::/0" 2000::fe lr0-sw1 >> + >> +net_add n1 >> +sim_add hv >> +as hv >> +check ovs-vsctl add-br br-phys >> +ovn_attach n1 br-phys 192.168.0.1 >> +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys >> +check ovs-vsctl add-port br-int vif1 -- \ >> + set Interface vif1 external-ids:iface-id=vm0 \ >> + options:tx_pcap=hv/vif1-tx.pcap \ >> + options:rxq_pcap=hv/vif1-rx.pcap \ >> + ofport-request=1 >> + >> +check ovn-nbctl --wait=sb sync >> +wait_for_ports_up >> + >> +dnl Create MAC binding entries for the external gateway, so OVN doesn't > need >> +dnl to ARP/ND for it. >> +lr0_dp=$(fetch_column Datapath_Binding _uuid external_ids:name=lr0) >> +check_uuid ovn-sbctl create mac_binding datapath=$lr0_dp > logical_port=lr0-sw1 \ >> + ip=\"2000::fe\" mac=\"f0:00:0f:01:02:fe\" >> +check_uuid ovn-sbctl create mac_binding datapath=$lr0_dp > logical_port=lr0-sw1 \ >> + ip=\"20.0.0.254\" mac=\"f0:00:0f:01:02:fe\" >> +check ovn-nbctl --wait=hv sync >> + >> +AS_BOX([IPv6 - from external to vm0]) >> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:02', > src='f0:00:0f:01:02:fe')/ \ >> + IPv6(dst='1000::3', src='3000::4', hlim=64)/ \ >> + UDP(sport=53, dport=4369)") >> +as hv >> +ovs-appctl ofproto/trace br-phys in_port=br-phys_n1 $packet --names > > ext_ip6_ofproto_trace.txt >> +check ovs-appctl netdev-dummy/receive br-phys_n1 $packet >> + >> +AT_CAPTURE_FILE([ext_ip6_ofproto_trace.txt]) >> + >> +dnl Make sure the datapath flow doesn't match on a full external address. >> +AT_CHECK([grep Megaflow ext_ip6_ofproto_trace.txt], [0], [stdout]) >> +AT_CHECK([grep Megaflow ext_ip6_ofproto_trace.txt | grep -q '3000::4'], > [1]) >> + >> +dnl Make sure that the packet was received by vm0. The L2 addresses and > the >> +dnl hop limit will be different since the packet was routed. >> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:03', > src='fa:16:3e:00:00:01')/ \ >> + IPv6(dst='1000::3', src='3000::4', hlim=63)/ \ >> + UDP(sport=53, dport=4369)") >> +echo $packet >> expected-vif1 >> +OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1]) >> + >> +AS_BOX([IPv6 - from vm0 to external]) >> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:01', > src='f0:00:0f:01:02:03')/ \ >> + IPv6(dst='3000::4', src='1000::3', hlim=64)/ \ >> + UDP(sport=53, dport=4369)") >> +as hv >> +ovs-appctl ofproto/trace br-int in_port=vif1 $packet --names > > vm0_ip6_ofproto_trace.txt >> +check ovs-appctl netdev-dummy/receive vif1 $packet >> + >> +AT_CAPTURE_FILE([vm0_ip6_ofproto_trace.txt]) >> + >> +dnl Make sure the datapath flow doesn't match on a full external address. >> +AT_CHECK([grep Megaflow vm0_ip6_ofproto_trace.txt], [0], [stdout]) >> +AT_CHECK([grep Megaflow vm0_ip6_ofproto_trace.txt | grep -q '3000::4'], > [1]) >> + >> +dnl Make sure that the packet was received externally. The L2 addresses > and >> +dnl the hop limit will be different since the packet was routed. >> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:fe', > src='fa:16:3e:00:00:02')/ \ >> + IPv6(dst='3000::4', src='1000::3', hlim=63)/ \ >> + UDP(sport=53, dport=4369)") >> +echo $packet >> expected-ext >> +OVN_CHECK_PACKETS([hv/br-phys_n1-tx.pcap], [expected-ext]) >> + >> +AS_BOX([IPv4 - from external to vm0]) >> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:02', > src='f0:00:0f:01:02:fe')/ \ >> + IP(dst='10.0.0.3', src='30.0.0.4', ttl=64)/ \ >> + UDP(sport=53, dport=4369)") >> +as hv >> +ovs-appctl ofproto/trace br-phys in_port=br-phys_n1 $packet --names > > ext_ip4_ofproto_trace.txt >> +check ovs-appctl netdev-dummy/receive br-phys_n1 $packet >> + >> +AT_CAPTURE_FILE([ext_ip4_ofproto_trace.txt]) >> + >> +dnl Make sure the datapath flow doesn't match on a full external address. >> +AT_CHECK([grep Megaflow ext_ip4_ofproto_trace.txt], [0], [stdout]) >> +AT_CHECK([grep Megaflow ext_ip4_ofproto_trace.txt | grep -q '30.0.0.4'], > [1]) >> + >> +dnl Make sure that the packet was received by vm0. The L2 addresses and > the >> +dnl hop limit will be different since the packet was routed. >> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:03', > src='fa:16:3e:00:00:01')/ \ >> + IP(dst='10.0.0.3', src='30.0.0.4', ttl=63)/ \ >> + UDP(sport=53, dport=4369)") >> +echo $packet >> expected-vif1 >> +OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1]) >> + >> +AS_BOX([IPv4 - from vm0 to external]) >> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:01', > src='f0:00:0f:01:02:03')/ \ >> + IP(dst='30.0.0.4', src='10.0.0.3', ttl=64)/ \ >> + UDP(sport=53, dport=4369)") >> +as hv >> +ovs-appctl ofproto/trace br-int in_port=vif1 $packet --names > > vm0_ip4_ofproto_trace.txt >> +check ovs-appctl netdev-dummy/receive vif1 $packet >> + >> +AT_CAPTURE_FILE([vm0_ip4_ofproto_trace.txt]) >> + >> +dnl Make sure the datapath flow doesn't match on a full external address. >> +AT_CHECK([grep Megaflow vm0_ip4_ofproto_trace.txt], [0], [stdout]) >> +AT_CHECK([grep Megaflow vm0_ip4_ofproto_trace.txt | grep -q '30.0.0.4'], > [1]) >> + >> +dnl Make sure that the packet was received externally. The L2 addresses > and >> +dnl the hop limit will be different since the packet was routed. >> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:fe', > src='fa:16:3e:00:00:02')/ \ >> + IP(dst='30.0.0.4', src='10.0.0.3', ttl=63)/ \ >> + UDP(sport=53, dport=4369)") >> +echo $packet >> expected-ext >> +OVN_CHECK_PACKETS([hv/br-phys_n1-tx.pcap], [expected-ext]) >> + >> +AT_CLEANUP >> +]) >> >> OVN_FOR_EACH_NORTHD([ >> AT_SETUP([Multichassis port I-P processing]) >> -- >> 2.50.1 >> >> _______________________________________________ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > Thanks Ilya for the fix. > > I applied to main and backported down to 24.03. > In any case, thanks a lot for the fix and for getting it in, it makes sense to me too. > Regards, > Han > Regards, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev