On Mon, Jul 28, 2025 at 8:30 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> OVN routers are configured to drop any traffic with a destination
> being one of the Reserved Multicast Addresses (RFC 4291).
>
> This is done by matching on all the bits of ip6.dst, except for bits
> 112-115 that cover all the addresses.  Once installed into OVS, this
> turns into the following match:
>
>   ipv6_dst=ff00::/fff0:ffff:ffff:ffff:ffff:ffff:ffff:ffff
>
> We fixed a large chunk of IPv6 datapath flow explosion issues by
> turning on prefix tracking in the flow classifier in OVS in commit
> 89e43f7528b0 ("controller: Fix IPv6 dp flow explosion by setting flow
> table prefixes.").  However, prefix tracking doesn't work for masks
> that are not contiguous.  That means that if a packet reaches a
> classifier subtable with a non-contiguous mask, all the bits of that
> mask will be un-wildcarded.  It's not a huge problem in the general case,
> because most non-contiguous masks would typically match on just a few
> bits.  But ip6.mcast_rsvd is matching on 124 bits, un-wildcarding them
> for most of the IPv6 traffic traversing a router and causing creation
> of a separate exact-match datapath flow per destination IP.
>
> For setups that handle a large amount of traffic from many different
> external addresses this issue makes IPv6 handling significantly harder
> than IPv4, causing much higher load on the datapath with potential
> overflow of datapath flow tables and a subsequent upcall storm.
> Even without the overflow, OVS spends a lot of time revalidating all
> these datapath flows burning CPU cycles.
>
> In general, since the number of external IP addresses is virtually
> unlimited, there should be no configuration where OVN exact-matches
> them, otherwise it will be a significant datapath scaling issue.
>
> Fix that by replacing a non-contiguous bit-match with a match on an
> address set where all the reserved multicast addresses are just listed
> directly.  There are only 16 of them, so having 15 extra OpenFlow rules
> per router should not be a huge problem, and it will allow OVS to use
> prefix tracking for these flows and avoid creating a separate datapath
> flow per destination IP.
>
> Also adding a simple lsp-to-external routing test case to make sure
> we don't have exact matches in this simple common use case.
>
> The OVS classifier can likely be improved to handle non-contiguous
> masks better, but it's not how the prefix tracking is designed, so
> it's not a simple task.
>
> Fixes: 677a3ba4d66b ("ovn: Add MLD support.")
> Reported-at: https://issues.redhat.com/browse/FDP-1557
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
> ---
>  lib/logical-fields.c |   8 ++-
>  tests/ovn.at         | 151 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 157 insertions(+), 2 deletions(-)
>
> diff --git a/lib/logical-fields.c b/lib/logical-fields.c
> index e479a78c1..f19eb579b 100644
> --- a/lib/logical-fields.c
> +++ b/lib/logical-fields.c
> @@ -266,8 +266,12 @@ ovn_init_symtab(struct shash *symtab)
>
>      /* Predefined IPv6 multicast groups (RFC 4291, 2.7.1). */
>      expr_symtab_add_predicate(symtab, "ip6.mcast_rsvd",
> -                              "ip6.dst[116..127] == 0xff0 && "
> -                              "ip6.dst[0..111] == 0x0");
> +                              "ip6.dst == { "
> +                                  "ff00::0, ff01::0, ff02::0, ff03::0, "
> +                                  "ff04::0, ff05::0, ff06::0, ff07::0, "
> +                                  "ff08::0, ff09::0, ff0a::0, ff0b::0, "
> +                                  "ff0c::0, ff0d::0, ff0e::0, ff0f::0 "
> +                               "}");
>      expr_symtab_add_predicate(symtab, "ip6.mcast_all_nodes",
>                                "ip6.dst == ff01::1 || ip6.dst == ff02::1");
>      expr_symtab_add_predicate(symtab, "ip6.mcast_all_rtrs",
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 0dabec8d9..18ce07e1a 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -41134,6 +41134,157 @@ OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1])
>  AT_CLEANUP
>  ])
>
> +dnl This test checks that the megaflows translated by ovs-vswitchd don't
> +dnl have extensive matches on external IP addresses for simple routing.
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([IPv4/v6 routing to external - megaflow check for src/dst matches])
> +AT_SKIP_IF([test $HAVE_SCAPY = no])
> +ovn_start
> +
> +check ovn-nbctl ls-add sw0
> +
> +check ovn-nbctl lsp-add sw0 vm0
> +check ovn-nbctl lsp-set-addresses vm0 "f0:00:0f:01:02:03 10.0.0.3 1000::3"
> +
> +check ovn-nbctl ls-add sw1
> +
> +check ovn-nbctl lsp-add sw1 ext
> +check ovn-nbctl lsp-set-addresses ext unknown
> +check ovn-nbctl lsp-set-type ext localnet
> +check ovn-nbctl lsp-set-options ext network_name=phys
> +
> +check ovn-nbctl lr-add lr0
> +
> +check ovn-nbctl lrp-add lr0 lr0-sw0 fa:16:3e:00:00:01 10.0.0.250/24 1000::f0/64
> +check ovn-nbctl lsp-add sw0 sw0-lr0
> +check ovn-nbctl lsp-set-type sw0-lr0 router
> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> +
> +check ovn-nbctl lrp-add lr0 lr0-sw1 fa:16:3e:00:00:02 20.0.0.250/24 2000::f0/64
> +check ovn-nbctl lsp-add sw1 sw1-lr0
> +check ovn-nbctl lsp-set-type sw1-lr0 router
> +check ovn-nbctl lsp-set-addresses sw1-lr0 router
> +check ovn-nbctl lsp-set-options sw1-lr0 router-port=lr0-sw1
> +
> +dnl Add default routes for the external gateway.
> +check ovn-nbctl lr-route-add lr0 "0.0.0.0/0" 20.0.0.254 lr0-sw1
> +check ovn-nbctl lr-route-add lr0 "::/0" 2000::fe lr0-sw1
> +
> +net_add n1
> +sim_add hv
> +as hv
> +check ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> +check ovs-vsctl add-port br-int vif1 -- \
> +    set Interface vif1 external-ids:iface-id=vm0 \
> +    options:tx_pcap=hv/vif1-tx.pcap \
> +    options:rxq_pcap=hv/vif1-rx.pcap \
> +    ofport-request=1
> +
> +check ovn-nbctl --wait=sb sync
> +wait_for_ports_up
> +
> +dnl Create MAC binding entries for the external gateway, so OVN doesn't need
> +dnl to ARP/ND for it.
> +lr0_dp=$(fetch_column Datapath_Binding _uuid external_ids:name=lr0)
> +check_uuid ovn-sbctl create mac_binding datapath=$lr0_dp logical_port=lr0-sw1 \
> +    ip=\"2000::fe\" mac=\"f0:00:0f:01:02:fe\"
> +check_uuid ovn-sbctl create mac_binding datapath=$lr0_dp logical_port=lr0-sw1 \
> +    ip=\"20.0.0.254\" mac=\"f0:00:0f:01:02:fe\"
> +check ovn-nbctl --wait=hv sync
> +
> +AS_BOX([IPv6 - from external to vm0])
> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:02', src='f0:00:0f:01:02:fe')/ \
> +                  IPv6(dst='1000::3', src='3000::4', hlim=64)/ \
> +                  UDP(sport=53, dport=4369)")
> +as hv
> +ovs-appctl ofproto/trace br-phys in_port=br-phys_n1 $packet --names > ext_ip6_ofproto_trace.txt
> +check ovs-appctl netdev-dummy/receive br-phys_n1 $packet
> +
> +AT_CAPTURE_FILE([ext_ip6_ofproto_trace.txt])
> +
> +dnl Make sure the datapath flow doesn't match on a full external address.
> +AT_CHECK([grep Megaflow ext_ip6_ofproto_trace.txt], [0], [stdout])
> +AT_CHECK([grep Megaflow ext_ip6_ofproto_trace.txt | grep -q '3000::4'], [1])
> +
> +dnl Make sure that the packet was received by vm0.  The L2 addresses and the
> +dnl hop limit will be different since the packet was routed.
> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:03', src='fa:16:3e:00:00:01')/ \
> +                  IPv6(dst='1000::3', src='3000::4', hlim=63)/ \
> +                  UDP(sport=53, dport=4369)")
> +echo $packet >> expected-vif1
> +OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1])
> +
> +AS_BOX([IPv6 - from vm0 to external])
> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:01', src='f0:00:0f:01:02:03')/ \
> +                  IPv6(dst='3000::4', src='1000::3', hlim=64)/ \
> +                  UDP(sport=53, dport=4369)")
> +as hv
> +ovs-appctl ofproto/trace br-int in_port=vif1 $packet --names > vm0_ip6_ofproto_trace.txt
> +check ovs-appctl netdev-dummy/receive vif1 $packet
> +
> +AT_CAPTURE_FILE([vm0_ip6_ofproto_trace.txt])
> +
> +dnl Make sure the datapath flow doesn't match on a full external address.
> +AT_CHECK([grep Megaflow vm0_ip6_ofproto_trace.txt], [0], [stdout])
> +AT_CHECK([grep Megaflow vm0_ip6_ofproto_trace.txt | grep -q '3000::4'], [1])
> +
> +dnl Make sure that the packet was received externally.  The L2 addresses and
> +dnl the hop limit will be different since the packet was routed.
> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:fe', src='fa:16:3e:00:00:02')/ \
> +                  IPv6(dst='3000::4', src='1000::3', hlim=63)/ \
> +                  UDP(sport=53, dport=4369)")
> +echo $packet >> expected-ext
> +OVN_CHECK_PACKETS([hv/br-phys_n1-tx.pcap], [expected-ext])
> +
> +AS_BOX([IPv4 - from external to vm0])
> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:02', src='f0:00:0f:01:02:fe')/ \
> +                  IP(dst='10.0.0.3', src='30.0.0.4', ttl=64)/ \
> +                  UDP(sport=53, dport=4369)")
> +as hv
> +ovs-appctl ofproto/trace br-phys in_port=br-phys_n1 $packet --names > ext_ip4_ofproto_trace.txt
> +check ovs-appctl netdev-dummy/receive br-phys_n1 $packet
> +
> +AT_CAPTURE_FILE([ext_ip4_ofproto_trace.txt])
> +
> +dnl Make sure the datapath flow doesn't match on a full external address.
> +AT_CHECK([grep Megaflow ext_ip4_ofproto_trace.txt], [0], [stdout])
> +AT_CHECK([grep Megaflow ext_ip4_ofproto_trace.txt | grep -q '30.0.0.4'], [1])
> +
> +dnl Make sure that the packet was received by vm0.  The L2 addresses and the
> +dnl hop limit will be different since the packet was routed.
> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:03', src='fa:16:3e:00:00:01')/ \
> +                  IP(dst='10.0.0.3', src='30.0.0.4', ttl=63)/ \
> +                  UDP(sport=53, dport=4369)")
> +echo $packet >> expected-vif1
> +OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1])
> +
> +AS_BOX([IPv4 - from vm0 to external])
> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:01', src='f0:00:0f:01:02:03')/ \
> +                  IP(dst='30.0.0.4', src='10.0.0.3', ttl=64)/ \
> +                  UDP(sport=53, dport=4369)")
> +as hv
> +ovs-appctl ofproto/trace br-int in_port=vif1 $packet --names > vm0_ip4_ofproto_trace.txt
> +check ovs-appctl netdev-dummy/receive vif1 $packet
> +
> +AT_CAPTURE_FILE([vm0_ip4_ofproto_trace.txt])
> +
> +dnl Make sure the datapath flow doesn't match on a full external address.
> +AT_CHECK([grep Megaflow vm0_ip4_ofproto_trace.txt], [0], [stdout])
> +AT_CHECK([grep Megaflow vm0_ip4_ofproto_trace.txt | grep -q '30.0.0.4'], [1])
> +
> +dnl Make sure that the packet was received externally.  The L2 addresses and
> +dnl the hop limit will be different since the packet was routed.
> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:fe', src='fa:16:3e:00:00:02')/ \
> +                  IP(dst='30.0.0.4', src='10.0.0.3', ttl=63)/ \
> +                  UDP(sport=53, dport=4369)")
> +echo $packet >> expected-ext
> +OVN_CHECK_PACKETS([hv/br-phys_n1-tx.pcap], [expected-ext])
> +
> +AT_CLEANUP
> +])
>
>  OVN_FOR_EACH_NORTHD([
>  AT_SETUP([Multichassis port I-P processing])
> --
> 2.50.1
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thanks Ilya for the fix.

I applied it to main and backported it down to 24.03.
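
For anyone finding this thread in the archive, here is a small Python
sketch of the reasoning in the commit message.  It is not part of the
patch and does not touch OVS or OVN; the helper names
(is_contiguous_prefix_mask, matches_old_rule, NEW_SET) are made up for
illustration.  It only shows that the old mask is non-contiguous, that it
covers 124 bits, and that the 16 listed addresses are what the old
value/mask pair matched.

```python
import ipaddress

# Old match as it appeared in OVS: value ff00:: under a mask that skips
# bits 112-115 (the low nibble of the first 16-bit group).
VALUE = int(ipaddress.IPv6Address("ff00::"))
MASK = int(ipaddress.IPv6Address("fff0:ffff:ffff:ffff:ffff:ffff:ffff:ffff"))


def is_contiguous_prefix_mask(mask: int, width: int = 128) -> bool:
    """Return True if mask is a run of 1-bits followed only by 0-bits."""
    inverted = ~mask & ((1 << width) - 1)
    # An inverted prefix mask has the form 2**k - 1, so adding 1 gives a
    # power of two that shares no bits with it.
    return inverted & (inverted + 1) == 0


def matches_old_rule(addr: str) -> bool:
    """Apply the old value/mask pair to a destination address."""
    return int(ipaddress.IPv6Address(addr)) & MASK == VALUE


# The replacement address set from the patch: ff00:: through ff0f::,
# built by filling in the four bits the old mask left wildcarded.
NEW_SET = [ipaddress.IPv6Address(VALUE | (scope << 112)) for scope in range(16)]

if __name__ == "__main__":
    print("contiguous:", is_contiguous_prefix_mask(MASK))
    print("matched bits:", bin(MASK).count("1"))
    print("addresses:", ", ".join(str(a) for a in NEW_SET))
```

Each entry in NEW_SET is a full /128 match, which is a contiguous mask,
so as far as I understand the prefix tracking can handle those rules
without un-wildcarding every bit for unrelated destinations.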

Regards,
Han