On 7/28/25 10:19 PM, Han Zhou wrote:
> On Mon, Jul 28, 2025 at 8:30 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>

Hi Ilya, Han,

>> OVN routers are configured to drop any traffic with a destination
>> being one of the Reserved Multicast Addresses (RFC 4291).
>>
>> This is done by matching on all the bits of ipv6.dst, except for bits
>> 112-116 that cover all the addresses.  Once installed into OVS, this
>> turns into a following match:
>>
>>   ipv6_dst=ff00::/fff0:ffff:ffff:ffff:ffff:ffff:ffff:ffff
>>
>> We fixed a large chunk of IPv6 datapath flow explosion issues by
>> turning on prefix tacking in the flow classifier in OVS in commit
>> 89e43f7528b0 ("controller: Fix IPv6 dp flow explosion by setting flow
>> table prefixes.").  However, prefix tracking doesn't work for masks
>> that are not contiguous.  That means that if a packet reaches a
>> classifier subtable with non-contiguous mask, all the bits of that
>> mask will be un-wildcarded.  It's not a huge problem in a general case,
>> because most non-contiguous masks would typically match on just a few
>> bits.  But ip6.mcast_rsvd is matching on 124 bits, un-wildcarding them
>> for most of the IPv6 traffic traversing a router and causing creation
>> of a separate exact-match datapath flow per destination IP.
>>
>> For setups that handle large amount of traffic from many different
>> external addresses this issue makes IPv6 handling significantly harder
>> than IPv4, causing much higher load on the datapath with potential
>> overflow of datapath flow tables and a subsequent upcall storm.
>> Even without the overflow, OVS spends a lot of time revalidating all
>> these datapath flows burning CPU cycles.
>>
>> In general, since the number of external IP addresses is virtually
>> unlimited, there should be no configuration where OVN exact-matches
>> them, otherwise it will be a significant datapath scaling issue.
>>
>> Fix that by replacing a non-contiguous bit-match with a match on an
>> address set where all the reserved multicast addresses are just listed
>> directly.  There are only 16 of them, so this should not be a huge
>> problem to have extra 15 OpenFlow rules per router, but it will allow
>> OVS to use prefix tracking for these flows and avoid creating separate
>> datapath flow per destination IP.
>>
>> Also adding a simple lsp-to-external routing test case to make sure
>> we don't have exact matches in this simple common use case.
>>
>> The OVS classifier can likely be improved to handle non-contiguous
>> masks better, but it's not how the prefix tracking is designed, so
>> it's not a simple task.
>>
>> Fixes: 677a3ba4d66b ("ovn: Add MLD support.")
>> Reported-at: https://issues.redhat.com/browse/FDP-1557
>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>> ---
>>  lib/logical-fields.c |   8 ++-
>>  tests/ovn.at         | 151 +++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 157 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/logical-fields.c b/lib/logical-fields.c
>> index e479a78c1..f19eb579b 100644
>> --- a/lib/logical-fields.c
>> +++ b/lib/logical-fields.c
>> @@ -266,8 +266,12 @@ ovn_init_symtab(struct shash *symtab)
>>
>>      /* Predefined IPv6 multicast groups (RFC 4291, 2.7.1). */
>>      expr_symtab_add_predicate(symtab, "ip6.mcast_rsvd",
>> -                              "ip6.dst[116..127] == 0xff0 && "
>> -                              "ip6.dst[0..111] == 0x0");
>> +                              "ip6.dst == { "
>> +                                  "ff00::0, ff01::0, ff02::0, ff03::0, "
>> +                                  "ff04::0, ff05::0, ff06::0, ff07::0, "
>> +                                  "ff08::0, ff09::0, ff0a::0, ff0b::0, "
>> +                                  "ff0c::0, ff0d::0, ff0e::0, ff0f::0 "
>> +                               "}");

These are functionally equivalent and, playing devil's advocate a bit,
it's extremely hard to prevent suboptimal matches to be added by OVN
developers without having a very deep understanding of how the OVS
clasifier works.

I'm not sure how we can improve the developer experience, to be honest.
Do you have any ideas by any chance?

>>      expr_symtab_add_predicate(symtab, "ip6.mcast_all_nodes",
>>                                "ip6.dst == ff01::1 || ip6.dst ==
> ff02::1");
>>      expr_symtab_add_predicate(symtab, "ip6.mcast_all_rtrs",
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index 0dabec8d9..18ce07e1a 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -41134,6 +41134,157 @@ OVN_CHECK_PACKETS([hv/vif1-tx.pcap],
> [expected-vif1])
>>  AT_CLEANUP
>>  ])
>>
>> +dnl This test checks that the megaflows translated by ovs-vswitchd don't
>> +dnl have extensive matches on external IP addresses for simple routing.
>> +OVN_FOR_EACH_NORTHD([
>> +AT_SETUP([IPv4/v6 routing to external - megaflow check for src/dst
> matches])
>> +AT_SKIP_IF([test $HAVE_SCAPY = no])
>> +ovn_start
>> +
>> +check ovn-nbctl ls-add sw0
>> +
>> +check ovn-nbctl lsp-add sw0 vm0
>> +check ovn-nbctl lsp-set-addresses vm0 "f0:00:0f:01:02:03 10.0.0.3
> 1000::3"
>> +
>> +check ovn-nbctl ls-add sw1
>> +
>> +check ovn-nbctl lsp-add sw1 ext
>> +check ovn-nbctl lsp-set-addresses ext unknown
>> +check ovn-nbctl lsp-set-type ext localnet
>> +check ovn-nbctl lsp-set-options ext network_name=phys
>> +
>> +check ovn-nbctl lr-add lr0
>> +
>> +check ovn-nbctl lrp-add lr0 lr0-sw0 fa:16:3e:00:00:01 10.0.0.250/24
> 1000::f0/64
>> +check ovn-nbctl lsp-add sw0 sw0-lr0
>> +check ovn-nbctl lsp-set-type sw0-lr0 router
>> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
>> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
>> +
>> +check ovn-nbctl lrp-add lr0 lr0-sw1 fa:16:3e:00:00:02 20.0.0.250/24
> 2000::f0/64
>> +check ovn-nbctl lsp-add sw1 sw1-lr0
>> +check ovn-nbctl lsp-set-type sw1-lr0 router
>> +check ovn-nbctl lsp-set-addresses sw1-lr0 router
>> +check ovn-nbctl lsp-set-options sw1-lr0 router-port=lr0-sw1
>> +
>> +dnl Add default routes for the external gateway.
>> +check ovn-nbctl lr-route-add lr0 "0.0.0.0/0" 20.0.0.254 lr0-sw1
>> +check ovn-nbctl lr-route-add lr0 "::/0" 2000::fe lr0-sw1
>> +
>> +net_add n1
>> +sim_add hv
>> +as hv
>> +check ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.1
>> +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
>> +check ovs-vsctl add-port br-int vif1 -- \
>> +    set Interface vif1 external-ids:iface-id=vm0 \
>> +    options:tx_pcap=hv/vif1-tx.pcap \
>> +    options:rxq_pcap=hv/vif1-rx.pcap \
>> +    ofport-request=1
>> +
>> +check ovn-nbctl --wait=sb sync
>> +wait_for_ports_up
>> +
>> +dnl Create MAC binding entries for the external gateway, so OVN doesn't
> need
>> +dnl to ARP/ND for it.
>> +lr0_dp=$(fetch_column Datapath_Binding _uuid external_ids:name=lr0)
>> +check_uuid ovn-sbctl create mac_binding datapath=$lr0_dp
> logical_port=lr0-sw1 \
>> +    ip=\"2000::fe\" mac=\"f0:00:0f:01:02:fe\"
>> +check_uuid ovn-sbctl create mac_binding datapath=$lr0_dp
> logical_port=lr0-sw1 \
>> +    ip=\"20.0.0.254\" mac=\"f0:00:0f:01:02:fe\"
>> +check ovn-nbctl --wait=hv sync
>> +
>> +AS_BOX([IPv6 - from external to vm0])
>> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:02',
> src='f0:00:0f:01:02:fe')/ \
>> +                  IPv6(dst='1000::3', src='3000::4', hlim=64)/ \
>> +                  UDP(sport=53, dport=4369)")
>> +as hv
>> +ovs-appctl ofproto/trace br-phys in_port=br-phys_n1 $packet --names >
> ext_ip6_ofproto_trace.txt
>> +check ovs-appctl netdev-dummy/receive br-phys_n1 $packet
>> +
>> +AT_CAPTURE_FILE([ext_ip6_ofproto_trace.txt])
>> +
>> +dnl Make sure the datapath flow doesn't match on a full external address.
>> +AT_CHECK([grep Megaflow ext_ip6_ofproto_trace.txt], [0], [stdout])
>> +AT_CHECK([grep Megaflow ext_ip6_ofproto_trace.txt | grep -q '3000::4'],
> [1])
>> +
>> +dnl Make sure that the packet was received by vm0.  The L2 addresses and
> the
>> +dnl hop limit will be different since the packet was routed.
>> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:03',
> src='fa:16:3e:00:00:01')/ \
>> +                  IPv6(dst='1000::3', src='3000::4', hlim=63)/ \
>> +                  UDP(sport=53, dport=4369)")
>> +echo $packet >> expected-vif1
>> +OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1])
>> +
>> +AS_BOX([IPv6 - from vm0 to external])
>> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:01',
> src='f0:00:0f:01:02:03')/ \
>> +                  IPv6(dst='3000::4', src='1000::3', hlim=64)/ \
>> +                  UDP(sport=53, dport=4369)")
>> +as hv
>> +ovs-appctl ofproto/trace br-int in_port=vif1 $packet --names >
> vm0_ip6_ofproto_trace.txt
>> +check ovs-appctl netdev-dummy/receive vif1 $packet
>> +
>> +AT_CAPTURE_FILE([vm0_ip6_ofproto_trace.txt])
>> +
>> +dnl Make sure the datapath flow doesn't match on a full external address.
>> +AT_CHECK([grep Megaflow vm0_ip6_ofproto_trace.txt], [0], [stdout])
>> +AT_CHECK([grep Megaflow vm0_ip6_ofproto_trace.txt | grep -q '3000::4'],
> [1])
>> +
>> +dnl Make sure that the packet was received externally.  The L2 addresses
> and
>> +dnl the hop limit will be different since the packet was routed.
>> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:fe',
> src='fa:16:3e:00:00:02')/ \
>> +                  IPv6(dst='3000::4', src='1000::3', hlim=63)/ \
>> +                  UDP(sport=53, dport=4369)")
>> +echo $packet >> expected-ext
>> +OVN_CHECK_PACKETS([hv/br-phys_n1-tx.pcap], [expected-ext])
>> +
>> +AS_BOX([IPv4 - from external to vm0])
>> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:02',
> src='f0:00:0f:01:02:fe')/ \
>> +                  IP(dst='10.0.0.3', src='30.0.0.4', ttl=64)/ \
>> +                  UDP(sport=53, dport=4369)")
>> +as hv
>> +ovs-appctl ofproto/trace br-phys in_port=br-phys_n1 $packet --names >
> ext_ip4_ofproto_trace.txt
>> +check ovs-appctl netdev-dummy/receive br-phys_n1 $packet
>> +
>> +AT_CAPTURE_FILE([ext_ip4_ofproto_trace.txt])
>> +
>> +dnl Make sure the datapath flow doesn't match on a full external address.
>> +AT_CHECK([grep Megaflow ext_ip4_ofproto_trace.txt], [0], [stdout])
>> +AT_CHECK([grep Megaflow ext_ip4_ofproto_trace.txt | grep -q '30.0.0.4'],
> [1])
>> +
>> +dnl Make sure that the packet was received by vm0.  The L2 addresses and
> the
>> +dnl hop limit will be different since the packet was routed.
>> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:03',
> src='fa:16:3e:00:00:01')/ \
>> +                  IP(dst='10.0.0.3', src='30.0.0.4', ttl=63)/ \
>> +                  UDP(sport=53, dport=4369)")
>> +echo $packet >> expected-vif1
>> +OVN_CHECK_PACKETS([hv/vif1-tx.pcap], [expected-vif1])
>> +
>> +AS_BOX([IPv4 - from vm0 to external])
>> +packet=$(fmt_pkt "Ether(dst='fa:16:3e:00:00:01',
> src='f0:00:0f:01:02:03')/ \
>> +                  IP(dst='30.0.0.4', src='10.0.0.3', ttl=64)/ \
>> +                  UDP(sport=53, dport=4369)")
>> +as hv
>> +ovs-appctl ofproto/trace br-int in_port=vif1 $packet --names >
> vm0_ip4_ofproto_trace.txt
>> +check ovs-appctl netdev-dummy/receive vif1 $packet
>> +
>> +AT_CAPTURE_FILE([vm0_ip4_ofproto_trace.txt])
>> +
>> +dnl Make sure the datapath flow doesn't match on a full external address.
>> +AT_CHECK([grep Megaflow vm0_ip4_ofproto_trace.txt], [0], [stdout])
>> +AT_CHECK([grep Megaflow vm0_ip4_ofproto_trace.txt | grep -q '30.0.0.4'],
> [1])
>> +
>> +dnl Make sure that the packet was received externally.  The L2 addresses
> and
>> +dnl the hop limit will be different since the packet was routed.
>> +packet=$(fmt_pkt "Ether(dst='f0:00:0f:01:02:fe',
> src='fa:16:3e:00:00:02')/ \
>> +                  IP(dst='30.0.0.4', src='10.0.0.3', ttl=63)/ \
>> +                  UDP(sport=53, dport=4369)")
>> +echo $packet >> expected-ext
>> +OVN_CHECK_PACKETS([hv/br-phys_n1-tx.pcap], [expected-ext])
>> +
>> +AT_CLEANUP
>> +])
>>
>>  OVN_FOR_EACH_NORTHD([
>>  AT_SETUP([Multichassis port I-P processing])
>> --
>> 2.50.1
>>
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> Thanks Ilya for the fix.
> 
> I applied to main and backported down to 24.03.
> 

In any case, thanks a lot for the fix and for getting it in, it makes
sense to me too.

> Regards,
> Han
> 

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to