On Fri, Sep 19, 2025 at 5:13 PM Mark Michelson via dev <
[email protected]> wrote:

> In logical router ingress table 22, we install two conflicting
> types of flows.
>
> First, in build_arp_resolve_flows_for_lsp(), if the logical switch port
> is peered with a router, then we install a series of priority 100 flows.
> For each logical switch port on the logical switch, we install a flow on
> the peered router port. The flow matches on logical outport and next hop
> IP address. The flow rewrites the destination MAC to the MAC of the
> logical switch port where the IP address is bound.
>
> match: outport == router_port, next_hop == lsp IP
> actions: dst_mac = lsp MAC
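>
> For illustration, these flows can be listed on a running system with
> something like the following (the router name "gw-router" is borrowed from
> the test added below; exact lflow-list formatting varies by OVN version):
>
>   ovn-sbctl lflow-list gw-router | grep lr_in_arp_resolve | grep priority=100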
>
> Next, in build_arp_resolve_flows_for_lrp(), if the logical router port
> is a distributed gateway port (DGP), and the port has
> options:redirect-type=bridged set, then we install a priority 50 flow.
> This flow matches on the logical outport and checks that the DGP is
> not chassis-resident. The flow rewrites the destination MAC to the MAC
> of the DGP.
>
> match: outport == DGP, !is_chassis_resident(DGP)
> actions: dst_mac = DGP MAC
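>
> The priority 50 flow can be listed the same way, again assuming the names
> used in the test below:
>
>   ovn-sbctl lflow-list gw-router | grep lr_in_arp_resolve | grep is_chassis_resident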
>
> On a hypervisor where the output port is the DGP, the DGP is not
> chassis-resident, and the next hop IP address is one of the IP addresses
> of the peered logical switch, the priority 100 flow will match, and
> we will set the destination MAC to the logical switch port's MAC.
>
> This logic would be correct so long as we are using an encapsulation
> method such as Geneve or VXLAN to send the packet to the hypervisor
> where the DGP is bound. In that case, the logical router pipeline can
> run partially on the first hypervisor, then the packet can be tunneled
> to the hypervisor where the DGP is bound. On the gateway hypervisor, we
> can then continue running the logical router pipeline. This allows for
> the router features that need to run on the gateway hypervisor to run,
> and then the packet can be directed to the appropriate logical switch
> after.
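>
> (As a quick sanity check of that path, the chassis where the DGP is bound
> can be read from its chassisredirect Port_Binding; with the names used in
> the test below:
>
>   ovn-sbctl find Port_Binding logical_port=cr-ro-public
>
> whose "chassis" column should point at the gateway chassis.)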
>
> However, if the router has options:redirect-type=bridged set, then this
> means instead of tunneling the packet within the logical router
> pipeline, we need to use the attached switch's localnet port to
> redirect the packet to the gateway hypervisor. In commit 8ba15c3d1084c7
> it was established that the way this is done is to set the
> destination MAC address to the DGP's MAC and send the packet over the
> attached switch's localnet port to the gateway hypervisor. Once the
> packet arrives on the gateway hypervisor, since the destination MAC is
> set to the DGP's MAC, the packet will get sent to the logical router to
> be processed a second time on the gateway hypervisor. This is what the
> priority 50 flow is intended to do.
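>
> For reference, the configuration that produces this scenario looks roughly
> like the following (names taken from the test added below):
>
>   ovn-nbctl lrp-set-gateway-chassis ro-public ovn-gw-1
>   ovn-nbctl lrp-set-redirect-type ro-public bridged
>   # on each chassis, map the provider network to a local bridge
>   ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex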
>
> Because the priority 100 flow matches first, we never hit the priority 50
> flow, which means that packets are being redirected with the wrong
> destination MAC address. In many cases,
> this might be transparent since the packet might end up getting sent out
> the correct port in the end. But, if the destination logical switch port
> is not bound to the same chassis as the DGP (this is rare, but it
> apparently happens), then the packet will end up getting dropped because
> the logical switch will receive the packet on the localnet port and
> determine that the packet needs to be sent back out the localnet port.
> Since the loopback flag never gets set, the packet is dropped because
> the inport and outport are the same. What's worse,
> though, is that since the packet never enters the logical router
> pipeline on the gateway chassis, it means that router features that are
> intended to be invoked on the gateway chassis are not being invoked.
> NATs, load balancers, etc. are being skipped.
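>
> One way to observe this in practice is to capture the redirected ICMP
> packets on the NIC attached to the provider bridge of the sending chassis
> and look at their destination MAC (the interface name here is only an
> example); without this fix they carry the logical switch port's MAC instead
> of the DGP's MAC:
>
>   tcpdump -nei eth2 icmp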
>
> The fix presented here is to not install the priority 100 flows in the
> case where the router port is a DGP and it has
> options:redirect-type=bridged set. This way, we will only ever hit the
> priority 50 flow, thus avoiding the issues presented above.
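>
> With this change, the lflow-list check from earlier should no longer show
> the priority 100 flows generated from ls-public's ports for outport
> "ro-public"; the priority 50 flow for that outport remains, e.g.:
>
>   ovn-sbctl lflow-list gw-router | grep lr_in_arp_resolve | grep '"ro-public"'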
>
> Reported-at: https://issues.redhat.com/browse/FDP-1454
> Signed-off-by: Mark Michelson <[email protected]>
> ---
>  northd/northd.c    |  8 +++++
>  tests/multinode.at | 88 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 96 insertions(+)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 8b5413ef3..2c496d58d 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -14674,6 +14674,14 @@ build_arp_resolve_flows_for_lsp(
>                          continue;
>                      }
>
> +                    if (lrp_is_l3dgw(peer)) {
> +                        const char *redirect_type = smap_get(&peer->nbrp->options,
> +                                                             "redirect-type");
> +                        if (redirect_type && !strcasecmp(redirect_type, "bridged")) {
> +                            continue;
> +                        }
> +                    }
> +
>                      if (!lrp_find_member_ip(peer, ip_s)) {
>                          continue;
>                      }
> diff --git a/tests/multinode.at b/tests/multinode.at
> index f27c1b0bd..959c5ba8a 100644
> --- a/tests/multinode.at
> +++ b/tests/multinode.at
> @@ -3746,3 +3746,91 @@ MAC               Type   Flags Intf/Remote ES/VTEP           VLAN  Seq #'s
>  ])
>
>  AT_CLEANUP
> +
> +AT_SETUP([redirect-bridged to non-gw destination switch port])
> +
> +check_fake_multinode_setup
> +cleanup_multinode_resources
> +# This test uses the following logical network:
> +#
> +# +-----------+
> +# | ls-public |
> +# +-----------+
> +#       |
> +#       |
> +# +-----------+
> +# | gw-router |
> +# +-----------+
> +#       |
> +#       |
> +# +-----------+
> +# | ls-local  |
> +# +-----------+
> +#
> +# The router port from gw-router to ls-public is a distributed gateway port.
> +# It also has options:redirect-type=bridged set.
> +#
> +# We create vm1 attached to ls-local that is bound to ovn-chassis-1. We
> +# also create vm2 attached to ls-public that is bound to ovn-chassis-1.
> +# The DGP on gw-router is bound to ovn-gw-1.
> +#
> +# Our goal is to successfully ping from vm1 to vm2. In order for this to
> +# work, the ping will have to traverse gw-router. vm1 and vm2 are bound to
> +# the same chassis, but the DGP is bound to ovn-gw-1. We therefore expect
> +# the following:
> +#
> +# The ping starts by entering ls-local on ovn-chassis-1.
> +# The ping then goes through the ingress pipeline of gw-router on ovn-chassis-1.
> +# The ping then goes through ls-public's localnet port to reach ovn-gw-1.
> +# The ping's destination MAC should be the DGP's MAC. So the ping will
> +# get processed first by ls-public on ovn-gw-1, then will be redirected to
> +# gw-router on ovn-gw-1. The packet will then re-enter ls-public on ovn-gw-1.
> +# The ping will then get redirected over the localnet back to ovn-chassis-1.
> +# From here, the ls-public pipeline can run and the ping will be output to vm2.
> +#
> +
> +check multinode_nbctl ls-add ls-local
> +check multinode_nbctl lsp-add ls-local vm1
> +check multinode_nbctl lsp-set-addresses vm1 "00:00:00:00:01:02 10.0.0.2 10::2"
> +
> +check multinode_nbctl ls-add ls-public
> +check multinode_nbctl lsp-add ls-public vm2
> +check multinode_nbctl lsp-set-addresses vm2 "00:00:00:00:02:02 20.0.0.2 20::2"
> +
> +check multinode_nbctl lsp-add ls-public ln-public
> +check multinode_nbctl lsp-set-type ln-public localnet
> +check multinode_nbctl lsp-set-addresses ln-public unknown
> +check multinode_nbctl lsp-set-options ln-public network_name=public
> +
> +check multinode_nbctl lr-add gw-router
> +
> +check multinode_nbctl lrp-add gw-router ro-local 00:00:00:00:01:01 10.0.0.1/8 10::1/64
> +check multinode_nbctl lsp-add ls-local local-ro
> +check multinode_nbctl lsp-set-type local-ro router
> +check multinode_nbctl lsp-set-addresses local-ro router
> +check multinode_nbctl lsp-set-options local-ro router-port=ro-local
> +
> +check multinode_nbctl lrp-add gw-router ro-public 00:00:00:00:02:01 20.0.0.1/8 20::1/64
> +check multinode_nbctl lsp-add ls-public public-ro
> +check multinode_nbctl lsp-set-type public-ro router
> +check multinode_nbctl lsp-set-addresses public-ro router
> +check multinode_nbctl lsp-set-options public-ro router-port=ro-public
> +
> +check multinode_nbctl lrp-set-gateway-chassis ro-public ovn-gw-1
> +check multinode_nbctl lrp-set-redirect-type ro-public bridged
> +
> +m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex
> +m_as ovn-chassis-1 ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="public:aa:bb:cc:dd:01:01"
> +m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="public:aa:bb:cc:dd:02:01"
> +
> +m_as ovn-chassis-1 /data/create_fake_vm.sh vm1 vm1 00:00:00:00:01:02 1342 10.0.0.2 8 10.0.0.1 10::2/64 10::1
> +m_as ovn-chassis-1 /data/create_fake_vm.sh vm2 vm2 00:00:00:00:02:02 1342 20.0.0.2 8 20.0.0.1 20::2/64 20::1
> +
> +m_wait_for_ports_up
> +
> +M_NS_CHECK_EXEC([ovn-chassis-1], [vm1], [ping -q -c 3 -i 0.3 -w 2 20.0.0.2 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CLEANUP
> --
> 2.50.1
>
Thank you Mark,

I have addressed the 0-day bot comment, merged this into main and
backported all the way down to 24.03.

Regards,
Ales
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
