On Fri, Sep 12, 2025 at 11:14 PM Mark Michelson via dev <
[email protected]> wrote:

> In logical router ingress table 22, we install two conflicting types
> of flows.
>
> First, in build_arp_resolve_flows_for_lsp(), if the logical switch port
> is peered with a router, then we install a series of priority 100 flows.
> For each logical switch port on the logical switch, we install a flow on
> the peered router port. The flow matches on logical outport and next hop
> IP address. The flow rewrites the destination MAC to the MAC of the
> logical switch port where the IP address is bound.
>
> match: outport == router_port, next_hop == lsp IP
> actions: dst_mac = lsp MAC
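>
> For illustration (the port name and address here are hypothetical),
> the resulting logical flow looks roughly like:
>
> match: outport == "lr0-ls0" && reg0 == 10.0.0.5
> actions: eth.dst = <lsp MAC>; next;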
>
> Next, in build_arp_resolve_flows_for_lrp(), if the logical router port
> is a distributed gateway port (DGP), and the port has
> options:redirect-type=bridged set, then we install a priority 50 flow.
> This flow matches on the logical output port and checks if the DGP is
> not chassis-resident. The flow rewrites the destination MAC to the MAC
> of the DGP.
>
> match: outport == DGP, !is_chassis_resident(DGP)
> actions: dst_mac = DGP MAC
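>
> For a hypothetical DGP named "lr0-public" (with chassis-redirect port
> "cr-lr0-public"), this corresponds roughly to:
>
> match: outport == "lr0-public" && !is_chassis_resident("cr-lr0-public")
> actions: eth.dst = <DGP MAC>; next;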
>
> On a hypervisor where the output port is the DGP, the DGP is not
> chassis-resident, and the next hop IP address is one of the IP
> addresses of the peered logical switch, the priority 100 flow will
> match, and we will set the destination MAC to the logical switch
> port's MAC.
>
> This logic would be correct so long as we are using an encapsulation
> method such as Geneve or VXLAN to send the packet to the hypervisor
> where the DGP is bound. In that case, the logical router pipeline can
> run partially on the first hypervisor, then the packet can be tunneled
> to the hypervisor where the DGP is bound. On the gateway hypervisor, we
> can then continue running the logical router pipeline. This allows for
> the router features that need to run on the gateway hypervisor to run,
> and then the packet can be directed to the appropriate logical switch
> after.
>
> However, if the router has options:redirect-type=bridged set, then this
> means instead of tunneling the packet within the logical router
> pipeline, we need to use the attached switch's localnet port to
> redirect the packet to the gateway hypervisor. Commit 8ba15c3d1084c7
> established that this is done by setting the destination MAC address
> to the DGP's MAC and sending the packet over the attached switch's
> localnet port to the gateway hypervisor. Once the
> packet arrives on the gateway hypervisor, since the destination MAC is
> set to the DGP's MAC, the packet will get sent to the logical router to
> be processed a second time on the gateway hypervisor. This is what the
> priority 50 flow is intended to do.
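>
> (For reference, a minimal setup that takes this path, with
> hypothetical port and chassis names, would be something like:
>
>     # hypothetical port/chassis names
>     ovn-nbctl lrp-set-gateway-chassis lr0-public gw1 20
>     ovn-nbctl set logical_router_port lr0-public options:redirect-type=bridged
>
> assuming the logical switch attached to lr0-public also has a
> localnet port.)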
>
> Since we are not hitting the priority 50 flow, it means that packets are
> being redirected with the wrong destination MAC address. In many cases,
> this might be transparent since the packet might end up getting sent out
> the correct port in the end. But, if the destination logical switch port
> is not bound to the same chassis as the DGP (this is rare, but it
> apparently happens), then the packet will end up getting dropped because
> the logical switch will receive the packet on the localnet port and
> determine that the packet needs to be sent back out the localnet port.
> Since the loopback flag never gets set, the packet gets dropped
> because the inport and outport are the same. What's worse,
> though, is that since the packet never enters the logical router
> pipeline on the gateway chassis, it means that router features that are
> intended to be invoked on the gateway chassis are not being invoked.
> NATs, load balancers, etc. are being skipped.
>
> The fix presented here is to not install the priority 100 flows in the
> case where the router port is a DGP and it has
> options:redirect-type=bridged set. This way, we will only ever hit the
> priority 50 flow, thus avoiding the issues presented above.
>
> Reported-at: https://issues.redhat.com/browse/FDP-1454
> Signed-off-by: Mark Michelson <[email protected]>
> ---
>

Hi Mark,

thank you for the detailed explanation. I think it all makes sense.
Looking at the reported issue, it seems that it should be possible
to add a test for this. Could you please post a v2 with a multinode
test illustrating the problem?

>  northd/northd.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 8b5413ef3..2c496d58d 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -14674,6 +14674,14 @@ build_arp_resolve_flows_for_lsp(
>                          continue;
>                      }
>
> +                    if (lrp_is_l3dgw(peer)) {
> +                        const char *redirect_type = smap_get(&peer->nbrp->options,
> +                                                             "redirect-type");
> +                        if (redirect_type && !strcasecmp(redirect_type, "bridged")) {
> +                            continue;
> +                        }
> +                    }
> +
>                      if (!lrp_find_member_ip(peer, ip_s)) {
>                          continue;
>                      }
> --
> 2.50.1
>
Thanks,
Ales
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
