On Fri, Sep 12, 2025 at 11:14 PM Mark Michelson via dev <[email protected]> wrote:
> In logical router ingress table 22, we install a couple of
> conflicting types of flows.
>
> First, in build_arp_resolve_flows_for_lsp(), if the logical switch
> port is peered with a router, then we install a series of priority
> 100 flows. For each logical switch port on the logical switch, we
> install a flow on the peered router port. The flow matches on the
> logical outport and the next hop IP address, and rewrites the
> destination MAC to the MAC of the logical switch port where the IP
> address is bound.
>
> match: outport == router_port, next_hop == lsp IP
> actions: dst_mac = lsp MAC
>
> Next, in build_arp_resolve_flows_for_lrp(), if the logical router
> port is a distributed gateway port (DGP) and the port has
> options:redirect-type=bridged set, then we install a priority 50
> flow. This flow matches on the logical output port and checks that
> the DGP is not chassis-resident, and rewrites the destination MAC to
> the MAC of the DGP.
>
> match: outport == DGP, !is_chassis_resident(DGP)
> actions: dst_mac = DGP MAC
>
> On a hypervisor where the output port is the DGP, the DGP is not
> chassis-resident, and the next hop IP address is one of the IP
> addresses of the peered logical switch, the priority 100 flow will
> match, and we will set the destination MAC to the logical switch
> port's MAC.
>
> This logic would be correct as long as we were using an encapsulation
> method such as Geneve or VXLAN to send the packet to the hypervisor
> where the DGP is bound. In that case, the logical router pipeline can
> run partially on the first hypervisor, and the packet can then be
> tunneled to the hypervisor where the DGP is bound. On the gateway
> hypervisor, we can continue running the logical router pipeline. This
> allows the router features that need to run on the gateway hypervisor
> to do so, and the packet can be directed to the appropriate logical
> switch afterward.
>
> However, if the router has options:redirect-type=bridged set, then
> instead of tunneling the packet within the logical router pipeline,
> we need to use the attached switch's localnet port to redirect the
> packet to the gateway hypervisor. Commit 8ba15c3d1084c7 established
> that the way this is done is to set the destination MAC address to
> the DGP's MAC and send the packet over the attached switch's localnet
> port to the gateway hypervisor. Once the packet arrives on the
> gateway hypervisor, since the destination MAC is the DGP's MAC, the
> packet gets sent to the logical router to be processed a second time
> on the gateway hypervisor. This is what the priority 50 flow is
> intended to do.
>
> Since we never hit the priority 50 flow, packets are redirected with
> the wrong destination MAC address. In many cases this might be
> transparent, since the packet might end up getting sent out the
> correct port in the end. But if the destination logical switch port
> is not bound to the same chassis as the DGP (this is rare, but it
> apparently happens), then the packet ends up getting dropped: the
> logical switch receives the packet on the localnet port and
> determines that the packet needs to be sent back out the localnet
> port. Since the loopback flag never gets set, the packet is dropped
> because the inport and outport are the same.
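
To make the conflict concrete, the two flows might render roughly like
this in lr_in_arp_resolve (the port names, MAC addresses, and next hop
IP below are made up purely for illustration; reg0 carries the IPv4
next hop in this stage):

  priority=100, match=(outport == "lr0-public" && reg0 == 172.16.0.50),
                action=(eth.dst = 00:00:00:aa:bb:01; next;)

  priority=50,  match=(outport == "lr0-public" &&
                       !is_chassis_resident("cr-lr0-public")),
                action=(eth.dst = 00:00:00:dd:ee:ff; next;)

On a non-gateway chassis a bridged-redirect packet matches both, so the
priority 100 flow always wins and the packet leaves with the LSP's MAC
instead of the DGP's MAC, which is why the priority 50 flow never gets
hit.
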
> What's worse, though, is that since the packet never enters the
> logical router pipeline on the gateway chassis, router features that
> are intended to be invoked on the gateway chassis are not being
> invoked: NATs, load balancers, etc. are being skipped.
>
> The fix presented here is to not install the priority 100 flows in
> the case where the router port is a DGP with
> options:redirect-type=bridged set. This way, we will only ever hit
> the priority 50 flow, thus avoiding the issues presented above.
>
> Reported-at: https://issues.redhat.com/browse/FDP-1454
> Signed-off-by: Mark Michelson <[email protected]>
> ---

Hi Mark, thank you for the detailed explanation. I think it all makes
sense. Looking at the reported issue, it seems that it should be
possible to add a test for this. Could you please include a v2 with a
multinode test illustrating the problem?

>  northd/northd.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 8b5413ef3..2c496d58d 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -14674,6 +14674,14 @@ build_arp_resolve_flows_for_lsp(
>              continue;
>          }
>
> +        if (lrp_is_l3dgw(peer)) {
> +            const char *redirect_type = smap_get(&peer->nbrp->options,
> +                                                 "redirect-type");
> +            if (redirect_type && !strcasecmp(redirect_type, "bridged")) {
> +                continue;
> +            }
> +        }
> +
>          if (!lrp_find_member_ip(peer, ip_s)) {
>              continue;
>          }
> --
> 2.50.1
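
For anyone reading along, the added condition is easy to sanity-check in
isolation. Below is a minimal standalone C sketch; the struct and the
skip_priority_100_flow() helper are simplified stand-ins I made up for
this illustration, not the real northd types (lrp_is_l3dgw() and the
options smap are modeled as plain fields here):

    /* skip_check.c - standalone sketch, NOT the real northd code.
     * Models the check added by this patch using simplified stand-in
     * fields for lrp_is_l3dgw() and options:redirect-type. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <strings.h>

    struct fake_peer {
        bool is_l3dgw;              /* stand-in for lrp_is_l3dgw(peer) */
        const char *redirect_type;  /* stand-in for options:redirect-type */
    };

    /* Mirrors the new early "continue" in
     * build_arp_resolve_flows_for_lsp(): true means the priority-100
     * flow must not be installed for this peer. */
    static bool
    skip_priority_100_flow(const struct fake_peer *peer)
    {
        return peer->is_l3dgw
               && peer->redirect_type
               && !strcasecmp(peer->redirect_type, "bridged");
    }

    int
    main(void)
    {
        struct fake_peer bridged = { true, "bridged" };
        struct fake_peer overlay = { true, NULL };
        struct fake_peer plain   = { false, NULL };

        printf("bridged DGP: %s\n",
               skip_priority_100_flow(&bridged) ? "skip" : "install");
        printf("overlay DGP: %s\n",
               skip_priority_100_flow(&overlay) ? "skip" : "install");
        printf("non-DGP:     %s\n",
               skip_priority_100_flow(&plain) ? "skip" : "install");
        return 0;
    }

Compiling and running it should show that only the bridged DGP peer is
skipped; overlay (tunneled) DGPs and regular router ports keep their
priority 100 flows.
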
Thanks,
Ales

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev