In logical router ingress table 22, we install two conflicting types of flows.
First, in build_arp_resolve_flows_for_lsp(), if the logical switch port is peered with a router, then we install a series of priority 100 flows: for each logical switch port on the logical switch, we install a flow on the peered router port. The flow matches on the logical outport and the next hop IP address, and rewrites the destination MAC to the MAC of the logical switch port where the IP address is bound.

    match: outport == router_port, next_hop == lsp IP
    actions: dst_mac = lsp MAC

Next, in build_arp_resolve_flows_for_lrp(), if the logical router port is a distributed gateway port (DGP) and the port has options:redirect-type=bridged set, then we install a priority 50 flow. This flow matches on the logical outport and checks that the DGP is not chassis-resident. The flow rewrites the destination MAC to the MAC of the DGP.

    match: outport == DGP, !is_chassis_resident(DGP)
    actions: dst_mac = DGP MAC

On a hypervisor where the output port is the DGP, the DGP is not chassis-resident, and the next hop IP address is one of the IP addresses of the peered logical switch, the priority 100 flow will match, and we will set the destination MAC to the logical switch port's MAC.

This logic would be correct so long as we are using an encapsulation method such as Geneve or VXLAN to send the packet to the hypervisor where the DGP is bound. In that case, the logical router pipeline can run partially on the first hypervisor, then the packet can be tunneled to the hypervisor where the DGP is bound. On the gateway hypervisor, we can then continue running the logical router pipeline. This allows the router features that need to run on the gateway hypervisor to run, after which the packet can be directed to the appropriate logical switch.

However, if the router has options:redirect-type=bridged set, then instead of tunneling the packet within the logical router pipeline, we need to use the attached switch's localnet port to redirect the packet to the gateway hypervisor. In commit 8ba15c3d1084c7 it was established that the way this is done is to set the destination MAC address to the DGP's MAC and send the packet over the attached switch's localnet port to the gateway hypervisor. Once the packet arrives on the gateway hypervisor, since the destination MAC is the DGP's MAC, the packet gets sent to the logical router to be processed a second time there. This is what the priority 50 flow is intended to do.

Since we are not hitting the priority 50 flow, packets are being redirected with the wrong destination MAC address. In many cases, this might be transparent since the packet might end up getting sent out the correct port in the end. But if the destination logical switch port is not bound to the same chassis as the DGP (this is rare, but it apparently happens), then the packet ends up dropped: the logical switch receives the packet on the localnet port and determines that it needs to be sent back out the localnet port. Since the loopback flag never gets set, the packet is dropped because the inport and outport are the same.

What's worse, though, is that since the packet never enters the logical router pipeline on the gateway chassis, router features that are intended to be invoked on the gateway chassis are not invoked: NATs, load balancers, etc. are being skipped.
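To make the conflict concrete, below is a toy model of the flow selection in this stage. This is my illustration, not OVN code: matches are reduced to booleans and all names are invented. It shows that on a chassis where the DGP is not resident, both flows match a packet headed for a switch port IP, and the priority 100 flow wins:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of the ARP resolve lookup (NOT OVN code): the highest
     * priority flow whose match is satisfied determines the action. */
    struct packet {
        bool outport_is_dgp;            /* outport == DGP */
        bool next_hop_is_lsp_ip;        /* next hop is a switch port IP */
        bool dgp_is_chassis_resident;   /* is_chassis_resident(DGP) */
    };

    static const char *
    arp_resolve(const struct packet *pkt)
    {
        /* Priority 100 (build_arp_resolve_flows_for_lsp). */
        if (pkt->outport_is_dgp && pkt->next_hop_is_lsp_ip) {
            return "dst_mac = lsp MAC";
        }
        /* Priority 50 (build_arp_resolve_flows_for_lrp). */
        if (pkt->outport_is_dgp && !pkt->dgp_is_chassis_resident) {
            return "dst_mac = DGP MAC";
        }
        return "no match";
    }

    int
    main(void)
    {
        /* Non-gateway chassis in the bridged redirect scenario: both
         * flows match, but priority 100 shadows priority 50, so the
         * DGP MAC is never set. */
        struct packet pkt = { true, true, false };
        printf("%s\n", arp_resolve(&pkt));   /* dst_mac = lsp MAC */
        return 0;
    }

With redirect-type=bridged, the lsp MAC is the wrong choice, since the packet must carry the DGP's MAC to be picked back up by the router on the gateway hypervisor.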
The fix presented here is to not install the priority 100 flows in the case where the router port is a DGP and it has options:redirect-type=bridged set. This way, we will only ever hit the priority 50 flow, thus avoiding the issues presented above.

Reported-at: https://issues.redhat.com/browse/FDP-1454
Signed-off-by: Mark Michelson <[email protected]>
---
 northd/northd.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/northd/northd.c b/northd/northd.c
index 8b5413ef3..2c496d58d 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -14674,6 +14674,14 @@ build_arp_resolve_flows_for_lsp(
             continue;
         }
 
+        if (lrp_is_l3dgw(peer)) {
+            const char *redirect_type = smap_get(&peer->nbrp->options,
+                                                 "redirect-type");
+            if (redirect_type && !strcasecmp(redirect_type, "bridged")) {
+                continue;
+            }
+        }
+
         if (!lrp_find_member_ip(peer, ip_s)) {
             continue;
         }
-- 
2.50.1
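For completeness, the condition added by the hunk above can be exercised in isolation. A minimal sketch: the standalone helper below is my own extraction of the new check, with a boolean standing in for lrp_is_l3dgw(peer) and a string standing in for the options:redirect-type lookup:

    #include <stdbool.h>
    #include <stdio.h>
    #include <strings.h>

    /* Mirrors the new check in build_arp_resolve_flows_for_lsp():
     * skip the priority 100 flows when the peer router port is a DGP
     * with options:redirect-type=bridged. */
    static bool
    skip_arp_resolve_lsp_flows(bool peer_is_l3dgw, const char *redirect_type)
    {
        return peer_is_l3dgw
               && redirect_type
               && !strcasecmp(redirect_type, "bridged");
    }

    int
    main(void)
    {
        printf("%d\n", skip_arp_resolve_lsp_flows(true, "bridged"));  /* 1 */
        printf("%d\n", skip_arp_resolve_lsp_flows(true, "BRIDGED"));  /* 1 */
        printf("%d\n", skip_arp_resolve_lsp_flows(true, NULL));       /* 0 */
        printf("%d\n", skip_arp_resolve_lsp_flows(false, "bridged")); /* 0 */
        return 0;
    }

The strcasecmp() keeps the comparison case-insensitive, matching the patch; with the priority 100 flows skipped, only the priority 50 flow remains to match, which is the intended behavior.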
