On Fri, Sep 19, 2025 at 5:13 PM Mark Michelson via dev <[email protected]> wrote:
> In logical router ingress table 22, we install a couple of conflicting
> types of flows.
>
> First, in build_arp_resolve_flows_for_lsp(), if the logical switch port
> is peered with a router, then we install a series of priority 100 flows.
> For each logical switch port on the logical switch, we install a flow on
> the peered router port. The flow matches on logical outport and next hop
> IP address. The flow rewrites the destination MAC to the MAC of the
> logical switch port where the IP address is bound.
>
> match: outport == router_port, next_hop == lsp IP
> actions: dst_mac = lsp MAC
>
> Next, in build_arp_resolve_flows_for_lrp(), if the logical router port
> is a distributed gateway port (DGP), and the port has
> options:redirect-type=bridged set, then we install a priority 50 flow.
> This flow matches on the logical output port and checks that the DGP is
> not chassis-resident. The flow rewrites the destination MAC to the MAC
> of the DGP.
>
> match: outport == DGP, !is_chassis_resident(DGP)
> actions: dst_mac = DGP MAC
>
> On a hypervisor where the output port is the DGP, the DGP is not
> chassis-resident, and the next hop IP address is one of the IP addresses
> on the peered logical switch, the priority 100 flow will match, and we
> will set the destination MAC to the logical switch port's MAC.
>
> This logic would be correct so long as we are using an encapsulation
> method such as Geneve or VXLAN to send the packet to the hypervisor
> where the DGP is bound. In that case, the logical router pipeline can
> run partially on the first hypervisor, then the packet can be tunneled
> to the hypervisor where the DGP is bound. On the gateway hypervisor, we
> can then continue running the logical router pipeline. This allows the
> router features that need to run on the gateway hypervisor to run, and
> then the packet can be directed to the appropriate logical switch
> afterward.
>
> However, if the router has options:redirect-type=bridged set, then
> instead of tunneling the packet within the logical router pipeline, we
> need to use the attached switch's localnet port to redirect the packet
> to the gateway hypervisor. In commit 8ba15c3d1084c7 it was established
> that the way this is done is to set the destination MAC address to the
> DGP's MAC and send the packet over the attached switch's localnet port
> to the gateway hypervisor. Once the packet arrives on the gateway
> hypervisor, since the destination MAC is set to the DGP's MAC, the
> packet will get sent to the logical router to be processed a second
> time on the gateway hypervisor. This is what the priority 50 flow is
> intended to do.
>
> Since the priority 100 flow takes precedence, we never hit the priority
> 50 flow, which means that packets are being redirected with the wrong
> destination MAC address. In many cases, this might be transparent since
> the packet might end up getting sent out the correct port in the end.
> But if the destination logical switch port is not bound to the same
> chassis as the DGP (this is rare, but it apparently happens), then the
> packet will end up getting dropped: the logical switch will receive the
> packet on the localnet port and determine that the packet needs to be
> sent back out the localnet port. Since the loopback flag never gets set,
> the packet is dropped because the inport and outport are the same.
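(For context while reading the patch below: the northbound configuration that exercises this redirect-type=bridged path looks roughly like the following sketch. The names reuse those of the test added by the patch; the test itself issues the equivalent commands through its multinode_nbctl wrapper.)

  # Attach the public switch to the provider network via a localnet port.
  ovn-nbctl lsp-add ls-public ln-public
  ovn-nbctl lsp-set-type ln-public localnet
  ovn-nbctl lsp-set-options ln-public network_name=public

  # Make ro-public a distributed gateway port bound to ovn-gw-1, and
  # redirect over the bridged provider network instead of a tunnel.
  ovn-nbctl lrp-set-gateway-chassis ro-public ovn-gw-1
  ovn-nbctl lrp-set-redirect-type ro-public bridged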
> What's worse, though, is that since the packet never enters the logical
> router pipeline on the gateway chassis, the router features that are
> intended to be invoked on the gateway chassis are not being invoked:
> NATs, load balancers, etc. are being skipped.
>
> The fix presented here is to not install the priority 100 flows in the
> case where the router port is a DGP and it has
> options:redirect-type=bridged set. This way, we will only ever hit the
> priority 50 flow, thus avoiding the issues presented above.
>
> Reported-at: https://issues.redhat.com/browse/FDP-1454
> Signed-off-by: Mark Michelson <[email protected]>
> ---
>  northd/northd.c    |  8 +++++
>  tests/multinode.at | 88 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 96 insertions(+)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 8b5413ef3..2c496d58d 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -14674,6 +14674,14 @@ build_arp_resolve_flows_for_lsp(
>              continue;
>          }
>
> +        if (lrp_is_l3dgw(peer)) {
> +            const char *redirect_type = smap_get(&peer->nbrp->options,
> +                                                 "redirect-type");
> +            if (redirect_type && !strcasecmp(redirect_type, "bridged")) {
> +                continue;
> +            }
> +        }
> +
>          if (!lrp_find_member_ip(peer, ip_s)) {
>              continue;
>          }
> diff --git a/tests/multinode.at b/tests/multinode.at
> index f27c1b0bd..959c5ba8a 100644
> --- a/tests/multinode.at
> +++ b/tests/multinode.at
> @@ -3746,3 +3746,91 @@ MAC Type Flags Intf/Remote ES/VTEP VLAN Seq #'s
>  ])
>
>  AT_CLEANUP
> +
> +AT_SETUP([redirect-bridged to non-gw destination switch port])
> +
> +check_fake_multinode_setup
> +cleanup_multinode_resources
> +# This test uses the following logical network:
> +#
> +# +-----------+
> +# | ls-public |
> +# +-----------+
> +#       |
> +#       |
> +# +-----------+
> +# | gw-router |
> +# +-----------+
> +#       |
> +#       |
> +# +-----------+
> +# | ls-local  |
> +# +-----------+
> +#
> +# The router port from gw-router to ls-public is a distributed gateway port.
> +# It also has options:redirect-type=bridged set.
> +#
> +# We create vm1 attached to ls-local that is bound to ovn-chassis-1. We
> +# also create vm2 attached to ls-public that is bound to ovn-chassis-1.
> +# The DGP on gw-router is bound to ovn-gw-1.
> +#
> +# Our goal is to successfully ping from vm1 to vm2. In order for this to
> +# work, the ping will have to traverse gw-router. vm1 and vm2 are bound to
> +# the same chassis, but the DGP is bound to ovn-gw-1. We therefore expect
> +# the following:
> +#
> +# The ping starts by entering ls-local on ovn-chassis-1.
> +# The ping then goes through the ingress pipeline of gw-router on ovn-chassis-1.
> +# The ping then goes through ls-public's localnet port to reach ovn-gw-1.
> +# The ping's destination MAC should be the DGP's MAC. So the ping will
> +# get processed first by ls-public on ovn-gw-1, then will be redirected to
> +# gw-router on ovn-gw-1. The packet will then re-enter ls-public on ovn-gw-1.
> +# The ping will then get redirected over the localnet back to ovn-chassis-1.
> +# From here, the ls-public pipeline can run and the ping will be output to vm2.
> +#
> +
> +check multinode_nbctl ls-add ls-local
> +check multinode_nbctl lsp-add ls-local vm1
> +check multinode_nbctl lsp-set-addresses vm1 "00:00:00:00:01:02 10.0.0.2 10::2"
> +
> +check multinode_nbctl ls-add ls-public
> +check multinode_nbctl lsp-add ls-public vm2
> +check multinode_nbctl lsp-set-addresses vm2 "00:00:00:00:02:02 20.0.0.2 20::2"
> +
> +check multinode_nbctl lsp-add ls-public ln-public
> +check multinode_nbctl lsp-set-type ln-public localnet
> +check multinode_nbctl lsp-set-addresses ln-public unknown
> +check multinode_nbctl lsp-set-options ln-public network_name=public
> +
> +check multinode_nbctl lr-add gw-router
> +
> +check multinode_nbctl lrp-add gw-router ro-local 00:00:00:00:01:01 10.0.0.1/8 10::1/64
> +check multinode_nbctl lsp-add ls-local local-ro
> +check multinode_nbctl lsp-set-type local-ro router
> +check multinode_nbctl lsp-set-addresses local-ro router
> +check multinode_nbctl lsp-set-options local-ro router-port=ro-local
> +
> +check multinode_nbctl lrp-add gw-router ro-public 00:00:00:00:02:01 20.0.0.1/8 20::1/64
> +check multinode_nbctl lsp-add ls-public public-ro
> +check multinode_nbctl lsp-set-type public-ro router
> +check multinode_nbctl lsp-set-addresses public-ro router
> +check multinode_nbctl lsp-set-options public-ro router-port=ro-public
> +
> +check multinode_nbctl lrp-set-gateway-chassis ro-public ovn-gw-1
> +check multinode_nbctl lrp-set-redirect-type ro-public bridged
> +
> +m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex
> +m_as ovn-chassis-1 ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="public:aa:bb:cc:dd:01:01"
> +m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="public:aa:bb:cc:dd:02:01"
> +
> +m_as ovn-chassis-1 /data/create_fake_vm.sh vm1 vm1 00:00:00:00:01:02 1342 10.0.0.2 8 10.0.0.1 10::2/64 10::1
> +m_as ovn-chassis-1 /data/create_fake_vm.sh vm2 vm2 00:00:00:00:02:02 1342 20.0.0.2 8 20.0.0.1 20::2/64 20::1
> +
> +m_wait_for_ports_up
> +
> +M_NS_CHECK_EXEC([ovn-chassis-1], [vm1], [ping -q -c 3 -i 0.3 -w 2 20.0.0.2 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CLEANUP
> --
> 2.50.1
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

Thank you Mark, I have addressed the 0-day bot comment, merged this into
main, and backported it all the way down to 24.03.

Regards,
Ales

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
