In logical router ingress table 22, we install two types of flows that
conflict with each other.

First, in build_arp_resolve_flows_for_lsp(), if the logical switch port
is peered with a router, then we install a series of priority 100 flows.
For each logical switch port on the logical switch, we install a flow on
the peered router port. The flow matches on logical outport and next hop
IP address. The flow rewrites the destination MAC to the MAC of the
logical switch port where the IP address is bound.

match: outport == router_port, next_hop == lsp IP
actions: dst_mac = lsp MAC
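
As a concrete illustration (names taken from the test added below; the
exact logical flow rendering depends on the OVN version), the flow
installed for vm2 (IP 20.0.0.2, MAC 00:00:00:00:02:02) on the peered
router port ro-public would look roughly like:

match: outport == "ro-public", next_hop == 20.0.0.2
actions: dst_mac = 00:00:00:00:02:02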

Next, in build_arp_resolve_flows_for_lrp(), if the logical router port
is a distributed gateway port (DGP), and the port has
options:redirect-type=bridged set, then we install a priority 50 flow.
This flow matches on the logical output port and checks that the DGP is
not chassis-resident. The flow rewrites the destination MAC to the MAC
of the DGP.

match: outport == DGP, !is_chassis_resident(DGP)
actions: dst_mac = DGP MAC
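
With the same test topology, and assuming the conventional "cr-" name
for ro-public's chassisredirect port, this renders roughly as:

match: outport == "ro-public", !is_chassis_resident("cr-ro-public")
actions: dst_mac = 00:00:00:00:02:01 (ro-public's MAC)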

On a hypervisor where the output port is the DGP, the DGP is not
chassis-resident, and the next hop IP address is one of the IP
addresses bound on the peered logical switch, the priority 100 flow
matches, and we set the destination MAC to the logical switch port's
MAC.

This logic is correct as long as we are using an encapsulation method
such as Geneve or VXLAN to send the packet to the hypervisor where the
DGP is bound. In that case, the logical router pipeline can run
partially on the first hypervisor, and the packet can then be tunneled
to the hypervisor where the DGP is bound. On the gateway hypervisor, we
can continue running the logical router pipeline. This allows the
router features that need to run on the gateway hypervisor to run,
after which the packet can be directed to the appropriate logical
switch.

However, if the DGP has options:redirect-type=bridged set, then
instead of tunneling the packet within the logical router pipeline, we
need to use the attached switch's localnet port to redirect the packet
to the gateway hypervisor. Commit 8ba15c3d1084c7 established that the
way this is done is to set the destination MAC address to the DGP's
MAC and send the packet over the attached switch's localnet port to
the gateway hypervisor. Once the packet arrives on the gateway
hypervisor, since the destination MAC is the DGP's MAC, the packet
gets sent to the logical router to be processed a second time there.
This is what the priority 50 flow is intended to do.
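
For reference, the configuration that puts a DGP into this mode is the
same one the test below uses (names are from that test; a real
deployment will differ):

  ovn-nbctl lrp-set-gateway-chassis ro-public ovn-gw-1
  ovn-nbctl lrp-set-redirect-type ro-public bridged
  ovs-vsctl set open . \
      external-ids:ovn-chassis-mac-mappings="public:aa:bb:cc:dd:02:01"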

Because the priority 50 flow is never hit, packets are being redirected
with the wrong destination MAC address. In many cases this might be
transparent, since the packet may still end up being sent out the
correct port. But if the destination logical switch port is not bound
to the same chassis as the DGP (this is rare, but it apparently
happens), then the packet gets dropped: the logical switch receives the
packet on the localnet port and determines that it needs to be sent
back out the localnet port. Since the loopback flag never gets set, the
packet is dropped because the inport and outport are the same. What's
worse, though, is that since the packet never enters the logical router
pipeline on the gateway chassis, router features that are intended to
be invoked there, such as NATs and load balancers, are skipped.
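
One way to observe which destination MAC the router's ARP resolve
stage picks is to trace the ICMP flow with ovn-trace (hypothetical
invocation, addresses taken from the test below):

  ovn-trace ls-local 'inport == "vm1" &&
      eth.src == 00:00:00:00:01:02 && eth.dst == 00:00:00:00:01:01 &&
      ip4.src == 10.0.0.2 && ip4.dst == 20.0.0.2 && ip.ttl == 64'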

The fix presented here is to not install the priority 100 flows in the
case where the router port is a DGP and it has
options:redirect-type=bridged set. This way, we will only ever hit the
priority 50 flow, thus avoiding the issues presented above.
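
To verify, one can dump the router's ARP resolve flows and check that
the DGP's outport now only appears in the priority 50 flow
(illustrative command, names from the test below):

  ovn-sbctl lflow-list gw-router | grep lr_in_arp_resolve | grep ro-public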

Reported-at: https://issues.redhat.com/browse/FDP-1454
Signed-off-by: Mark Michelson <[email protected]>
---
 northd/northd.c    |  8 +++++
 tests/multinode.at | 88 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/northd/northd.c b/northd/northd.c
index 8b5413ef3..2c496d58d 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -14674,6 +14674,14 @@ build_arp_resolve_flows_for_lsp(
                         continue;
                     }
 
+                    if (lrp_is_l3dgw(peer)) {
+                        const char *redirect_type = smap_get(&peer->nbrp->options,
+                                                            "redirect-type");
+                        if (redirect_type && !strcasecmp(redirect_type, "bridged")) {
+                            continue;
+                        }
+                    }
+
                     if (!lrp_find_member_ip(peer, ip_s)) {
                         continue;
                     }
diff --git a/tests/multinode.at b/tests/multinode.at
index f27c1b0bd..959c5ba8a 100644
--- a/tests/multinode.at
+++ b/tests/multinode.at
@@ -3746,3 +3746,91 @@ MAC               Type   Flags Intf/Remote ES/VTEP           VLAN  Seq #'s
 ])
 
 AT_CLEANUP
+
+AT_SETUP([redirect-bridged to non-gw destination switch port])
+
+check_fake_multinode_setup
+cleanup_multinode_resources
+# This test uses the following logical network:
+#
+# +-----------+
+# | ls-public |
+# +-----------+
+#       |
+#       |
+# +-----------+
+# | gw-router |
+# +-----------+
+#       |
+#       |
+# +-----------+
+# | ls-local  |
+# +-----------+
+#
+# The router port from gw-router to ls-public is a distributed gateway port.
+# It also has options:redirect-type=bridged set.
+#
+# We create vm1 attached to ls-local that is bound to ovn-chassis-1. We
+# also create vm2 attached to ls-public that is bound to ovn-chassis-1.
+# The DGP on gw-router is bound to ovn-gw-1.
+#
+# Our goal is to successfully ping from vm1 to vm2. In order for this to
+# work, the ping will have to traverse gw-router. vm1 and vm2 are bound to
+# the same chassis, but the DGP is bound to ovn-gw-1. We therefore expect
+# the following:
+#
+# The ping starts by entering ls-local on ovn-chassis-1.
+# The ping then goes through the ingress pipeline of gw-router on ovn-chassis-1.
+# The ping then goes through ls-public's localnet port to reach ovn-gw-1.
+# The ping's destination MAC should be the DGP's MAC. So the ping will
+# get processed first by ls-public on ovn-gw-1, then will be redirected to
+# gw-router on ovn-gw-1. The packet will then re-enter ls-public on ovn-gw-1.
+# The ping will then get redirected over the localnet back to ovn-chassis-1.
+# From here, the ls-public pipeline can run and the ping will be output to vm2.
+#
+
+check multinode_nbctl ls-add ls-local
+check multinode_nbctl lsp-add ls-local vm1
+check multinode_nbctl lsp-set-addresses vm1 "00:00:00:00:01:02 10.0.0.2 10::2"
+
+check multinode_nbctl ls-add ls-public
+check multinode_nbctl lsp-add ls-public vm2
+check multinode_nbctl lsp-set-addresses vm2 "00:00:00:00:02:02 20.0.0.2 20::2"
+
+check multinode_nbctl lsp-add ls-public ln-public
+check multinode_nbctl lsp-set-type ln-public localnet
+check multinode_nbctl lsp-set-addresses ln-public unknown
+check multinode_nbctl lsp-set-options ln-public network_name=public
+
+check multinode_nbctl lr-add gw-router
+
+check multinode_nbctl lrp-add gw-router ro-local 00:00:00:00:01:01 10.0.0.1/8 10::1/64
+check multinode_nbctl lsp-add ls-local local-ro
+check multinode_nbctl lsp-set-type local-ro router
+check multinode_nbctl lsp-set-addresses local-ro router
+check multinode_nbctl lsp-set-options local-ro router-port=ro-local
+
+check multinode_nbctl lrp-add gw-router ro-public 00:00:00:00:02:01 20.0.0.1/8 20::1/64
+check multinode_nbctl lsp-add ls-public public-ro
+check multinode_nbctl lsp-set-type public-ro router
+check multinode_nbctl lsp-set-addresses public-ro router
+check multinode_nbctl lsp-set-options public-ro router-port=ro-public
+
+check multinode_nbctl lrp-set-gateway-chassis ro-public ovn-gw-1
+check multinode_nbctl lrp-set-redirect-type ro-public bridged
+
+m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex
+m_as ovn-chassis-1 ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="public:aa:bb:cc:dd:01:01"
+m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="public:aa:bb:cc:dd:02:01"
+
+m_as ovn-chassis-1 /data/create_fake_vm.sh vm1 vm1 00:00:00:00:01:02 1342 10.0.0.2 8 10.0.0.1 10::2/64 10::1
+m_as ovn-chassis-1 /data/create_fake_vm.sh vm2 vm2 00:00:00:00:02:02 1342 20.0.0.2 8 20.0.0.1 20::2/64 20::1
+
+m_wait_for_ports_up
+
+M_NS_CHECK_EXEC([ovn-chassis-1], [vm1], [ping -q -c 3 -i 0.3 -w 2 20.0.0.2 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CLEANUP
-- 
2.50.1
