In 712fca55b3b1 ("controller: Prioritize host routes.") and later in
cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") support was
added for advertising routes for objects (logical port / NAT / LB IPs)
that are bound to a single chassis (e.g., distributed NAT) with a better
metric on the chassis where they're bound.  On all other chassis,
however, the route was still advertised but only with a worse metric.

While this works fine in deployments as described in 712fca55b3b1
("controller: Prioritize host routes."), this behavior actually makes
the dynamic routing feature unusable in cases when all inter-node
traffic is forwarded through the L3 fabric, e.g. spine-leaf topologies
with iBGP between leaves and spines and eBGP between OVN compute nodes
and fabric leafs:  that's due to the fact that eBGP routes are usually
preferred over iBGP routes.

Consider the following example:
       +------+        +------+
       |Spine1|        |Spine2|
       +------+        +------+
          |  \         /  |
          |    \     /    |
          |      \ /      |
          |       X       |
          |     /   \     |
          |   /       \   |
          | /           \ |
       +-----+          +----+
       |Leaf1|          |Leaf2|
       +-----+          +----+
          |                |
          |                |
     +----------+     +----------+
     | Chassis1 |     | Chassis2 |
     +----------+     +----------+

An OVN distributed NAT, e.g., 42.42.42.42 "bound" to Chassis1 would be
advertised with a metric of 100 on the eBGP Chassis1 <-> Leaf1
connection and with a metric of 1000 (worse) on the eBGP Chasssi2 <->
Leaf2 connection.  Leaf2 will also learn an iBGP route (through Spine1
and Spine2) for the same prefix (towards Chassis1) but because eBGP
administrative distance is better than the iBGP one, Leaf2 will always
prefer the metric 1000 route.  That means Leaf2 will always forward
traffic destined to 42.42.42.42 via Chassis2 which is sub-optimal.

The main reason for advertising the (NAT) IP on both chassis was likely
to provide redundancy in case traffic hits the OVN cluster on a node
that doesn't host the NAT.  But with topologies as the one depicted
above the redundancy is handled by the fabric.

OVN didn't have a way to disable worse metric route advertisements. This
commit adds one, through a new logical router / logical router port
option, "dynamic-routing-redistribute-local-only" which, if enabled,
informs ovn-controller to not advertise routes for chassis bound IPs
(Sb.Advertised_Route.tracked_port set) on chassis where the tracked port
is not bound.  By default this option is disabled.

Fixes: 712fca55b3b1 ("controller: Prioritize host routes.")
Fixes: cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.")
Reported-at: https://issues.redhat.com/browse/FDP-1464
Signed-off-by: Dumitru Ceara <dce...@redhat.com>
---
NOTE: while this change adds a new configuration option, I see this as
a bug fix because there's no (not overly complex) way of using the OVN
dynamic routing feature in 25.03 with topologies as above (which are
likely common).

This change is backport-safe because the new option is opt-in (disabled
by default) so the default behavior stays untouched.  Also, there's no
restriction on update order as older ovn-northd or ovn-controllers just
ignore the new option and there's no database schema change.
---
 NEWS                |   5 +++
 controller/route.c  |  10 ++++-
 northd/northd.c     |  11 +++++
 ovn-nb.xml          |  39 +++++++++++++++++
 ovn-sb.xml          |  18 ++++++++
 tests/system-ovn.at | 104 ++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 186 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 0cce1790db..54d676be87 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,11 @@ Post v25.03.0
    - Added support for running tests from the 'check-kernel' system test target
      under retis by setting OVS_TEST_WITH_RETIS=yes.  See the 'Testing' section
      of the documentation for more details.
+   - Dynamic Routing:
+     * Add the option "dynamic-routing-redistribute-local-only" to Logical
+       Routers and Logical Router Ports which refines the way in which
+       chassis-specific Advertised_Routes (e.g., for NAT and LB IPs) are
+       advertised.
 
 OVN v25.03.0 - 07 Mar 2025
 --------------------------
diff --git a/controller/route.c b/controller/route.c
index 7615f3f593..603e5749bc 100644
--- a/controller/route.c
+++ b/controller/route.c
@@ -253,15 +253,23 @@ route_run(struct route_ctx_in *r_ctx_in,
 
         unsigned int priority = PRIORITY_DEFAULT;
         if (route->tracked_port) {
+            bool redistribute_local_bound_only =
+                smap_get_bool(&route->logical_port->options,
+                              "dynamic-routing-redistribute-local-only",
+                              false);
             if (lport_is_local(r_ctx_in->sbrec_port_binding_by_name,
                                r_ctx_in->chassis,
                                route->tracked_port->logical_port)) {
                 priority = PRIORITY_LOCAL_BOUND;
                 sset_add(r_ctx_out->tracked_ports_local,
                          route->tracked_port->logical_port);
-            } else {
+            } else if (!redistribute_local_bound_only) {
                 sset_add(r_ctx_out->tracked_ports_remote,
                          route->tracked_port->logical_port);
+            } else {
+                /* Here redistribute_local_bound_only is 'true' and
+                 * 'tracked_port' is not local so skip this route. */
+                continue;
             }
         }
 
diff --git a/northd/northd.c b/northd/northd.c
index 764575f21e..4acbd2e517 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -4387,6 +4387,17 @@ sync_pb_for_lrp(struct ovn_port *op,
         if (portname) {
             smap_add(&new, "dynamic-routing-port-name", portname);
         }
+        const char *redistribute_local_only_name =
+            "dynamic-routing-redistribute-local-only";
+        bool redistribute_local_only_val =
+            smap_get_bool(&op->nbrp->options,
+                          redistribute_local_only_name,
+                          smap_get_bool(&op->od->nbr->options,
+                                        redistribute_local_only_name,
+                                        false));
+        if (redistribute_local_only_val) {
+            smap_add(&new, redistribute_local_only_name, "true");
+        }
     }
 
     const char *ipv6_pd_list = smap_get(&op->sb->options, "ipv6_ra_pd_list");
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 4a75818075..cbe9c40bbe 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -3192,6 +3192,22 @@ or
         </p>
       </column>
 
+      <column name="options"
+          key="dynamic-routing-redistribute-local-only"
+          type='{"type": "boolean"}'>
+        <p>
+          Only relevant if <ref column="options" key="dynamic-routing"/>
+          is set to <code>true</code>.
+        </p>
+
+        <p>
+          This controls whether <code>ovn-controller</code> will advertise
+          <ref table="Advertised_Route" db="OVN_Southbound"/> records
+          only on the chassis where their <code>tracked_port</code> is
+          bound.  Default: <code>false</code>.
+        </p>
+      </column>
+
       <column name="options" key="dynamic-routing-vrf-name"
           type='{"type": "string"}'>
         <p>
@@ -4166,6 +4182,29 @@ or
 
       </column>
 
+      <column name="options"
+          key="dynamic-routing-redistribute-local-only"
+          type='{"type": "boolean"}'>
+        <p>
+          Only relevant if <ref column="options" key="dynamic-routing"/>
+          is set to <code>true</code>.
+        </p>
+
+        <p>
+          This controls whether <code>ovn-controller</code> will advertise
+          <ref table="Advertised_Route" db="OVN_Southbound"/> records
+          only on the chassis where their <code>tracked_port</code> is
+          bound.
+        </p>
+
+        <p>
+          If not set the value from <ref column="options"
+          key="dynamic-routing-redistribute-local-only"
+          table="Logical_Router"/> on the <ref table="Logical_Router"/> will
+          be used.
+        </p>
+      </column>
+
       <column name="options" key="dynamic-routing-maintain-vrf"
          type='{"type": "boolean"}'>
         <p>
diff --git a/ovn-sb.xml b/ovn-sb.xml
index db5faac661..395deae83d 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -3908,6 +3908,24 @@ tcp.flags = RST;
       </column>
     </group>
 
+    <group title="Dynamic Routing">
+      <column name="options"
+          key="dynamic-routing-redistribute-local-only"
+          type='{"type": "boolean"}'>
+        <p>
+          Only relevant if <ref column="options" key="dynamic-routing"/>
+          is set to <code>true</code>.
+        </p>
+
+        <p>
+          This controls whether <code>ovn-controller</code> will advertise
+          <ref table="Advertised_Route" db="OVN_Southbound"/> records
+          only on the chassis where their <code>tracked_port</code> is
+          bound.  Default: <code>false</code>.
+        </p>
+      </column>
+    </group>
+
     <group title="Nested Containers">
       <p>
         These columns support containers nested within a VM.  Specifically,
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index e0407383af..a86bfa44e0 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -16759,6 +16759,29 @@ OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
 blackhole 10.42.10.10 proto ovn metric 1000
 blackhole 172.16.1.150 proto ovn metric 1000])
 
+# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
+# and verify that lower priority routes are not advertised anymore.
+check ovn-nbctl --wait=hv set logical_router_port r1-join \
+    options:dynamic-routing-redistribute-local-only=true
+
+OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
+blackhole 172.16.1.150 proto ovn metric 1000])
+
+check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
+    options dynamic-routing-redistribute-local-only              \
+    -- set logical_router R1                                     \
+            options:dynamic-routing-redistribute-local-only=true
+
+OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
+blackhole 172.16.1.150 proto ovn metric 1000])
+
+check ovn-nbctl --wait=hv remove logical_router R1  \
+    options dynamic-routing-redistribute-local-only
+
+OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
+blackhole 10.42.10.10 proto ovn metric 1000
+blackhole 172.16.1.150 proto ovn metric 1000])
+
 # Before cleanup of hv1 ovn-controller, trigger a recompute
 # to cleanup the local datapaths. Otherwise, the test will fail.
 # This is because we don't remove a datapath from
@@ -16904,6 +16927,29 @@ OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
 blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
 blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
 
+# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
+# and verify that lower priority routes are not advertised anymore.
+check ovn-nbctl --wait=hv set logical_router_port r1-join \
+    options:dynamic-routing-redistribute-local-only=true
+
+OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
+blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
+
+check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
+    options dynamic-routing-redistribute-local-only              \
+    -- set logical_router R1                                     \
+            options:dynamic-routing-redistribute-local-only=true
+
+OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
+blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
+
+check ovn-nbctl --wait=hv remove logical_router R1  \
+    options dynamic-routing-redistribute-local-only
+
+OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
+blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
+blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
+
 # Before cleanup of hv1 ovn-controller, trigger a recompute
 # to cleanup the local datapaths. Otherwise, the test will fail.
 # This is because we don't remove a datapath from
@@ -17069,6 +17115,35 @@ blackhole 10.42.10.13 proto ovn metric 1000
 blackhole 172.16.1.10 proto ovn metric 1000
 blackhole 172.16.1.11 proto ovn metric 1000])
 
+# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
+# and verify that lower priority routes are not advertised anymore.
+check ovn-nbctl --wait=hv set logical_router_port r1-join \
+    options:dynamic-routing-redistribute-local-only=true
+
+OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
+blackhole 172.16.1.10 proto ovn metric 1000
+blackhole 172.16.1.11 proto ovn metric 1000])
+
+check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
+    options dynamic-routing-redistribute-local-only              \
+    -- set logical_router R1                                     \
+            options:dynamic-routing-redistribute-local-only=true
+
+OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
+blackhole 172.16.1.10 proto ovn metric 1000
+blackhole 172.16.1.11 proto ovn metric 1000])
+
+check ovn-nbctl --wait=hv remove logical_router R1  \
+    options dynamic-routing-redistribute-local-only
+
+OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
+blackhole 10.42.10.10 proto ovn metric 1000
+blackhole 10.42.10.11 proto ovn metric 1000
+blackhole 10.42.10.12 proto ovn metric 1000
+blackhole 10.42.10.13 proto ovn metric 1000
+blackhole 172.16.1.10 proto ovn metric 1000
+blackhole 172.16.1.11 proto ovn metric 1000])
+
 # Add "guest" LS connected the distributed router R2 and one "VM" called
 # guest1.
 # Also, connect R2 to ls-join via another DGW.
@@ -17283,6 +17358,35 @@ blackhole 2001:db8:1004::151 dev lo proto ovn metric 
1000 pref medium
 blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
 blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
 
+# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
+# and verify that lower priority routes are not advertised anymore.
+check ovn-nbctl --wait=hv set logical_router_port r1-join \
+    options:dynamic-routing-redistribute-local-only=true
+
+OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
+blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
+blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
+
+check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
+    options dynamic-routing-redistribute-local-only              \
+    -- set logical_router R1                                     \
+            options:dynamic-routing-redistribute-local-only=true
+
+OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
+blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
+blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
+
+check ovn-nbctl --wait=hv remove logical_router R1  \
+    options dynamic-routing-redistribute-local-only
+
+OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
+blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
+blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium
+blackhole 2001:db8:1004::150 dev lo proto ovn metric 1000 pref medium
+blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium
+blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
+blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
+
 # Add "guest" LS connected the distributed router R2 and one "VM" called
 # guest1.
 # Also, connect R2 to ls-join via another DGW.
-- 
2.50.1

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to