On Thu, Jul 31, 2025 at 11:53:35AM +0200, Dumitru Ceara wrote:
> On 7/31/25 11:46 AM, Felix Huettner wrote:
> > On Thu, Jul 31, 2025 at 11:34:47AM +0200, Dumitru Ceara wrote:
> >> In 712fca55b3b1 ("controller: Prioritize host routes.") and later in
> >> cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") support was
> >> added for advertising routes for objects (logical port / NAT / LB IPs)
> >> that are bound to a single chassis (e.g., distributed NAT) with a better
> >> metric on the chassis where they're bound.  On all other chassis,
> >> however, the route was still advertised but only with a worse metric.
> >>
> >> While this works fine in deployments as described in 712fca55b3b1
> >> ("controller: Prioritize host routes."), this behavior actually makes
> >> the dynamic routing feature unusable in cases when all inter-node
> >> traffic is forwarded through the L3 fabric, e.g. spine-leaf topologies
> >> with iBGP between leaves and spines and eBGP between OVN compute nodes
> >> and fabric leafs:  that's due to the fact that eBGP routes are usually
> >> preferred over iBGP routes.
> >>
> >> Consider the following example:
> >>        +------+        +------+
> >>        |Spine1|        |Spine2|
> >>        +------+        +------+
> >>           |  \         /  |
> >>           |    \     /    |
> >>           |      \ /      |
> >>           |       X       |
> >>           |     /   \     |
> >>           |   /       \   |
> >>           | /           \ |
> >>        +-----+          +----+
> >>        |Leaf1|          |Leaf2|
> >>        +-----+          +----+
> >>           |                |
> >>           |                |
> >>      +----------+     +----------+
> >>      | Chassis1 |     | Chassis2 |
> >>      +----------+     +----------+
> >>
> >> An OVN distributed NAT, e.g., 42.42.42.42 "bound" to Chassis1 would be
> >> advertised with a metric of 100 on the eBGP Chassis1 <-> Leaf1
> >> connection and with a metric of 1000 (worse) on the eBGP Chasssi2 <->
> >> Leaf2 connection.  Leaf2 will also learn an iBGP route (through Spine1
> >> and Spine2) for the same prefix (towards Chassis1) but because eBGP
> >> administrative distance is better than the iBGP one, Leaf2 will always
> >> prefer the metric 1000 route.  That means Leaf2 will always forward
> >> traffic destined to 42.42.42.42 via Chassis2 which is sub-optimal.
> >>
> >> The main reason for advertising the (NAT) IP on both chassis was likely
> >> to provide redundancy in case traffic hits the OVN cluster on a node
> >> that doesn't host the NAT.  But with topologies as the one depicted
> >> above the redundancy is handled by the fabric.
> >>
> >> OVN didn't have a way to disable worse metric route advertisements. This
> >> commit adds one, through a new logical router / logical router port
> >> option, "dynamic-routing-redistribute-local-only" which, if enabled,
> >> informs ovn-controller to not advertise routes for chassis bound IPs
> >> (Sb.Advertised_Route.tracked_port set) on chassis where the tracked port
> >> is not bound.  By default this option is disabled.
> > 
> > Hi Dumitru,
> > 
> 
> Hi Felix,
> 
> Thanks for the quick feedback!
> 
> > thanks for that.
> > 
> > Back during the implemention i was thinking that this could be done by
> > using frr (or similar) route maps that only take routes with a specific
> > metric (100) and ignore all metric 1000 routes.
> > 
> > However i just realized that this does not work as you can not
> > differenciate between non-local routes and routes that have no locality
> > at all.
> > 
> 
> I think you'd still be able to if the route-map explicitly matched on
> IPs of routes that "have no locality".  But this falls into the "overly
> complex" category of workarounds I was hinting at below IMO.
> 
> > Would it maybe make sense to also separate the metrics for non-local
> > routes from the routes without tracked_port?
> > 
> 
> That would be a behavior change at this point, and harder to backport I
> think.
> 
> > Then this should also be doable with a normal routing agent and would
> > not need an ovn change at all. But maybe it still makese sense to have
> > the feature here.
> > 

Hi Dumitru,

> 
> I might be biased but it feels like we'd be putting too much burden (and
> even more restrictions) on the CMS/routing agent on top of the current
> setup it has to do to integrate with OVN if we also require a route-map
> for this specific deployment.  I think I'd prefer this explicit OVN
> config option instead.

that sounds also good for me and yes it makes everyones life a lot
easier.

Thanks,
Felix

> 
> > Thanks,
> > Felix
> > 
> 
> Regards,
> Dumitru
> 
> >>
> >> Fixes: 712fca55b3b1 ("controller: Prioritize host routes.")
> >> Fixes: cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.")
> >> Reported-at: https://issues.redhat.com/browse/FDP-1464
> >> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
> >> ---
> >> NOTE: while this change adds a new configuration option, I see this as
> >> a bug fix because there's no (not overly complex) way of using the OVN
> >> dynamic routing feature in 25.03 with topologies as above (which are
> >> likely common).
> >>
> >> This change is backport-safe because the new option is opt-in (disabled
> >> by default) so the default behavior stays untouched.  Also, there's no
> >> restriction on update order as older ovn-northd or ovn-controllers just
> >> ignore the new option and there's no database schema change.
> >> ---
> >>  NEWS                |   5 +++
> >>  controller/route.c  |  10 ++++-
> >>  northd/northd.c     |  11 +++++
> >>  ovn-nb.xml          |  39 +++++++++++++++++
> >>  ovn-sb.xml          |  18 ++++++++
> >>  tests/system-ovn.at | 104 ++++++++++++++++++++++++++++++++++++++++++++
> >>  6 files changed, 186 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/NEWS b/NEWS
> >> index 0cce1790db..54d676be87 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -41,6 +41,11 @@ Post v25.03.0
> >>     - Added support for running tests from the 'check-kernel' system test 
> >> target
> >>       under retis by setting OVS_TEST_WITH_RETIS=yes.  See the 'Testing' 
> >> section
> >>       of the documentation for more details.
> >> +   - Dynamic Routing:
> >> +     * Add the option "dynamic-routing-redistribute-local-only" to Logical
> >> +       Routers and Logical Router Ports which refines the way in which
> >> +       chassis-specific Advertised_Routes (e.g., for NAT and LB IPs) are
> >> +       advertised.
> >>  
> >>  OVN v25.03.0 - 07 Mar 2025
> >>  --------------------------
> >> diff --git a/controller/route.c b/controller/route.c
> >> index 7615f3f593..603e5749bc 100644
> >> --- a/controller/route.c
> >> +++ b/controller/route.c
> >> @@ -253,15 +253,23 @@ route_run(struct route_ctx_in *r_ctx_in,
> >>  
> >>          unsigned int priority = PRIORITY_DEFAULT;
> >>          if (route->tracked_port) {
> >> +            bool redistribute_local_bound_only =
> >> +                smap_get_bool(&route->logical_port->options,
> >> +                              "dynamic-routing-redistribute-local-only",
> >> +                              false);
> >>              if (lport_is_local(r_ctx_in->sbrec_port_binding_by_name,
> >>                                 r_ctx_in->chassis,
> >>                                 route->tracked_port->logical_port)) {
> >>                  priority = PRIORITY_LOCAL_BOUND;
> >>                  sset_add(r_ctx_out->tracked_ports_local,
> >>                           route->tracked_port->logical_port);
> >> -            } else {
> >> +            } else if (!redistribute_local_bound_only) {
> >>                  sset_add(r_ctx_out->tracked_ports_remote,
> >>                           route->tracked_port->logical_port);
> >> +            } else {
> >> +                /* Here redistribute_local_bound_only is 'true' and
> >> +                 * 'tracked_port' is not local so skip this route. */
> >> +                continue;
> >>              }
> >>          }
> >>  
> >> diff --git a/northd/northd.c b/northd/northd.c
> >> index 764575f21e..4acbd2e517 100644
> >> --- a/northd/northd.c
> >> +++ b/northd/northd.c
> >> @@ -4387,6 +4387,17 @@ sync_pb_for_lrp(struct ovn_port *op,
> >>          if (portname) {
> >>              smap_add(&new, "dynamic-routing-port-name", portname);
> >>          }
> >> +        const char *redistribute_local_only_name =
> >> +            "dynamic-routing-redistribute-local-only";
> >> +        bool redistribute_local_only_val =
> >> +            smap_get_bool(&op->nbrp->options,
> >> +                          redistribute_local_only_name,
> >> +                          smap_get_bool(&op->od->nbr->options,
> >> +                                        redistribute_local_only_name,
> >> +                                        false));
> >> +        if (redistribute_local_only_val) {
> >> +            smap_add(&new, redistribute_local_only_name, "true");
> >> +        }
> >>      }
> >>  
> >>      const char *ipv6_pd_list = smap_get(&op->sb->options, 
> >> "ipv6_ra_pd_list");
> >> diff --git a/ovn-nb.xml b/ovn-nb.xml
> >> index 4a75818075..cbe9c40bbe 100644
> >> --- a/ovn-nb.xml
> >> +++ b/ovn-nb.xml
> >> @@ -3192,6 +3192,22 @@ or
> >>          </p>
> >>        </column>
> >>  
> >> +      <column name="options"
> >> +          key="dynamic-routing-redistribute-local-only"
> >> +          type='{"type": "boolean"}'>
> >> +        <p>
> >> +          Only relevant if <ref column="options" key="dynamic-routing"/>
> >> +          is set to <code>true</code>.
> >> +        </p>
> >> +
> >> +        <p>
> >> +          This controls whether <code>ovn-controller</code> will advertise
> >> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
> >> +          only on the chassis where their <code>tracked_port</code> is
> >> +          bound.  Default: <code>false</code>.
> >> +        </p>
> >> +      </column>
> >> +
> >>        <column name="options" key="dynamic-routing-vrf-name"
> >>            type='{"type": "string"}'>
> >>          <p>
> >> @@ -4166,6 +4182,29 @@ or
> >>  
> >>        </column>
> >>  
> >> +      <column name="options"
> >> +          key="dynamic-routing-redistribute-local-only"
> >> +          type='{"type": "boolean"}'>
> >> +        <p>
> >> +          Only relevant if <ref column="options" key="dynamic-routing"/>
> >> +          is set to <code>true</code>.
> >> +        </p>
> >> +
> >> +        <p>
> >> +          This controls whether <code>ovn-controller</code> will advertise
> >> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
> >> +          only on the chassis where their <code>tracked_port</code> is
> >> +          bound.
> >> +        </p>
> >> +
> >> +        <p>
> >> +          If not set the value from <ref column="options"
> >> +          key="dynamic-routing-redistribute-local-only"
> >> +          table="Logical_Router"/> on the <ref table="Logical_Router"/> 
> >> will
> >> +          be used.
> >> +        </p>
> >> +      </column>
> >> +
> >>        <column name="options" key="dynamic-routing-maintain-vrf"
> >>           type='{"type": "boolean"}'>
> >>          <p>
> >> diff --git a/ovn-sb.xml b/ovn-sb.xml
> >> index db5faac661..395deae83d 100644
> >> --- a/ovn-sb.xml
> >> +++ b/ovn-sb.xml
> >> @@ -3908,6 +3908,24 @@ tcp.flags = RST;
> >>        </column>
> >>      </group>
> >>  
> >> +    <group title="Dynamic Routing">
> >> +      <column name="options"
> >> +          key="dynamic-routing-redistribute-local-only"
> >> +          type='{"type": "boolean"}'>
> >> +        <p>
> >> +          Only relevant if <ref column="options" key="dynamic-routing"/>
> >> +          is set to <code>true</code>.
> >> +        </p>
> >> +
> >> +        <p>
> >> +          This controls whether <code>ovn-controller</code> will advertise
> >> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
> >> +          only on the chassis where their <code>tracked_port</code> is
> >> +          bound.  Default: <code>false</code>.
> >> +        </p>
> >> +      </column>
> >> +    </group>
> >> +
> >>      <group title="Nested Containers">
> >>        <p>
> >>          These columns support containers nested within a VM.  
> >> Specifically,
> >> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> >> index e0407383af..a86bfa44e0 100644
> >> --- a/tests/system-ovn.at
> >> +++ b/tests/system-ovn.at
> >> @@ -16759,6 +16759,29 @@ OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
> >>  blackhole 10.42.10.10 proto ovn metric 1000
> >>  blackhole 172.16.1.150 proto ovn metric 1000])
> >>  
> >> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
> >> +# and verify that lower priority routes are not advertised anymore.
> >> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
> >> +    options:dynamic-routing-redistribute-local-only=true
> >> +
> >> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 172.16.1.150 proto ovn metric 1000])
> >> +
> >> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
> >> +    options dynamic-routing-redistribute-local-only              \
> >> +    -- set logical_router R1                                     \
> >> +            options:dynamic-routing-redistribute-local-only=true
> >> +
> >> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 172.16.1.150 proto ovn metric 1000])
> >> +
> >> +check ovn-nbctl --wait=hv remove logical_router R1  \
> >> +    options dynamic-routing-redistribute-local-only
> >> +
> >> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 10.42.10.10 proto ovn metric 1000
> >> +blackhole 172.16.1.150 proto ovn metric 1000])
> >> +
> >>  # Before cleanup of hv1 ovn-controller, trigger a recompute
> >>  # to cleanup the local datapaths. Otherwise, the test will fail.
> >>  # This is because we don't remove a datapath from
> >> @@ -16904,6 +16927,29 @@ OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
> >>  blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
> >>  blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
> >>  
> >> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
> >> +# and verify that lower priority routes are not advertised anymore.
> >> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
> >> +    options:dynamic-routing-redistribute-local-only=true
> >> +
> >> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
> >> +
> >> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
> >> +    options dynamic-routing-redistribute-local-only              \
> >> +    -- set logical_router R1                                     \
> >> +            options:dynamic-routing-redistribute-local-only=true
> >> +
> >> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
> >> +
> >> +check ovn-nbctl --wait=hv remove logical_router R1  \
> >> +    options dynamic-routing-redistribute-local-only
> >> +
> >> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
> >> +blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
> >> +
> >>  # Before cleanup of hv1 ovn-controller, trigger a recompute
> >>  # to cleanup the local datapaths. Otherwise, the test will fail.
> >>  # This is because we don't remove a datapath from
> >> @@ -17069,6 +17115,35 @@ blackhole 10.42.10.13 proto ovn metric 1000
> >>  blackhole 172.16.1.10 proto ovn metric 1000
> >>  blackhole 172.16.1.11 proto ovn metric 1000])
> >>  
> >> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
> >> +# and verify that lower priority routes are not advertised anymore.
> >> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
> >> +    options:dynamic-routing-redistribute-local-only=true
> >> +
> >> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 172.16.1.10 proto ovn metric 1000
> >> +blackhole 172.16.1.11 proto ovn metric 1000])
> >> +
> >> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
> >> +    options dynamic-routing-redistribute-local-only              \
> >> +    -- set logical_router R1                                     \
> >> +            options:dynamic-routing-redistribute-local-only=true
> >> +
> >> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 172.16.1.10 proto ovn metric 1000
> >> +blackhole 172.16.1.11 proto ovn metric 1000])
> >> +
> >> +check ovn-nbctl --wait=hv remove logical_router R1  \
> >> +    options dynamic-routing-redistribute-local-only
> >> +
> >> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 10.42.10.10 proto ovn metric 1000
> >> +blackhole 10.42.10.11 proto ovn metric 1000
> >> +blackhole 10.42.10.12 proto ovn metric 1000
> >> +blackhole 10.42.10.13 proto ovn metric 1000
> >> +blackhole 172.16.1.10 proto ovn metric 1000
> >> +blackhole 172.16.1.11 proto ovn metric 1000])
> >> +
> >>  # Add "guest" LS connected the distributed router R2 and one "VM" called
> >>  # guest1.
> >>  # Also, connect R2 to ls-join via another DGW.
> >> @@ -17283,6 +17358,35 @@ blackhole 2001:db8:1004::151 dev lo proto ovn 
> >> metric 1000 pref medium
> >>  blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
> >>  blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
> >>  
> >> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
> >> +# and verify that lower priority routes are not advertised anymore.
> >> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
> >> +    options:dynamic-routing-redistribute-local-only=true
> >> +
> >> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
> >> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
> >> +
> >> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
> >> +    options dynamic-routing-redistribute-local-only              \
> >> +    -- set logical_router R1                                     \
> >> +            options:dynamic-routing-redistribute-local-only=true
> >> +
> >> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
> >> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
> >> +
> >> +check ovn-nbctl --wait=hv remove logical_router R1  \
> >> +    options dynamic-routing-redistribute-local-only
> >> +
> >> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
> >> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
> >> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium
> >> +blackhole 2001:db8:1004::150 dev lo proto ovn metric 1000 pref medium
> >> +blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium
> >> +blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
> >> +blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
> >> +
> >>  # Add "guest" LS connected the distributed router R2 and one "VM" called
> >>  # guest1.
> >>  # Also, connect R2 to ls-join via another DGW.
> >> -- 
> >> 2.50.1
> >>
> > 
> 
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to