On 7/31/25 11:46 AM, Felix Huettner wrote:
> On Thu, Jul 31, 2025 at 11:34:47AM +0200, Dumitru Ceara wrote:
>> In 712fca55b3b1 ("controller: Prioritize host routes.") and later in
>> cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") support was
>> added for advertising routes for objects (logical port / NAT / LB IPs)
>> that are bound to a single chassis (e.g., distributed NAT) with a better
>> metric on the chassis where they're bound.  On all other chassis,
>> however, the route was still advertised but only with a worse metric.
>>
>> While this works fine in deployments as described in 712fca55b3b1
>> ("controller: Prioritize host routes."), this behavior actually makes
>> the dynamic routing feature unusable in cases when all inter-node
>> traffic is forwarded through the L3 fabric, e.g. spine-leaf topologies
>> with iBGP between leaves and spines and eBGP between OVN compute nodes
>> and fabric leafs:  that's due to the fact that eBGP routes are usually
>> preferred over iBGP routes.
>>
>> Consider the following example:
>>        +------+        +------+
>>        |Spine1|        |Spine2|
>>        +------+        +------+
>>           |  \         /  |
>>           |    \     /    |
>>           |      \ /      |
>>           |       X       |
>>           |     /   \     |
>>           |   /       \   |
>>           | /           \ |
>>        +-----+          +-----+
>>        |Leaf1|          |Leaf2|
>>        +-----+          +-----+
>>           |                |
>>           |                |
>>      +----------+     +----------+
>>      | Chassis1 |     | Chassis2 |
>>      +----------+     +----------+
>>
>> An OVN distributed NAT, e.g., 42.42.42.42 "bound" to Chassis1 would be
>> advertised with a metric of 100 on the eBGP Chassis1 <-> Leaf1
>> connection and with a metric of 1000 (worse) on the eBGP Chassis2 <->
>> Leaf2 connection.  Leaf2 will also learn an iBGP route (through Spine1
>> and Spine2) for the same prefix (towards Chassis1) but because eBGP
>> administrative distance is better than the iBGP one, Leaf2 will always
>> prefer the metric 1000 route.  That means Leaf2 will always forward
>> traffic destined to 42.42.42.42 via Chassis2 which is sub-optimal.
>>
>> The main reason for advertising the (NAT) IP on both chassis was likely
>> to provide redundancy in case traffic hits the OVN cluster on a node
>> that doesn't host the NAT.  But with topologies such as the one depicted
>> above the redundancy is handled by the fabric.
>>
>> OVN didn't have a way to disable worse metric route advertisements. This
>> commit adds one, through a new logical router / logical router port
>> option, "dynamic-routing-redistribute-local-only" which, if enabled,
>> informs ovn-controller to not advertise routes for chassis bound IPs
>> (Sb.Advertised_Route.tracked_port set) on chassis where the tracked port
>> is not bound.  By default this option is disabled.
> 
> Hi Dumitru,
> 

Hi Felix,

Thanks for the quick feedback!

> thanks for that.
> 
> Back during the implementation I was thinking that this could be done by
> using FRR (or similar) route maps that only take routes with a specific
> metric (100) and ignore all metric 1000 routes.
> 
> However I just realized that this does not work as you cannot
> differentiate between non-local routes and routes that have no locality
> at all.
> 

I think you'd still be able to if the route-map explicitly matched on
IPs of routes that "have no locality".  But this falls into the "overly
complex" category of workarounds I was hinting at below IMO.
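
To spell out the ambiguity with a metric-only filter, here is a minimal
sketch.  It is not OVN or FRR code: the enum and function names are
hypothetical, and the metrics (100 and 1000) are the ones quoted in the
commit message.  It only models a route-map that permits metric 100.

```c
#include <assert.h>
#include <stdbool.h>

/* Metrics as described in the commit message: 100 for routes whose
 * tracked port is bound locally, 1000 for everything else. */
enum { METRIC_LOCAL_BOUND = 100, METRIC_DEFAULT = 1000 };

/* Hypothetical classification of an advertised route on one chassis. */
enum route_kind {
    ROUTE_TRACKED_LOCAL,   /* tracked_port set and bound on this chassis. */
    ROUTE_TRACKED_REMOTE,  /* tracked_port set but bound elsewhere. */
    ROUTE_UNTRACKED,       /* No tracked_port, i.e. no locality at all. */
};

/* Metric the route would be installed with on this chassis. */
static unsigned int
route_metric(enum route_kind kind)
{
    return kind == ROUTE_TRACKED_LOCAL ? METRIC_LOCAL_BOUND : METRIC_DEFAULT;
}

/* Models a route-map that only permits metric 100 routes. */
static bool
metric_filter_permits(enum route_kind kind)
{
    return route_metric(kind) == METRIC_LOCAL_BOUND;
}
```

In this model ROUTE_TRACKED_REMOTE and ROUTE_UNTRACKED share metric 1000,
so the filter drops both.  Keeping the untracked routes would need an
extra match on their prefixes, which is the part I consider overly
complex.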

> Would it maybe make sense to also separate the metrics for non-local
> routes from the routes without tracked_port?
> 

That would be a behavior change at this point, and harder to backport I
think.

> Then this should also be doable with a normal routing agent and would
> not need an OVN change at all. But maybe it still makes sense to have
> the feature here.
> 

I might be biased but it feels like we'd be putting too much burden (and
even more restrictions) on the CMS/routing agent on top of the current
setup it has to do to integrate with OVN if we also require a route-map
for this specific deployment.  I think I'd prefer this explicit OVN
config option instead.
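
For reference, the intended semantics of the option can be sketched as
below.  This is a simplified standalone model, not the patch itself: the
tri-state enum and both function names are hypothetical stand-ins for the
smap lookups in northd and for the per-route check in route_run(), and
the priority values are assumed from the metrics in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Values assumed from the commit message (100 local, 1000 otherwise). */
enum { PRIORITY_LOCAL_BOUND = 100, PRIORITY_DEFAULT = 1000 };

/* Hypothetical tri-state standing in for an optional boolean option. */
enum opt_bool { OPT_UNSET, OPT_FALSE, OPT_TRUE };

/* The Logical_Router_Port value wins when it is set; otherwise fall back
 * to the Logical_Router value; otherwise the option is disabled. */
static bool
resolve_local_only(enum opt_bool lrp_opt, enum opt_bool lr_opt)
{
    if (lrp_opt != OPT_UNSET) {
        return lrp_opt == OPT_TRUE;
    }
    return lr_opt == OPT_TRUE;
}

/* Returns true if the route is advertised on this chassis and false if
 * it is skipped.  '*priority' is only meaningful when true is returned. */
static bool
advertise_route(bool has_tracked_port, bool tracked_port_is_local,
                bool local_only, unsigned int *priority)
{
    *priority = PRIORITY_DEFAULT;
    if (!has_tracked_port) {
        /* Routes without locality are not affected by the option. */
        return true;
    }
    if (tracked_port_is_local) {
        *priority = PRIORITY_LOCAL_BOUND;
        return true;
    }
    /* Tracked port bound elsewhere: advertise only if not local-only. */
    return !local_only;
}
```

With the option enabled, only the case "tracked port set but bound on
another chassis" changes: that route is skipped instead of being
advertised with the worse metric.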

> Thanks,
> Felix
> 

Regards,
Dumitru

>>
>> Fixes: 712fca55b3b1 ("controller: Prioritize host routes.")
>> Fixes: cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.")
>> Reported-at: https://issues.redhat.com/browse/FDP-1464
>> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>> ---
>> NOTE: while this change adds a new configuration option, I see this as
>> a bug fix because there's no (not overly complex) way of using the OVN
>> dynamic routing feature in 25.03 with topologies as above (which are
>> likely common).
>>
>> This change is backport-safe because the new option is opt-in (disabled
>> by default) so the default behavior stays untouched.  Also, there's no
>> restriction on update order as older ovn-northd or ovn-controllers just
>> ignore the new option and there's no database schema change.
>> ---
>>  NEWS                |   5 +++
>>  controller/route.c  |  10 ++++-
>>  northd/northd.c     |  11 +++++
>>  ovn-nb.xml          |  39 +++++++++++++++++
>>  ovn-sb.xml          |  18 ++++++++
>>  tests/system-ovn.at | 104 ++++++++++++++++++++++++++++++++++++++++++++
>>  6 files changed, 186 insertions(+), 1 deletion(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 0cce1790db..54d676be87 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -41,6 +41,11 @@ Post v25.03.0
>>     - Added support for running tests from the 'check-kernel' system test target
>>       under retis by setting OVS_TEST_WITH_RETIS=yes.  See the 'Testing' section
>>       of the documentation for more details.
>> +   - Dynamic Routing:
>> +     * Add the option "dynamic-routing-redistribute-local-only" to Logical
>> +       Routers and Logical Router Ports which refines the way in which
>> +       chassis-specific Advertised_Routes (e.g., for NAT and LB IPs) are
>> +       advertised.
>>  
>>  OVN v25.03.0 - 07 Mar 2025
>>  --------------------------
>> diff --git a/controller/route.c b/controller/route.c
>> index 7615f3f593..603e5749bc 100644
>> --- a/controller/route.c
>> +++ b/controller/route.c
>> @@ -253,15 +253,23 @@ route_run(struct route_ctx_in *r_ctx_in,
>>  
>>          unsigned int priority = PRIORITY_DEFAULT;
>>          if (route->tracked_port) {
>> +            bool redistribute_local_bound_only =
>> +                smap_get_bool(&route->logical_port->options,
>> +                              "dynamic-routing-redistribute-local-only",
>> +                              false);
>>              if (lport_is_local(r_ctx_in->sbrec_port_binding_by_name,
>>                                 r_ctx_in->chassis,
>>                                 route->tracked_port->logical_port)) {
>>                  priority = PRIORITY_LOCAL_BOUND;
>>                  sset_add(r_ctx_out->tracked_ports_local,
>>                           route->tracked_port->logical_port);
>> -            } else {
>> +            } else if (!redistribute_local_bound_only) {
>>                  sset_add(r_ctx_out->tracked_ports_remote,
>>                           route->tracked_port->logical_port);
>> +            } else {
>> +                /* Here redistribute_local_bound_only is 'true' and
>> +                 * 'tracked_port' is not local so skip this route. */
>> +                continue;
>>              }
>>          }
>>  
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 764575f21e..4acbd2e517 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -4387,6 +4387,17 @@ sync_pb_for_lrp(struct ovn_port *op,
>>          if (portname) {
>>              smap_add(&new, "dynamic-routing-port-name", portname);
>>          }
>> +        const char *redistribute_local_only_name =
>> +            "dynamic-routing-redistribute-local-only";
>> +        bool redistribute_local_only_val =
>> +            smap_get_bool(&op->nbrp->options,
>> +                          redistribute_local_only_name,
>> +                          smap_get_bool(&op->od->nbr->options,
>> +                                        redistribute_local_only_name,
>> +                                        false));
>> +        if (redistribute_local_only_val) {
>> +            smap_add(&new, redistribute_local_only_name, "true");
>> +        }
>>      }
>>  
>>      const char *ipv6_pd_list = smap_get(&op->sb->options, "ipv6_ra_pd_list");
>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>> index 4a75818075..cbe9c40bbe 100644
>> --- a/ovn-nb.xml
>> +++ b/ovn-nb.xml
>> @@ -3192,6 +3192,22 @@ or
>>          </p>
>>        </column>
>>  
>> +      <column name="options"
>> +          key="dynamic-routing-redistribute-local-only"
>> +          type='{"type": "boolean"}'>
>> +        <p>
>> +          Only relevant if <ref column="options" key="dynamic-routing"/>
>> +          is set to <code>true</code>.
>> +        </p>
>> +
>> +        <p>
>> +          This controls whether <code>ovn-controller</code> will advertise
>> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
>> +          only on the chassis where their <code>tracked_port</code> is
>> +          bound.  Default: <code>false</code>.
>> +        </p>
>> +      </column>
>> +
>>        <column name="options" key="dynamic-routing-vrf-name"
>>            type='{"type": "string"}'>
>>          <p>
>> @@ -4166,6 +4182,29 @@ or
>>  
>>        </column>
>>  
>> +      <column name="options"
>> +          key="dynamic-routing-redistribute-local-only"
>> +          type='{"type": "boolean"}'>
>> +        <p>
>> +          Only relevant if <ref column="options" key="dynamic-routing"/>
>> +          is set to <code>true</code>.
>> +        </p>
>> +
>> +        <p>
>> +          This controls whether <code>ovn-controller</code> will advertise
>> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
>> +          only on the chassis where their <code>tracked_port</code> is
>> +          bound.
>> +        </p>
>> +
>> +        <p>
>> +          If not set the value from <ref column="options"
>> +          key="dynamic-routing-redistribute-local-only"
>> +          table="Logical_Router"/> on the <ref table="Logical_Router"/> will
>> +          be used.
>> +        </p>
>> +      </column>
>> +
>>        <column name="options" key="dynamic-routing-maintain-vrf"
>>           type='{"type": "boolean"}'>
>>          <p>
>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>> index db5faac661..395deae83d 100644
>> --- a/ovn-sb.xml
>> +++ b/ovn-sb.xml
>> @@ -3908,6 +3908,24 @@ tcp.flags = RST;
>>        </column>
>>      </group>
>>  
>> +    <group title="Dynamic Routing">
>> +      <column name="options"
>> +          key="dynamic-routing-redistribute-local-only"
>> +          type='{"type": "boolean"}'>
>> +        <p>
>> +          Only relevant if <ref column="options" key="dynamic-routing"/>
>> +          is set to <code>true</code>.
>> +        </p>
>> +
>> +        <p>
>> +          This controls whether <code>ovn-controller</code> will advertise
>> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
>> +          only on the chassis where their <code>tracked_port</code> is
>> +          bound.  Default: <code>false</code>.
>> +        </p>
>> +      </column>
>> +    </group>
>> +
>>      <group title="Nested Containers">
>>        <p>
>>          These columns support containers nested within a VM.  Specifically,
>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>> index e0407383af..a86bfa44e0 100644
>> --- a/tests/system-ovn.at
>> +++ b/tests/system-ovn.at
>> @@ -16759,6 +16759,29 @@ OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>>  blackhole 10.42.10.10 proto ovn metric 1000
>>  blackhole 172.16.1.150 proto ovn metric 1000])
>>  
>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>> +# and verify that lower priority routes are not advertised anymore.
>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 172.16.1.150 proto ovn metric 1000])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
>> +    options dynamic-routing-redistribute-local-only              \
>> +    -- set logical_router R1                                     \
>> +            options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 172.16.1.150 proto ovn metric 1000])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router R1  \
>> +    options dynamic-routing-redistribute-local-only
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 10.42.10.10 proto ovn metric 1000
>> +blackhole 172.16.1.150 proto ovn metric 1000])
>> +
>>  # Before cleanup of hv1 ovn-controller, trigger a recompute
>>  # to cleanup the local datapaths. Otherwise, the test will fail.
>>  # This is because we don't remove a datapath from
>> @@ -16904,6 +16927,29 @@ OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>>  blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
>>  blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
>>  
>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>> +# and verify that lower priority routes are not advertised anymore.
>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
>> +    options dynamic-routing-redistribute-local-only              \
>> +    -- set logical_router R1                                     \
>> +            options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router R1  \
>> +    options dynamic-routing-redistribute-local-only
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
>> +
>>  # Before cleanup of hv1 ovn-controller, trigger a recompute
>>  # to cleanup the local datapaths. Otherwise, the test will fail.
>>  # This is because we don't remove a datapath from
>> @@ -17069,6 +17115,35 @@ blackhole 10.42.10.13 proto ovn metric 1000
>>  blackhole 172.16.1.10 proto ovn metric 1000
>>  blackhole 172.16.1.11 proto ovn metric 1000])
>>  
>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>> +# and verify that lower priority routes are not advertised anymore.
>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 172.16.1.10 proto ovn metric 1000
>> +blackhole 172.16.1.11 proto ovn metric 1000])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
>> +    options dynamic-routing-redistribute-local-only              \
>> +    -- set logical_router R1                                     \
>> +            options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 172.16.1.10 proto ovn metric 1000
>> +blackhole 172.16.1.11 proto ovn metric 1000])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router R1  \
>> +    options dynamic-routing-redistribute-local-only
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 10.42.10.10 proto ovn metric 1000
>> +blackhole 10.42.10.11 proto ovn metric 1000
>> +blackhole 10.42.10.12 proto ovn metric 1000
>> +blackhole 10.42.10.13 proto ovn metric 1000
>> +blackhole 172.16.1.10 proto ovn metric 1000
>> +blackhole 172.16.1.11 proto ovn metric 1000])
>> +
>>  # Add "guest" LS connected the distributed router R2 and one "VM" called
>>  # guest1.
>>  # Also, connect R2 to ls-join via another DGW.
>> @@ -17283,6 +17358,35 @@ blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium
>>  blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
>>  blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
>>  
>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>> +# and verify that lower priority routes are not advertised anymore.
>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
>> +    options dynamic-routing-redistribute-local-only              \
>> +    -- set logical_router R1                                     \
>> +            options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router R1  \
>> +    options dynamic-routing-redistribute-local-only
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1004::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
>> +
>>  # Add "guest" LS connected the distributed router R2 and one "VM" called
>>  # guest1.
>>  # Also, connect R2 to ls-join via another DGW.
>> -- 
>> 2.50.1
>>
> 
