Hi all,

On 7/31/25 1:13 PM, Felix Huettner wrote:
> On Thu, Jul 31, 2025 at 11:53:35AM +0200, Dumitru Ceara wrote:
>> On 7/31/25 11:46 AM, Felix Huettner wrote:
>>> On Thu, Jul 31, 2025 at 11:34:47AM +0200, Dumitru Ceara wrote:
>>>> In 712fca55b3b1 ("controller: Prioritize host routes.") and later in
>>>> cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") support was
>>>> added for advertising routes for objects (logical port / NAT / LB IPs)
>>>> that are bound to a single chassis (e.g., distributed NAT) with a better
>>>> metric on the chassis where they're bound.  On all other chassis,
>>>> however, the route was still advertised but only with a worse metric.
>>>>
>>>> While this works fine in deployments as described in 712fca55b3b1
>>>> ("controller: Prioritize host routes."), this behavior actually makes
>>>> the dynamic routing feature unusable in cases when all inter-node
>>>> traffic is forwarded through the L3 fabric, e.g. spine-leaf topologies
>>>> with iBGP between leaves and spines and eBGP between OVN compute nodes
>>>> and fabric leafs:  that's due to the fact that eBGP routes are usually
>>>> preferred over iBGP routes.
>>>>
>>>> Consider the following example:
>>>>        +------+        +------+
>>>>        |Spine1|        |Spine2|
>>>>        +------+        +------+
>>>>           |  \         /  |
>>>>           |    \     /    |
>>>>           |      \ /      |
>>>>           |       X       |
>>>>           |     /   \     |
>>>>           |   /       \   |
>>>>           | /           \ |
>>>>        +-----+          +----+
>>>>        |Leaf1|          |Leaf2|
>>>>        +-----+          +----+
>>>>           |                |
>>>>           |                |
>>>>      +----------+     +----------+
>>>>      | Chassis1 |     | Chassis2 |
>>>>      +----------+     +----------+
>>>>
>>>> An OVN distributed NAT, e.g., 42.42.42.42 "bound" to Chassis1 would be
>>>> advertised with a metric of 100 on the eBGP Chassis1 <-> Leaf1
>>>> connection and with a metric of 1000 (worse) on the eBGP Chasssi2 <->
>>>> Leaf2 connection.  Leaf2 will also learn an iBGP route (through Spine1
>>>> and Spine2) for the same prefix (towards Chassis1) but because eBGP
>>>> administrative distance is better than the iBGP one, Leaf2 will always
>>>> prefer the metric 1000 route.  That means Leaf2 will always forward
>>>> traffic destined to 42.42.42.42 via Chassis2 which is sub-optimal.
>>>>
>>>> The main reason for advertising the (NAT) IP on both chassis was likely
>>>> to provide redundancy in case traffic hits the OVN cluster on a node
>>>> that doesn't host the NAT.  But with topologies as the one depicted
>>>> above the redundancy is handled by the fabric.
>>>>
>>>> OVN didn't have a way to disable worse metric route advertisements. This
>>>> commit adds one, through a new logical router / logical router port
>>>> option, "dynamic-routing-redistribute-local-only" which, if enabled,
>>>> informs ovn-controller to not advertise routes for chassis bound IPs
>>>> (Sb.Advertised_Route.tracked_port set) on chassis where the tracked port
>>>> is not bound.  By default this option is disabled.
>>>
>>> Hi Dumitru,
>>>
>>
>> Hi Felix,
>>
>> Thanks for the quick feedback!
>>
>>> thanks for that.
>>>
>>> Back during the implemention i was thinking that this could be done by
>>> using frr (or similar) route maps that only take routes with a specific
>>> metric (100) and ignore all metric 1000 routes.
>>>
>>> However i just realized that this does not work as you can not
>>> differenciate between non-local routes and routes that have no locality
>>> at all.
>>>
>>
>> I think you'd still be able to if the route-map explicitly matched on
>> IPs of routes that "have no locality".  But this falls into the "overly
>> complex" category of workarounds I was hinting at below IMO.
>>
>>> Would it maybe make sense to also separate the metrics for non-local
>>> routes from the routes without tracked_port?
>>>
>>
>> That would be a behavior change at this point, and harder to backport I
>> think.
>>
>>> Then this should also be doable with a normal routing agent and would
>>> not need an ovn change at all. But maybe it still makese sense to have
>>> the feature here.
>>>
> 
> Hi Dumitru,
> 
>>
>> I might be biased but it feels like we'd be putting too much burden (and
>> even more restrictions) on the CMS/routing agent on top of the current
>> setup it has to do to integrate with OVN if we also require a route-map
>> for this specific deployment.  I think I'd prefer this explicit OVN
>> config option instead.
> 
> that sounds also good for me and yes it makes everyones life a lot
> easier.
> 
> Thanks,
> Felix
> 
>>
>>> Thanks,
>>> Felix
>>>
>>
>> Regards,
>> Dumitru
>>
>>>>
>>>> Fixes: 712fca55b3b1 ("controller: Prioritize host routes.")
>>>> Fixes: cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.")
>>>> Reported-at: https://issues.redhat.com/browse/FDP-1464
>>>> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>>>> ---
>>>> NOTE: while this change adds a new configuration option, I see this as
>>>> a bug fix because there's no (not overly complex) way of using the OVN
>>>> dynamic routing feature in 25.03 with topologies as above (which are
>>>> likely common).
>>>>
>>>> This change is backport-safe because the new option is opt-in (disabled
>>>> by default) so the default behavior stays untouched.  Also, there's no
>>>> restriction on update order as older ovn-northd or ovn-controllers just
>>>> ignore the new option and there's no database schema change.
>>>> ---
>>>>  NEWS                |   5 +++
>>>>  controller/route.c  |  10 ++++-
>>>>  northd/northd.c     |  11 +++++
>>>>  ovn-nb.xml          |  39 +++++++++++++++++
>>>>  ovn-sb.xml          |  18 ++++++++
>>>>  tests/system-ovn.at | 104 ++++++++++++++++++++++++++++++++++++++++++++
>>>>  6 files changed, 186 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/NEWS b/NEWS
>>>> index 0cce1790db..54d676be87 100644
>>>> --- a/NEWS
>>>> +++ b/NEWS
>>>> @@ -41,6 +41,11 @@ Post v25.03.0
>>>>     - Added support for running tests from the 'check-kernel' system test 
>>>> target
>>>>       under retis by setting OVS_TEST_WITH_RETIS=yes.  See the 'Testing' 
>>>> section
>>>>       of the documentation for more details.
>>>> +   - Dynamic Routing:
>>>> +     * Add the option "dynamic-routing-redistribute-local-only" to Logical
>>>> +       Routers and Logical Router Ports which refines the way in which
>>>> +       chassis-specific Advertised_Routes (e.g., for NAT and LB IPs) are
>>>> +       advertised.
>>>>  
>>>>  OVN v25.03.0 - 07 Mar 2025
>>>>  --------------------------
>>>> diff --git a/controller/route.c b/controller/route.c
>>>> index 7615f3f593..603e5749bc 100644
>>>> --- a/controller/route.c
>>>> +++ b/controller/route.c
>>>> @@ -253,15 +253,23 @@ route_run(struct route_ctx_in *r_ctx_in,
>>>>  
>>>>          unsigned int priority = PRIORITY_DEFAULT;
>>>>          if (route->tracked_port) {
>>>> +            bool redistribute_local_bound_only =
>>>> +                smap_get_bool(&route->logical_port->options,
>>>> +                              "dynamic-routing-redistribute-local-only",
>>>> +                              false);
>>>>              if (lport_is_local(r_ctx_in->sbrec_port_binding_by_name,
>>>>                                 r_ctx_in->chassis,
>>>>                                 route->tracked_port->logical_port)) {
>>>>                  priority = PRIORITY_LOCAL_BOUND;
>>>>                  sset_add(r_ctx_out->tracked_ports_local,
>>>>                           route->tracked_port->logical_port);
>>>> -            } else {
>>>> +            } else if (!redistribute_local_bound_only) {
>>>>                  sset_add(r_ctx_out->tracked_ports_remote,
>>>>                           route->tracked_port->logical_port);
>>>> +            } else {
>>>> +                /* Here redistribute_local_bound_only is 'true' and
>>>> +                 * 'tracked_port' is not local so skip this route. */
>>>> +                continue;
>>>>              }
>>>>          }
>>>>  
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index 764575f21e..4acbd2e517 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -4387,6 +4387,17 @@ sync_pb_for_lrp(struct ovn_port *op,
>>>>          if (portname) {
>>>>              smap_add(&new, "dynamic-routing-port-name", portname);
>>>>          }
>>>> +        const char *redistribute_local_only_name =
>>>> +            "dynamic-routing-redistribute-local-only";
>>>> +        bool redistribute_local_only_val =
>>>> +            smap_get_bool(&op->nbrp->options,
>>>> +                          redistribute_local_only_name,
>>>> +                          smap_get_bool(&op->od->nbr->options,
>>>> +                                        redistribute_local_only_name,
>>>> +                                        false));
>>>> +        if (redistribute_local_only_val) {
>>>> +            smap_add(&new, redistribute_local_only_name, "true");
>>>> +        }

While Jakub was testing this patch in his lab he found that will only
set the new option on the "cr-lrp" port binding (for routers with DGP)
while the advertised_route.logical_port actually points to the regular
router port binding.

I think we need to change the way the advertised_route.logical_port
value is set.

In any case, I'll work on a v2 which will include an additional test
case too.

I'm marking this patch as "changes requested" in patchwork.

Regards,
Dumitru

>>>>      }
>>>>  
>>>>      const char *ipv6_pd_list = smap_get(&op->sb->options, 
>>>> "ipv6_ra_pd_list");
>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>>>> index 4a75818075..cbe9c40bbe 100644
>>>> --- a/ovn-nb.xml
>>>> +++ b/ovn-nb.xml
>>>> @@ -3192,6 +3192,22 @@ or
>>>>          </p>
>>>>        </column>
>>>>  
>>>> +      <column name="options"
>>>> +          key="dynamic-routing-redistribute-local-only"
>>>> +          type='{"type": "boolean"}'>
>>>> +        <p>
>>>> +          Only relevant if <ref column="options" key="dynamic-routing"/>
>>>> +          is set to <code>true</code>.
>>>> +        </p>
>>>> +
>>>> +        <p>
>>>> +          This controls whether <code>ovn-controller</code> will advertise
>>>> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
>>>> +          only on the chassis where their <code>tracked_port</code> is
>>>> +          bound.  Default: <code>false</code>.
>>>> +        </p>
>>>> +      </column>
>>>> +
>>>>        <column name="options" key="dynamic-routing-vrf-name"
>>>>            type='{"type": "string"}'>
>>>>          <p>
>>>> @@ -4166,6 +4182,29 @@ or
>>>>  
>>>>        </column>
>>>>  
>>>> +      <column name="options"
>>>> +          key="dynamic-routing-redistribute-local-only"
>>>> +          type='{"type": "boolean"}'>
>>>> +        <p>
>>>> +          Only relevant if <ref column="options" key="dynamic-routing"/>
>>>> +          is set to <code>true</code>.
>>>> +        </p>
>>>> +
>>>> +        <p>
>>>> +          This controls whether <code>ovn-controller</code> will advertise
>>>> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
>>>> +          only on the chassis where their <code>tracked_port</code> is
>>>> +          bound.
>>>> +        </p>
>>>> +
>>>> +        <p>
>>>> +          If not set the value from <ref column="options"
>>>> +          key="dynamic-routing-redistribute-local-only"
>>>> +          table="Logical_Router"/> on the <ref table="Logical_Router"/> 
>>>> will
>>>> +          be used.
>>>> +        </p>
>>>> +      </column>
>>>> +
>>>>        <column name="options" key="dynamic-routing-maintain-vrf"
>>>>           type='{"type": "boolean"}'>
>>>>          <p>
>>>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>>>> index db5faac661..395deae83d 100644
>>>> --- a/ovn-sb.xml
>>>> +++ b/ovn-sb.xml
>>>> @@ -3908,6 +3908,24 @@ tcp.flags = RST;
>>>>        </column>
>>>>      </group>
>>>>  
>>>> +    <group title="Dynamic Routing">
>>>> +      <column name="options"
>>>> +          key="dynamic-routing-redistribute-local-only"
>>>> +          type='{"type": "boolean"}'>
>>>> +        <p>
>>>> +          Only relevant if <ref column="options" key="dynamic-routing"/>
>>>> +          is set to <code>true</code>.
>>>> +        </p>
>>>> +
>>>> +        <p>
>>>> +          This controls whether <code>ovn-controller</code> will advertise
>>>> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
>>>> +          only on the chassis where their <code>tracked_port</code> is
>>>> +          bound.  Default: <code>false</code>.
>>>> +        </p>
>>>> +      </column>
>>>> +    </group>
>>>> +
>>>>      <group title="Nested Containers">
>>>>        <p>
>>>>          These columns support containers nested within a VM.  
>>>> Specifically,
>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>>>> index e0407383af..a86bfa44e0 100644
>>>> --- a/tests/system-ovn.at
>>>> +++ b/tests/system-ovn.at
>>>> @@ -16759,6 +16759,29 @@ OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>>>>  blackhole 10.42.10.10 proto ovn metric 1000
>>>>  blackhole 172.16.1.150 proto ovn metric 1000])
>>>>  
>>>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>>>> +# and verify that lower priority routes are not advertised anymore.
>>>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>>>> +    options:dynamic-routing-redistribute-local-only=true
>>>> +
>>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 172.16.1.150 proto ovn metric 1000])
>>>> +
>>>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
>>>> +    options dynamic-routing-redistribute-local-only              \
>>>> +    -- set logical_router R1                                     \
>>>> +            options:dynamic-routing-redistribute-local-only=true
>>>> +
>>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 172.16.1.150 proto ovn metric 1000])
>>>> +
>>>> +check ovn-nbctl --wait=hv remove logical_router R1  \
>>>> +    options dynamic-routing-redistribute-local-only
>>>> +
>>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 10.42.10.10 proto ovn metric 1000
>>>> +blackhole 172.16.1.150 proto ovn metric 1000])
>>>> +
>>>>  # Before cleanup of hv1 ovn-controller, trigger a recompute
>>>>  # to cleanup the local datapaths. Otherwise, the test will fail.
>>>>  # This is because we don't remove a datapath from
>>>> @@ -16904,6 +16927,29 @@ OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>>>>  blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
>>>>  blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
>>>>  
>>>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>>>> +# and verify that lower priority routes are not advertised anymore.
>>>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>>>> +    options:dynamic-routing-redistribute-local-only=true
>>>> +
>>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
>>>> +
>>>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
>>>> +    options dynamic-routing-redistribute-local-only              \
>>>> +    -- set logical_router R1                                     \
>>>> +            options:dynamic-routing-redistribute-local-only=true
>>>> +
>>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
>>>> +
>>>> +check ovn-nbctl --wait=hv remove logical_router R1  \
>>>> +    options dynamic-routing-redistribute-local-only
>>>> +
>>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
>>>> +blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
>>>> +
>>>>  # Before cleanup of hv1 ovn-controller, trigger a recompute
>>>>  # to cleanup the local datapaths. Otherwise, the test will fail.
>>>>  # This is because we don't remove a datapath from
>>>> @@ -17069,6 +17115,35 @@ blackhole 10.42.10.13 proto ovn metric 1000
>>>>  blackhole 172.16.1.10 proto ovn metric 1000
>>>>  blackhole 172.16.1.11 proto ovn metric 1000])
>>>>  
>>>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>>>> +# and verify that lower priority routes are not advertised anymore.
>>>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>>>> +    options:dynamic-routing-redistribute-local-only=true
>>>> +
>>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 172.16.1.10 proto ovn metric 1000
>>>> +blackhole 172.16.1.11 proto ovn metric 1000])
>>>> +
>>>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
>>>> +    options dynamic-routing-redistribute-local-only              \
>>>> +    -- set logical_router R1                                     \
>>>> +            options:dynamic-routing-redistribute-local-only=true
>>>> +
>>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 172.16.1.10 proto ovn metric 1000
>>>> +blackhole 172.16.1.11 proto ovn metric 1000])
>>>> +
>>>> +check ovn-nbctl --wait=hv remove logical_router R1  \
>>>> +    options dynamic-routing-redistribute-local-only
>>>> +
>>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 10.42.10.10 proto ovn metric 1000
>>>> +blackhole 10.42.10.11 proto ovn metric 1000
>>>> +blackhole 10.42.10.12 proto ovn metric 1000
>>>> +blackhole 10.42.10.13 proto ovn metric 1000
>>>> +blackhole 172.16.1.10 proto ovn metric 1000
>>>> +blackhole 172.16.1.11 proto ovn metric 1000])
>>>> +
>>>>  # Add "guest" LS connected the distributed router R2 and one "VM" called
>>>>  # guest1.
>>>>  # Also, connect R2 to ls-join via another DGW.
>>>> @@ -17283,6 +17358,35 @@ blackhole 2001:db8:1004::151 dev lo proto ovn 
>>>> metric 1000 pref medium
>>>>  blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
>>>>  blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
>>>>  
>>>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>>>> +# and verify that lower priority routes are not advertised anymore.
>>>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>>>> +    options:dynamic-routing-redistribute-local-only=true
>>>> +
>>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
>>>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
>>>> +
>>>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join     \
>>>> +    options dynamic-routing-redistribute-local-only              \
>>>> +    -- set logical_router R1                                     \
>>>> +            options:dynamic-routing-redistribute-local-only=true
>>>> +
>>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
>>>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
>>>> +
>>>> +check ovn-nbctl --wait=hv remove logical_router R1  \
>>>> +    options dynamic-routing-redistribute-local-only
>>>> +
>>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>>>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
>>>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium
>>>> +blackhole 2001:db8:1004::150 dev lo proto ovn metric 1000 pref medium
>>>> +blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium
>>>> +blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
>>>> +blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
>>>> +
>>>>  # Add "guest" LS connected the distributed router R2 and one "VM" called
>>>>  # guest1.
>>>>  # Also, connect R2 to ls-join via another DGW.
>>>> -- 
>>>> 2.50.1
>>>>
>>>
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to