Hi all, On 7/31/25 1:13 PM, Felix Huettner wrote: > On Thu, Jul 31, 2025 at 11:53:35AM +0200, Dumitru Ceara wrote: >> On 7/31/25 11:46 AM, Felix Huettner wrote: >>> On Thu, Jul 31, 2025 at 11:34:47AM +0200, Dumitru Ceara wrote: >>>> In 712fca55b3b1 ("controller: Prioritize host routes.") and later in >>>> cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") support was >>>> added for advertising routes for objects (logical port / NAT / LB IPs) >>>> that are bound to a single chassis (e.g., distributed NAT) with a better >>>> metric on the chassis where they're bound. On all other chassis, >>>> however, the route was still advertised but only with a worse metric. >>>> >>>> While this works fine in deployments as described in 712fca55b3b1 >>>> ("controller: Prioritize host routes."), this behavior actually makes >>>> the dynamic routing feature unusable in cases when all inter-node >>>> traffic is forwarded through the L3 fabric, e.g. spine-leaf topologies >>>> with iBGP between leaves and spines and eBGP between OVN compute nodes >>>> and fabric leafs: that's due to the fact that eBGP routes are usually >>>> preferred over iBGP routes. >>>> >>>> Consider the following example: >>>> +------+ +------+ >>>> |Spine1| |Spine2| >>>> +------+ +------+ >>>> | \ / | >>>> | \ / | >>>> | \ / | >>>> | X | >>>> | / \ | >>>> | / \ | >>>> | / \ | >>>> +-----+ +----+ >>>> |Leaf1| |Leaf2| >>>> +-----+ +----+ >>>> | | >>>> | | >>>> +----------+ +----------+ >>>> | Chassis1 | | Chassis2 | >>>> +----------+ +----------+ >>>> >>>> An OVN distributed NAT, e.g., 42.42.42.42 "bound" to Chassis1 would be >>>> advertised with a metric of 100 on the eBGP Chassis1 <-> Leaf1 >>>> connection and with a metric of 1000 (worse) on the eBGP Chasssi2 <-> >>>> Leaf2 connection. Leaf2 will also learn an iBGP route (through Spine1 >>>> and Spine2) for the same prefix (towards Chassis1) but because eBGP >>>> administrative distance is better than the iBGP one, Leaf2 will always >>>> prefer the metric 1000 route. That means Leaf2 will always forward >>>> traffic destined to 42.42.42.42 via Chassis2 which is sub-optimal. >>>> >>>> The main reason for advertising the (NAT) IP on both chassis was likely >>>> to provide redundancy in case traffic hits the OVN cluster on a node >>>> that doesn't host the NAT. But with topologies as the one depicted >>>> above the redundancy is handled by the fabric. >>>> >>>> OVN didn't have a way to disable worse metric route advertisements. This >>>> commit adds one, through a new logical router / logical router port >>>> option, "dynamic-routing-redistribute-local-only" which, if enabled, >>>> informs ovn-controller to not advertise routes for chassis bound IPs >>>> (Sb.Advertised_Route.tracked_port set) on chassis where the tracked port >>>> is not bound. By default this option is disabled. >>> >>> Hi Dumitru, >>> >> >> Hi Felix, >> >> Thanks for the quick feedback! >> >>> thanks for that. >>> >>> Back during the implemention i was thinking that this could be done by >>> using frr (or similar) route maps that only take routes with a specific >>> metric (100) and ignore all metric 1000 routes. >>> >>> However i just realized that this does not work as you can not >>> differenciate between non-local routes and routes that have no locality >>> at all. >>> >> >> I think you'd still be able to if the route-map explicitly matched on >> IPs of routes that "have no locality". But this falls into the "overly >> complex" category of workarounds I was hinting at below IMO. >> >>> Would it maybe make sense to also separate the metrics for non-local >>> routes from the routes without tracked_port? >>> >> >> That would be a behavior change at this point, and harder to backport I >> think. >> >>> Then this should also be doable with a normal routing agent and would >>> not need an ovn change at all. But maybe it still makese sense to have >>> the feature here. >>> > > Hi Dumitru, > >> >> I might be biased but it feels like we'd be putting too much burden (and >> even more restrictions) on the CMS/routing agent on top of the current >> setup it has to do to integrate with OVN if we also require a route-map >> for this specific deployment. I think I'd prefer this explicit OVN >> config option instead. > > that sounds also good for me and yes it makes everyones life a lot > easier. > > Thanks, > Felix > >> >>> Thanks, >>> Felix >>> >> >> Regards, >> Dumitru >> >>>> >>>> Fixes: 712fca55b3b1 ("controller: Prioritize host routes.") >>>> Fixes: cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") >>>> Reported-at: https://issues.redhat.com/browse/FDP-1464 >>>> Signed-off-by: Dumitru Ceara <dce...@redhat.com> >>>> --- >>>> NOTE: while this change adds a new configuration option, I see this as >>>> a bug fix because there's no (not overly complex) way of using the OVN >>>> dynamic routing feature in 25.03 with topologies as above (which are >>>> likely common). >>>> >>>> This change is backport-safe because the new option is opt-in (disabled >>>> by default) so the default behavior stays untouched. Also, there's no >>>> restriction on update order as older ovn-northd or ovn-controllers just >>>> ignore the new option and there's no database schema change. >>>> --- >>>> NEWS | 5 +++ >>>> controller/route.c | 10 ++++- >>>> northd/northd.c | 11 +++++ >>>> ovn-nb.xml | 39 +++++++++++++++++ >>>> ovn-sb.xml | 18 ++++++++ >>>> tests/system-ovn.at | 104 ++++++++++++++++++++++++++++++++++++++++++++ >>>> 6 files changed, 186 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/NEWS b/NEWS >>>> index 0cce1790db..54d676be87 100644 >>>> --- a/NEWS >>>> +++ b/NEWS >>>> @@ -41,6 +41,11 @@ Post v25.03.0 >>>> - Added support for running tests from the 'check-kernel' system test >>>> target >>>> under retis by setting OVS_TEST_WITH_RETIS=yes. See the 'Testing' >>>> section >>>> of the documentation for more details. >>>> + - Dynamic Routing: >>>> + * Add the option "dynamic-routing-redistribute-local-only" to Logical >>>> + Routers and Logical Router Ports which refines the way in which >>>> + chassis-specific Advertised_Routes (e.g., for NAT and LB IPs) are >>>> + advertised. >>>> >>>> OVN v25.03.0 - 07 Mar 2025 >>>> -------------------------- >>>> diff --git a/controller/route.c b/controller/route.c >>>> index 7615f3f593..603e5749bc 100644 >>>> --- a/controller/route.c >>>> +++ b/controller/route.c >>>> @@ -253,15 +253,23 @@ route_run(struct route_ctx_in *r_ctx_in, >>>> >>>> unsigned int priority = PRIORITY_DEFAULT; >>>> if (route->tracked_port) { >>>> + bool redistribute_local_bound_only = >>>> + smap_get_bool(&route->logical_port->options, >>>> + "dynamic-routing-redistribute-local-only", >>>> + false); >>>> if (lport_is_local(r_ctx_in->sbrec_port_binding_by_name, >>>> r_ctx_in->chassis, >>>> route->tracked_port->logical_port)) { >>>> priority = PRIORITY_LOCAL_BOUND; >>>> sset_add(r_ctx_out->tracked_ports_local, >>>> route->tracked_port->logical_port); >>>> - } else { >>>> + } else if (!redistribute_local_bound_only) { >>>> sset_add(r_ctx_out->tracked_ports_remote, >>>> route->tracked_port->logical_port); >>>> + } else { >>>> + /* Here redistribute_local_bound_only is 'true' and >>>> + * 'tracked_port' is not local so skip this route. */ >>>> + continue; >>>> } >>>> } >>>> >>>> diff --git a/northd/northd.c b/northd/northd.c >>>> index 764575f21e..4acbd2e517 100644 >>>> --- a/northd/northd.c >>>> +++ b/northd/northd.c >>>> @@ -4387,6 +4387,17 @@ sync_pb_for_lrp(struct ovn_port *op, >>>> if (portname) { >>>> smap_add(&new, "dynamic-routing-port-name", portname); >>>> } >>>> + const char *redistribute_local_only_name = >>>> + "dynamic-routing-redistribute-local-only"; >>>> + bool redistribute_local_only_val = >>>> + smap_get_bool(&op->nbrp->options, >>>> + redistribute_local_only_name, >>>> + smap_get_bool(&op->od->nbr->options, >>>> + redistribute_local_only_name, >>>> + false)); >>>> + if (redistribute_local_only_val) { >>>> + smap_add(&new, redistribute_local_only_name, "true"); >>>> + }
While Jakub was testing this patch in his lab he found that will only set the new option on the "cr-lrp" port binding (for routers with DGP) while the advertised_route.logical_port actually points to the regular router port binding. I think we need to change the way the advertised_route.logical_port value is set. In any case, I'll work on a v2 which will include an additional test case too. I'm marking this patch as "changes requested" in patchwork. Regards, Dumitru >>>> } >>>> >>>> const char *ipv6_pd_list = smap_get(&op->sb->options, >>>> "ipv6_ra_pd_list"); >>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>> index 4a75818075..cbe9c40bbe 100644 >>>> --- a/ovn-nb.xml >>>> +++ b/ovn-nb.xml >>>> @@ -3192,6 +3192,22 @@ or >>>> </p> >>>> </column> >>>> >>>> + <column name="options" >>>> + key="dynamic-routing-redistribute-local-only" >>>> + type='{"type": "boolean"}'> >>>> + <p> >>>> + Only relevant if <ref column="options" key="dynamic-routing"/> >>>> + is set to <code>true</code>. >>>> + </p> >>>> + >>>> + <p> >>>> + This controls whether <code>ovn-controller</code> will advertise >>>> + <ref table="Advertised_Route" db="OVN_Southbound"/> records >>>> + only on the chassis where their <code>tracked_port</code> is >>>> + bound. Default: <code>false</code>. >>>> + </p> >>>> + </column> >>>> + >>>> <column name="options" key="dynamic-routing-vrf-name" >>>> type='{"type": "string"}'> >>>> <p> >>>> @@ -4166,6 +4182,29 @@ or >>>> >>>> </column> >>>> >>>> + <column name="options" >>>> + key="dynamic-routing-redistribute-local-only" >>>> + type='{"type": "boolean"}'> >>>> + <p> >>>> + Only relevant if <ref column="options" key="dynamic-routing"/> >>>> + is set to <code>true</code>. >>>> + </p> >>>> + >>>> + <p> >>>> + This controls whether <code>ovn-controller</code> will advertise >>>> + <ref table="Advertised_Route" db="OVN_Southbound"/> records >>>> + only on the chassis where their <code>tracked_port</code> is >>>> + bound. >>>> + </p> >>>> + >>>> + <p> >>>> + If not set the value from <ref column="options" >>>> + key="dynamic-routing-redistribute-local-only" >>>> + table="Logical_Router"/> on the <ref table="Logical_Router"/> >>>> will >>>> + be used. >>>> + </p> >>>> + </column> >>>> + >>>> <column name="options" key="dynamic-routing-maintain-vrf" >>>> type='{"type": "boolean"}'> >>>> <p> >>>> diff --git a/ovn-sb.xml b/ovn-sb.xml >>>> index db5faac661..395deae83d 100644 >>>> --- a/ovn-sb.xml >>>> +++ b/ovn-sb.xml >>>> @@ -3908,6 +3908,24 @@ tcp.flags = RST; >>>> </column> >>>> </group> >>>> >>>> + <group title="Dynamic Routing"> >>>> + <column name="options" >>>> + key="dynamic-routing-redistribute-local-only" >>>> + type='{"type": "boolean"}'> >>>> + <p> >>>> + Only relevant if <ref column="options" key="dynamic-routing"/> >>>> + is set to <code>true</code>. >>>> + </p> >>>> + >>>> + <p> >>>> + This controls whether <code>ovn-controller</code> will advertise >>>> + <ref table="Advertised_Route" db="OVN_Southbound"/> records >>>> + only on the chassis where their <code>tracked_port</code> is >>>> + bound. Default: <code>false</code>. >>>> + </p> >>>> + </column> >>>> + </group> >>>> + >>>> <group title="Nested Containers"> >>>> <p> >>>> These columns support containers nested within a VM. >>>> Specifically, >>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>> index e0407383af..a86bfa44e0 100644 >>>> --- a/tests/system-ovn.at >>>> +++ b/tests/system-ovn.at >>>> @@ -16759,6 +16759,29 @@ OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl >>>> blackhole 10.42.10.10 proto ovn metric 1000 >>>> blackhole 172.16.1.150 proto ovn metric 1000]) >>>> >>>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true >>>> +# and verify that lower priority routes are not advertised anymore. >>>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \ >>>> + options:dynamic-routing-redistribute-local-only=true >>>> + >>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 172.16.1.150 proto ovn metric 1000]) >>>> + >>>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ >>>> + options dynamic-routing-redistribute-local-only \ >>>> + -- set logical_router R1 \ >>>> + options:dynamic-routing-redistribute-local-only=true >>>> + >>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 172.16.1.150 proto ovn metric 1000]) >>>> + >>>> +check ovn-nbctl --wait=hv remove logical_router R1 \ >>>> + options dynamic-routing-redistribute-local-only >>>> + >>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 10.42.10.10 proto ovn metric 1000 >>>> +blackhole 172.16.1.150 proto ovn metric 1000]) >>>> + >>>> # Before cleanup of hv1 ovn-controller, trigger a recompute >>>> # to cleanup the local datapaths. Otherwise, the test will fail. >>>> # This is because we don't remove a datapath from >>>> @@ -16904,6 +16927,29 @@ OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl >>>> blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium >>>> blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium]) >>>> >>>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true >>>> +# and verify that lower priority routes are not advertised anymore. >>>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \ >>>> + options:dynamic-routing-redistribute-local-only=true >>>> + >>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium]) >>>> + >>>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ >>>> + options dynamic-routing-redistribute-local-only \ >>>> + -- set logical_router R1 \ >>>> + options:dynamic-routing-redistribute-local-only=true >>>> + >>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium]) >>>> + >>>> +check ovn-nbctl --wait=hv remove logical_router R1 \ >>>> + options dynamic-routing-redistribute-local-only >>>> + >>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium >>>> +blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium]) >>>> + >>>> # Before cleanup of hv1 ovn-controller, trigger a recompute >>>> # to cleanup the local datapaths. Otherwise, the test will fail. >>>> # This is because we don't remove a datapath from >>>> @@ -17069,6 +17115,35 @@ blackhole 10.42.10.13 proto ovn metric 1000 >>>> blackhole 172.16.1.10 proto ovn metric 1000 >>>> blackhole 172.16.1.11 proto ovn metric 1000]) >>>> >>>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true >>>> +# and verify that lower priority routes are not advertised anymore. >>>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \ >>>> + options:dynamic-routing-redistribute-local-only=true >>>> + >>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 172.16.1.10 proto ovn metric 1000 >>>> +blackhole 172.16.1.11 proto ovn metric 1000]) >>>> + >>>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ >>>> + options dynamic-routing-redistribute-local-only \ >>>> + -- set logical_router R1 \ >>>> + options:dynamic-routing-redistribute-local-only=true >>>> + >>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 172.16.1.10 proto ovn metric 1000 >>>> +blackhole 172.16.1.11 proto ovn metric 1000]) >>>> + >>>> +check ovn-nbctl --wait=hv remove logical_router R1 \ >>>> + options dynamic-routing-redistribute-local-only >>>> + >>>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 10.42.10.10 proto ovn metric 1000 >>>> +blackhole 10.42.10.11 proto ovn metric 1000 >>>> +blackhole 10.42.10.12 proto ovn metric 1000 >>>> +blackhole 10.42.10.13 proto ovn metric 1000 >>>> +blackhole 172.16.1.10 proto ovn metric 1000 >>>> +blackhole 172.16.1.11 proto ovn metric 1000]) >>>> + >>>> # Add "guest" LS connected the distributed router R2 and one "VM" called >>>> # guest1. >>>> # Also, connect R2 to ls-join via another DGW. >>>> @@ -17283,6 +17358,35 @@ blackhole 2001:db8:1004::151 dev lo proto ovn >>>> metric 1000 pref medium >>>> blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium >>>> blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium]) >>>> >>>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true >>>> +# and verify that lower priority routes are not advertised anymore. >>>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \ >>>> + options:dynamic-routing-redistribute-local-only=true >>>> + >>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium >>>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium]) >>>> + >>>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ >>>> + options dynamic-routing-redistribute-local-only \ >>>> + -- set logical_router R1 \ >>>> + options:dynamic-routing-redistribute-local-only=true >>>> + >>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium >>>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium]) >>>> + >>>> +check ovn-nbctl --wait=hv remove logical_router R1 \ >>>> + options dynamic-routing-redistribute-local-only >>>> + >>>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl >>>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium >>>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium >>>> +blackhole 2001:db8:1004::150 dev lo proto ovn metric 1000 pref medium >>>> +blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium >>>> +blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium >>>> +blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium]) >>>> + >>>> # Add "guest" LS connected the distributed router R2 and one "VM" called >>>> # guest1. >>>> # Also, connect R2 to ls-join via another DGW. >>>> -- >>>> 2.50.1 >>>> >>> >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev