On 7/31/25 11:46 AM, Felix Huettner wrote:
> On Thu, Jul 31, 2025 at 11:34:47AM +0200, Dumitru Ceara wrote:
>> In 712fca55b3b1 ("controller: Prioritize host routes.") and later in
>> cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") support was
>> added for advertising routes for objects (logical port / NAT / LB IPs)
>> that are bound to a single chassis (e.g., distributed NAT) with a better
>> metric on the chassis where they're bound. On all other chassis,
>> however, the route was still advertised but only with a worse metric.
>>
>> While this works fine in deployments as described in 712fca55b3b1
>> ("controller: Prioritize host routes."), this behavior actually makes
>> the dynamic routing feature unusable in cases when all inter-node
>> traffic is forwarded through the L3 fabric, e.g. spine-leaf topologies
>> with iBGP between leaves and spines and eBGP between OVN compute nodes
>> and fabric leafs: that's due to the fact that eBGP routes are usually
>> preferred over iBGP routes.
>>
>> Consider the following example:
>>    +------+       +------+
>>    |Spine1|       |Spine2|
>>    +------+       +------+
>>       | \           / |
>>       |   \       /   |
>>       |     \   /     |
>>       |       X       |
>>       |     /   \     |
>>       |   /       \   |
>>       | /           \ |
>>    +-----+         +-----+
>>    |Leaf1|         |Leaf2|
>>    +-----+         +-----+
>>       |               |
>>       |               |
>>  +----------+    +----------+
>>  | Chassis1 |    | Chassis2 |
>>  +----------+    +----------+
>>
>> An OVN distributed NAT, e.g., 42.42.42.42 "bound" to Chassis1 would be
>> advertised with a metric of 100 on the eBGP Chassis1 <-> Leaf1
>> connection and with a metric of 1000 (worse) on the eBGP Chassis2 <->
>> Leaf2 connection. Leaf2 will also learn an iBGP route (through Spine1
>> and Spine2) for the same prefix (towards Chassis1) but because eBGP
>> administrative distance is better than the iBGP one, Leaf2 will always
>> prefer the metric 1000 route. That means Leaf2 will always forward
>> traffic destined to 42.42.42.42 via Chassis2 which is sub-optimal.
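
To make the path-selection argument above concrete, here is a minimal
sketch in C of a routing table that compares administrative distance
before metric. This is not OVN or FRR code: the struct, pick_best(), and
the distance values 20 (eBGP) and 200 (iBGP) are assumptions chosen to
mirror commonly used defaults, and real BGP best-path selection has many
more steps than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one candidate route on a leaf. */
struct candidate_route {
    const char *next_hop;     /* Where the leaf would forward traffic. */
    unsigned int admin_dist;  /* Lower is preferred; compared first. */
    unsigned int metric;      /* Lower is preferred; tie-breaker only. */
};

/* Returns the preferred candidate: the lowest administrative distance
 * wins and the metric is consulted only when the distances are equal. */
const struct candidate_route *
pick_best(const struct candidate_route *routes, size_t n)
{
    assert(routes != NULL && n > 0);

    const struct candidate_route *best = &routes[0];
    for (size_t i = 1; i < n; i++) {
        const struct candidate_route *r = &routes[i];
        if (r->admin_dist < best->admin_dist
            || (r->admin_dist == best->admin_dist
                && r->metric < best->metric)) {
            best = r;
        }
    }
    return best;
}
```

With the Leaf2 candidates { "Chassis2", 20, 1000 } and
{ "via spines", 200, 100 }, pick_best() returns the Chassis2 entry: the
worse metric never gets a chance to matter because the eBGP distance is
compared first.
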
>>
>> The main reason for advertising the (NAT) IP on both chassis was likely
>> to provide redundancy in case traffic hits the OVN cluster on a node
>> that doesn't host the NAT. But with topologies as the one depicted
>> above the redundancy is handled by the fabric.
>>
>> OVN didn't have a way to disable worse metric route advertisements. This
>> commit adds one, through a new logical router / logical router port
>> option, "dynamic-routing-redistribute-local-only" which, if enabled,
>> informs ovn-controller to not advertise routes for chassis bound IPs
>> (Sb.Advertised_Route.tracked_port set) on chassis where the tracked port
>> is not bound. By default this option is disabled.
>
> Hi Dumitru,
>
Hi Felix,

Thanks for the quick feedback!

> thanks for that.
>
> Back during the implementation i was thinking that this could be done by
> using frr (or similar) route maps that only take routes with a specific
> metric (100) and ignore all metric 1000 routes.
>
> However i just realized that this does not work as you can not
> differentiate between non-local routes and routes that have no locality
> at all.
>

I think you'd still be able to if the route-map explicitly matched on
IPs of routes that "have no locality". But this falls into the "overly
complex" category of workarounds I was hinting at below IMO.

> Would it maybe make sense to also separate the metrics for non-local
> routes from the routes without tracked_port?
>

That would be a behavior change at this point, and harder to backport I
think.

> Then this should also be doable with a normal routing agent and would
> not need an ovn change at all. But maybe it still makes sense to have
> the feature here.
>

I might be biased but it feels like we'd be putting too much burden (and
even more restrictions) on the CMS/routing agent on top of the current
setup it has to do to integrate with OVN if we also require a route-map
for this specific deployment.

I think I'd prefer this explicit OVN config option instead.

> Thanks,
> Felix
>

Regards,
Dumitru

>>
>> Fixes: 712fca55b3b1 ("controller: Prioritize host routes.")
>> Fixes: cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.")
>> Reported-at: https://issues.redhat.com/browse/FDP-1464
>> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>> ---
>> NOTE: while this change adds a new configuration option, I see this as
>> a bug fix because there's no (not overly complex) way of using the OVN
>> dynamic routing feature in 25.03 with topologies as above (which are
>> likely common).
>>
>> This change is backport-safe because the new option is opt-in (disabled
>> by default) so the default behavior stays untouched. Also, there's no
>> restriction on update order as older ovn-northd or ovn-controllers just
>> ignore the new option and there's no database schema change.
>> ---
>>  NEWS                |   5 +++
>>  controller/route.c  |  10 ++++-
>>  northd/northd.c     |  11 +++++
>>  ovn-nb.xml          |  39 +++++++++++++++++
>>  ovn-sb.xml          |  18 ++++++++
>>  tests/system-ovn.at | 104 ++++++++++++++++++++++++++++++++++++++++++++
>>  6 files changed, 186 insertions(+), 1 deletion(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 0cce1790db..54d676be87 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -41,6 +41,11 @@ Post v25.03.0
>>    - Added support for running tests from the 'check-kernel' system test target
>>      under retis by setting OVS_TEST_WITH_RETIS=yes. See the 'Testing' section
>>      of the documentation for more details.
>> +  - Dynamic Routing:
>> +    * Add the option "dynamic-routing-redistribute-local-only" to Logical
>> +      Routers and Logical Router Ports which refines the way in which
>> +      chassis-specific Advertised_Routes (e.g., for NAT and LB IPs) are
>> +      advertised.
>>
>>  OVN v25.03.0 - 07 Mar 2025
>>  --------------------------
>> diff --git a/controller/route.c b/controller/route.c
>> index 7615f3f593..603e5749bc 100644
>> --- a/controller/route.c
>> +++ b/controller/route.c
>> @@ -253,15 +253,23 @@ route_run(struct route_ctx_in *r_ctx_in,
>>
>>          unsigned int priority = PRIORITY_DEFAULT;
>>          if (route->tracked_port) {
>> +            bool redistribute_local_bound_only =
>> +                smap_get_bool(&route->logical_port->options,
>> +                              "dynamic-routing-redistribute-local-only",
>> +                              false);
>>              if (lport_is_local(r_ctx_in->sbrec_port_binding_by_name,
>>                                 r_ctx_in->chassis,
>>                                 route->tracked_port->logical_port)) {
>>                  priority = PRIORITY_LOCAL_BOUND;
>>                  sset_add(r_ctx_out->tracked_ports_local,
>>                           route->tracked_port->logical_port);
>> -            } else {
>> +            } else if (!redistribute_local_bound_only) {
>>                  sset_add(r_ctx_out->tracked_ports_remote,
>>                           route->tracked_port->logical_port);
>> +            } else {
>> +                /* Here redistribute_local_bound_only is 'true' and
>> +                 * 'tracked_port' is not local so skip this route. */
>> +                continue;
>>              }
>>          }
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 764575f21e..4acbd2e517 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -4387,6 +4387,17 @@ sync_pb_for_lrp(struct ovn_port *op,
>>          if (portname) {
>>              smap_add(&new, "dynamic-routing-port-name", portname);
>>          }
>> +        const char *redistribute_local_only_name =
>> +            "dynamic-routing-redistribute-local-only";
>> +        bool redistribute_local_only_val =
>> +            smap_get_bool(&op->nbrp->options,
>> +                          redistribute_local_only_name,
>> +                          smap_get_bool(&op->od->nbr->options,
>> +                                        redistribute_local_only_name,
>> +                                        false));
>> +        if (redistribute_local_only_val) {
>> +            smap_add(&new, redistribute_local_only_name, "true");
>> +        }
>>      }
>>
>>      const char *ipv6_pd_list = smap_get(&op->sb->options, "ipv6_ra_pd_list");
>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>> index 4a75818075..cbe9c40bbe 100644
>> --- a/ovn-nb.xml
>> +++ b/ovn-nb.xml
>> @@ -3192,6 +3192,22 @@ or
>>          </p>
>>        </column>
>>
>> +      <column name="options"
>> +              key="dynamic-routing-redistribute-local-only"
>> +              type='{"type": "boolean"}'>
>> +        <p>
>> +          Only relevant if <ref column="options" key="dynamic-routing"/>
>> +          is set to <code>true</code>.
>> +        </p>
>> +
>> +        <p>
>> +          This controls whether <code>ovn-controller</code> will advertise
>> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
>> +          only on the chassis where their <code>tracked_port</code> is
>> +          bound. Default: <code>false</code>.
>> +        </p>
>> +      </column>
>> +
>>        <column name="options" key="dynamic-routing-vrf-name"
>>                type='{"type": "string"}'>
>>          <p>
>> @@ -4166,6 +4182,29 @@ or
>>
>>        </column>
>>
>> +      <column name="options"
>> +              key="dynamic-routing-redistribute-local-only"
>> +              type='{"type": "boolean"}'>
>> +        <p>
>> +          Only relevant if <ref column="options" key="dynamic-routing"/>
>> +          is set to <code>true</code>.
>> +        </p>
>> +
>> +        <p>
>> +          This controls whether <code>ovn-controller</code> will advertise
>> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
>> +          only on the chassis where their <code>tracked_port</code> is
>> +          bound.
>> +        </p>
>> +
>> +        <p>
>> +          If not set the value from <ref column="options"
>> +          key="dynamic-routing-redistribute-local-only"
>> +          table="Logical_Router"/> on the <ref table="Logical_Router"/> will
>> +          be used.
>> +        </p>
>> +      </column>
>> +
>>        <column name="options" key="dynamic-routing-maintain-vrf"
>>                type='{"type": "boolean"}'>
>>          <p>
>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>> index db5faac661..395deae83d 100644
>> --- a/ovn-sb.xml
>> +++ b/ovn-sb.xml
>> @@ -3908,6 +3908,24 @@ tcp.flags = RST;
>>        </column>
>>      </group>
>>
>> +    <group title="Dynamic Routing">
>> +      <column name="options"
>> +              key="dynamic-routing-redistribute-local-only"
>> +              type='{"type": "boolean"}'>
>> +        <p>
>> +          Only relevant if <ref column="options" key="dynamic-routing"/>
>> +          is set to <code>true</code>.
>> +        </p>
>> +
>> +        <p>
>> +          This controls whether <code>ovn-controller</code> will advertise
>> +          <ref table="Advertised_Route" db="OVN_Southbound"/> records
>> +          only on the chassis where their <code>tracked_port</code> is
>> +          bound. Default: <code>false</code>.
>> +        </p>
>> +      </column>
>> +    </group>
>> +
>>      <group title="Nested Containers">
>>        <p>
>>          These columns support containers nested within a VM. Specifically,
>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>> index e0407383af..a86bfa44e0 100644
>> --- a/tests/system-ovn.at
>> +++ b/tests/system-ovn.at
>> @@ -16759,6 +16759,29 @@ OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>>  blackhole 10.42.10.10 proto ovn metric 1000
>>  blackhole 172.16.1.150 proto ovn metric 1000])
>>
>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>> +# and verify that lower priority routes are not advertised anymore.
>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 172.16.1.150 proto ovn metric 1000])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join \
>> +    options dynamic-routing-redistribute-local-only \
>> +    -- set logical_router R1 \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 172.16.1.150 proto ovn metric 1000])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router R1 \
>> +    options dynamic-routing-redistribute-local-only
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 10.42.10.10 proto ovn metric 1000
>> +blackhole 172.16.1.150 proto ovn metric 1000])
>> +
>>  # Before cleanup of hv1 ovn-controller, trigger a recompute
>>  # to cleanup the local datapaths. Otherwise, the test will fail.
>>  # This is because we don't remove a datapath from
>> @@ -16904,6 +16927,29 @@ OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>>  blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
>>  blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
>>
>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>> +# and verify that lower priority routes are not advertised anymore.
>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join \
>> +    options dynamic-routing-redistribute-local-only \
>> +    -- set logical_router R1 \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router R1 \
>> +    options dynamic-routing-redistribute-local-only
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium])
>> +
>>  # Before cleanup of hv1 ovn-controller, trigger a recompute
>>  # to cleanup the local datapaths. Otherwise, the test will fail.
>>  # This is because we don't remove a datapath from
>> @@ -17069,6 +17115,35 @@ blackhole 10.42.10.13 proto ovn metric 1000
>>  blackhole 172.16.1.10 proto ovn metric 1000
>>  blackhole 172.16.1.11 proto ovn metric 1000])
>>
>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>> +# and verify that lower priority routes are not advertised anymore.
>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 172.16.1.10 proto ovn metric 1000
>> +blackhole 172.16.1.11 proto ovn metric 1000])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join \
>> +    options dynamic-routing-redistribute-local-only \
>> +    -- set logical_router R1 \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 172.16.1.10 proto ovn metric 1000
>> +blackhole 172.16.1.11 proto ovn metric 1000])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router R1 \
>> +    options dynamic-routing-redistribute-local-only
>> +
>> +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 10.42.10.10 proto ovn metric 1000
>> +blackhole 10.42.10.11 proto ovn metric 1000
>> +blackhole 10.42.10.12 proto ovn metric 1000
>> +blackhole 10.42.10.13 proto ovn metric 1000
>> +blackhole 172.16.1.10 proto ovn metric 1000
>> +blackhole 172.16.1.11 proto ovn metric 1000])
>> +
>>  # Add "guest" LS connected the distributed router R2 and one "VM" called
>>  # guest1.
>>  # Also, connect R2 to ls-join via another DGW.
>> @@ -17283,6 +17358,35 @@ blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium
>>  blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
>>  blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
>>
>> +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true
>> +# and verify that lower priority routes are not advertised anymore.
>> +check ovn-nbctl --wait=hv set logical_router_port r1-join \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router_port r1-join \
>> +    options dynamic-routing-redistribute-local-only \
>> +    -- set logical_router R1 \
>> +    options:dynamic-routing-redistribute-local-only=true
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium])
>> +
>> +check ovn-nbctl --wait=hv remove logical_router R1 \
>> +    options dynamic-routing-redistribute-local-only
>> +
>> +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl
>> +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1004::150 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium
>> +blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium])
>> +
>>  # Add "guest" LS connected the distributed router R2 and one "VM" called
>>  # guest1.
>>  # Also, connect R2 to ls-join via another DGW.
>> --
>> 2.50.1
>>
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
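
For readers skimming the diff, the two code changes in the patch boil
down to a small decision table. The following is a minimal,
self-contained sketch in C and not the actual OVN code: the enum,
opt_get_bool(), resolve_local_only() and decide_advertisement() are
hypothetical stand-ins for the nested smap_get_bool() lookups in northd
and for the branch added to route_run(), and the option parsing is
simplified (case-sensitive).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of processing one Advertised_Route on a chassis. */
enum advertise_action {
    ADVERTISE_DEFAULT,      /* No tracked_port: default priority. */
    ADVERTISE_LOCAL_BOUND,  /* tracked_port bound here: better priority. */
    ADVERTISE_REMOTE,       /* tracked_port bound elsewhere: default priority. */
    ADVERTISE_SKIP,         /* Bound elsewhere and local-only is enabled. */
};

/* Simplified stand-in for an options lookup.  'value' is the raw option
 * string, or NULL when the key is absent, in which case 'def' is used. */
bool
opt_get_bool(const char *value, bool def)
{
    if (!value) {
        return def;
    }
    return def ? strcmp(value, "false") != 0 : strcmp(value, "true") == 0;
}

/* The router port value wins; the router value is only the fallback and
 * the overall default is 'false'. */
bool
resolve_local_only(const char *lrp_value, const char *lr_value)
{
    return opt_get_bool(lrp_value, opt_get_bool(lr_value, false));
}

/* Mirrors the three-way branch in the route.c hunk. */
enum advertise_action
decide_advertisement(bool has_tracked_port, bool port_is_local,
                     bool local_only)
{
    /* A route without a tracked port cannot be locally bound. */
    assert(has_tracked_port || !port_is_local);

    if (!has_tracked_port) {
        return ADVERTISE_DEFAULT;
    }
    if (port_is_local) {
        return ADVERTISE_LOCAL_BOUND;
    }
    return local_only ? ADVERTISE_SKIP : ADVERTISE_REMOTE;
}
```

In this sketch resolve_local_only(NULL, "true") yields true (the router
setting is inherited) while resolve_local_only("false", "true") yields
false (the port overrides the router). decide_advertisement(true, false,
true) then yields ADVERTISE_SKIP, which corresponds to the 'continue' in
the route.c hunk; routes without a tracked_port are unaffected by the
option.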