In 712fca55b3b1 ("controller: Prioritize host routes.") and later in cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") support was added for advertising routes for objects (logical port / NAT / LB IPs) that are bound to a single chassis (e.g., distributed NAT) with a better metric on the chassis where they're bound. On all other chassis, however, the route was still advertised but only with a worse metric.
While this works fine in deployments as described in 712fca55b3b1 ("controller: Prioritize host routes."), this behavior actually makes the dynamic routing feature unusable in cases when all inter-node traffic is forwarded through the L3 fabric, e.g. spine-leaf topologies with iBGP between leaves and spines and eBGP between OVN compute nodes and fabric leafs: that's due to the fact that eBGP routes are usually preferred over iBGP routes. Consider the following example: +------+ +------+ |Spine1| |Spine2| +------+ +------+ | \ / | | \ / | | \ / | | X | | / \ | | / \ | | / \ | +-----+ +----+ |Leaf1| |Leaf2| +-----+ +----+ | | | | +----------+ +----------+ | Chassis1 | | Chassis2 | +----------+ +----------+ An OVN distributed NAT, e.g., 42.42.42.42 "bound" to Chassis1 would be advertised with a metric of 100 on the eBGP Chassis1 <-> Leaf1 connection and with a metric of 1000 (worse) on the eBGP Chasssi2 <-> Leaf2 connection. Leaf2 will also learn an iBGP route (through Spine1 and Spine2) for the same prefix (towards Chassis1) but because eBGP administrative distance is better than the iBGP one, Leaf2 will always prefer the metric 1000 route. That means Leaf2 will always forward traffic destined to 42.42.42.42 via Chassis2 which is sub-optimal. The main reason for advertising the (NAT) IP on both chassis was likely to provide redundancy in case traffic hits the OVN cluster on a node that doesn't host the NAT. But with topologies as the one depicted above the redundancy is handled by the fabric. OVN didn't have a way to disable worse metric route advertisements. This commit adds one, through a new logical router / logical router port option, "dynamic-routing-redistribute-local-only" which, if enabled, informs ovn-controller to not advertise routes for chassis bound IPs (Sb.Advertised_Route.tracked_port set) on chassis where the tracked port is not bound. By default this option is disabled. Fixes: 712fca55b3b1 ("controller: Prioritize host routes.") Fixes: cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") Reported-at: https://issues.redhat.com/browse/FDP-1464 Signed-off-by: Dumitru Ceara <dce...@redhat.com> --- NOTE: while this change adds a new configuration option, I see this as a bug fix because there's no (not overly complex) way of using the OVN dynamic routing feature in 25.03 with topologies as above (which are likely common). This change is backport-safe because the new option is opt-in (disabled by default) so the default behavior stays untouched. Also, there's no restriction on update order as older ovn-northd or ovn-controllers just ignore the new option and there's no database schema change. --- NEWS | 5 +++ controller/route.c | 10 ++++- northd/northd.c | 11 +++++ ovn-nb.xml | 39 +++++++++++++++++ ovn-sb.xml | 18 ++++++++ tests/system-ovn.at | 104 ++++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 186 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index 0cce1790db..54d676be87 100644 --- a/NEWS +++ b/NEWS @@ -41,6 +41,11 @@ Post v25.03.0 - Added support for running tests from the 'check-kernel' system test target under retis by setting OVS_TEST_WITH_RETIS=yes. See the 'Testing' section of the documentation for more details. + - Dynamic Routing: + * Add the option "dynamic-routing-redistribute-local-only" to Logical + Routers and Logical Router Ports which refines the way in which + chassis-specific Advertised_Routes (e.g., for NAT and LB IPs) are + advertised. OVN v25.03.0 - 07 Mar 2025 -------------------------- diff --git a/controller/route.c b/controller/route.c index 7615f3f593..603e5749bc 100644 --- a/controller/route.c +++ b/controller/route.c @@ -253,15 +253,23 @@ route_run(struct route_ctx_in *r_ctx_in, unsigned int priority = PRIORITY_DEFAULT; if (route->tracked_port) { + bool redistribute_local_bound_only = + smap_get_bool(&route->logical_port->options, + "dynamic-routing-redistribute-local-only", + false); if (lport_is_local(r_ctx_in->sbrec_port_binding_by_name, r_ctx_in->chassis, route->tracked_port->logical_port)) { priority = PRIORITY_LOCAL_BOUND; sset_add(r_ctx_out->tracked_ports_local, route->tracked_port->logical_port); - } else { + } else if (!redistribute_local_bound_only) { sset_add(r_ctx_out->tracked_ports_remote, route->tracked_port->logical_port); + } else { + /* Here redistribute_local_bound_only is 'true' and + * 'tracked_port' is not local so skip this route. */ + continue; } } diff --git a/northd/northd.c b/northd/northd.c index 764575f21e..4acbd2e517 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -4387,6 +4387,17 @@ sync_pb_for_lrp(struct ovn_port *op, if (portname) { smap_add(&new, "dynamic-routing-port-name", portname); } + const char *redistribute_local_only_name = + "dynamic-routing-redistribute-local-only"; + bool redistribute_local_only_val = + smap_get_bool(&op->nbrp->options, + redistribute_local_only_name, + smap_get_bool(&op->od->nbr->options, + redistribute_local_only_name, + false)); + if (redistribute_local_only_val) { + smap_add(&new, redistribute_local_only_name, "true"); + } } const char *ipv6_pd_list = smap_get(&op->sb->options, "ipv6_ra_pd_list"); diff --git a/ovn-nb.xml b/ovn-nb.xml index 4a75818075..cbe9c40bbe 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -3192,6 +3192,22 @@ or </p> </column> + <column name="options" + key="dynamic-routing-redistribute-local-only" + type='{"type": "boolean"}'> + <p> + Only relevant if <ref column="options" key="dynamic-routing"/> + is set to <code>true</code>. + </p> + + <p> + This controls whether <code>ovn-controller</code> will advertise + <ref table="Advertised_Route" db="OVN_Southbound"/> records + only on the chassis where their <code>tracked_port</code> is + bound. Default: <code>false</code>. + </p> + </column> + <column name="options" key="dynamic-routing-vrf-name" type='{"type": "string"}'> <p> @@ -4166,6 +4182,29 @@ or </column> + <column name="options" + key="dynamic-routing-redistribute-local-only" + type='{"type": "boolean"}'> + <p> + Only relevant if <ref column="options" key="dynamic-routing"/> + is set to <code>true</code>. + </p> + + <p> + This controls whether <code>ovn-controller</code> will advertise + <ref table="Advertised_Route" db="OVN_Southbound"/> records + only on the chassis where their <code>tracked_port</code> is + bound. + </p> + + <p> + If not set the value from <ref column="options" + key="dynamic-routing-redistribute-local-only" + table="Logical_Router"/> on the <ref table="Logical_Router"/> will + be used. + </p> + </column> + <column name="options" key="dynamic-routing-maintain-vrf" type='{"type": "boolean"}'> <p> diff --git a/ovn-sb.xml b/ovn-sb.xml index db5faac661..395deae83d 100644 --- a/ovn-sb.xml +++ b/ovn-sb.xml @@ -3908,6 +3908,24 @@ tcp.flags = RST; </column> </group> + <group title="Dynamic Routing"> + <column name="options" + key="dynamic-routing-redistribute-local-only" + type='{"type": "boolean"}'> + <p> + Only relevant if <ref column="options" key="dynamic-routing"/> + is set to <code>true</code>. + </p> + + <p> + This controls whether <code>ovn-controller</code> will advertise + <ref table="Advertised_Route" db="OVN_Southbound"/> records + only on the chassis where their <code>tracked_port</code> is + bound. Default: <code>false</code>. + </p> + </column> + </group> + <group title="Nested Containers"> <p> These columns support containers nested within a VM. Specifically, diff --git a/tests/system-ovn.at b/tests/system-ovn.at index e0407383af..a86bfa44e0 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -16759,6 +16759,29 @@ OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl blackhole 10.42.10.10 proto ovn metric 1000 blackhole 172.16.1.150 proto ovn metric 1000]) +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true +# and verify that lower priority routes are not advertised anymore. +check ovn-nbctl --wait=hv set logical_router_port r1-join \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.150 proto ovn metric 1000]) + +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ + options dynamic-routing-redistribute-local-only \ + -- set logical_router R1 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.150 proto ovn metric 1000]) + +check ovn-nbctl --wait=hv remove logical_router R1 \ + options dynamic-routing-redistribute-local-only + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 10.42.10.10 proto ovn metric 1000 +blackhole 172.16.1.150 proto ovn metric 1000]) + # Before cleanup of hv1 ovn-controller, trigger a recompute # to cleanup the local datapaths. Otherwise, the test will fail. # This is because we don't remove a datapath from @@ -16904,6 +16927,29 @@ OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium]) +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true +# and verify that lower priority routes are not advertised anymore. +check ovn-nbctl --wait=hv set logical_router_port r1-join \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium]) + +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ + options dynamic-routing-redistribute-local-only \ + -- set logical_router R1 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium]) + +check ovn-nbctl --wait=hv remove logical_router R1 \ + options dynamic-routing-redistribute-local-only + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium]) + # Before cleanup of hv1 ovn-controller, trigger a recompute # to cleanup the local datapaths. Otherwise, the test will fail. # This is because we don't remove a datapath from @@ -17069,6 +17115,35 @@ blackhole 10.42.10.13 proto ovn metric 1000 blackhole 172.16.1.10 proto ovn metric 1000 blackhole 172.16.1.11 proto ovn metric 1000]) +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true +# and verify that lower priority routes are not advertised anymore. +check ovn-nbctl --wait=hv set logical_router_port r1-join \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.10 proto ovn metric 1000 +blackhole 172.16.1.11 proto ovn metric 1000]) + +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ + options dynamic-routing-redistribute-local-only \ + -- set logical_router R1 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.10 proto ovn metric 1000 +blackhole 172.16.1.11 proto ovn metric 1000]) + +check ovn-nbctl --wait=hv remove logical_router R1 \ + options dynamic-routing-redistribute-local-only + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 10.42.10.10 proto ovn metric 1000 +blackhole 10.42.10.11 proto ovn metric 1000 +blackhole 10.42.10.12 proto ovn metric 1000 +blackhole 10.42.10.13 proto ovn metric 1000 +blackhole 172.16.1.10 proto ovn metric 1000 +blackhole 172.16.1.11 proto ovn metric 1000]) + # Add "guest" LS connected the distributed router R2 and one "VM" called # guest1. # Also, connect R2 to ls-join via another DGW. @@ -17283,6 +17358,35 @@ blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium]) +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true +# and verify that lower priority routes are not advertised anymore. +check ovn-nbctl --wait=hv set logical_router_port r1-join \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium]) + +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ + options dynamic-routing-redistribute-local-only \ + -- set logical_router R1 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium]) + +check ovn-nbctl --wait=hv remove logical_router R1 \ + options dynamic-routing-redistribute-local-only + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1004::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium]) + # Add "guest" LS connected the distributed router R2 and one "VM" called # guest1. # Also, connect R2 to ls-join via another DGW. -- 2.50.1 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev