Thanks for the review, Dumitru. I addressed the nits and pushed the change to main and branch-25.09.
On Thu, Dec 4, 2025 at 5:56 AM Dumitru Ceara <[email protected]> wrote: > > On 12/3/25 10:59 PM, Mark Michelson via dev wrote: > > If a logical switch or logical router is added and removed very quickly, > > then it is possible for the IDL to place it in both the "new" and > > "deleted" tracked maps. > > > > This was not being accounted for in the logical switch and logical > > router handlers. > > > > For logical routers, this was not a huge issue. We would detect the > > logical router as deleted, then not be able to find the in-memory > > unsynced datapath. This would result in a recompute. Adding the check > > for this specific situation allows us to avoid an unnecessary recompute. > > > > For logical switches, on the other hand, this was a much worse > > situation. The code first checks if the logical switch is new. In this > > case, the added and deleted logical switch matched this check, so we > > treated it as a new logical switch. This "new" logical switch would then > > get propagated to all other engine nodes. If this logical switch > > happened to reference other database rows that also had been deleted, > > this could cause chaos. One place in particular where this was observed > > was in the en_lb_data node. If the logical switch referenced a load > > balancer group that was deleted, then we would try to look up the load > > balancer group in our in-memory map of load balancer groups. However, > > since the actual database row had been deleted, it no longer existed in > > our in-memory map. This resulted in an assertion being triggered and > > crashing ovn-northd. > > > > Added and deleted logical switches could also trigger a crash in the > > datapath syncing code as well. Since the logical switch was deleted, its > > memory could be freed by the IDL. This could cause a crash at any point > > in its processing. > > > > The fix here is to detect the situation where the IDL reports the > > logical router or logical switch as both new and deleted. In this case, > > we treat it as a no-op and continue. This avoids both the recompute for > > logical routers, and the crashes with logical switches. > > > > Reported-at: https://issues.redhat.com/browse/FDP-2780 > > Reported-at: https://issues.redhat.com/browse/FDP-2805 > > Signed-off-by: Mark Michelson <[email protected]> > > --- > > Hi Mark, > > Thanks for the fix! > > Nit: missing fixes tag. > > Fixes: 50a483086d80 ("northd: Add IP for new logical switches in > en-datapath-logical-switch node.") > > > northd/en-datapath-logical-router.c | 7 +++++++ > > northd/en-datapath-logical-switch.c | 7 +++++++ > > tests/ovn-northd.at | 31 +++++++++++++++++++++++++++++ > > 3 files changed, 45 insertions(+) > > > > diff --git a/northd/en-datapath-logical-router.c > > b/northd/en-datapath-logical-router.c > > index 4625edfd0..56785ebfb 100644 > > --- a/northd/en-datapath-logical-router.c > > +++ b/northd/en-datapath-logical-router.c > > @@ -164,6 +164,13 @@ > > en_datapath_logical_router_logical_router_handler(struct engine_node *node, > > struct ovn_unsynced_datapath *udp; > > const struct nbrec_logical_router *nbr; > > NBREC_LOGICAL_ROUTER_TABLE_FOR_EACH_TRACKED (nbr, nb_lr_table) { > > + /* If the logical router is added and removed within the same > > + * transaction, then this is a no-op > > + */ > > + if (nbrec_logical_router_is_new(nbr) && > > + nbrec_logical_router_is_deleted(nbr)) { > > + continue; > > + } > > udp = ovn_unsynced_datapath_find(map, &nbr->header_.uuid); > > > > if (nbrec_logical_router_is_deleted(nbr) && !udp) { > > diff --git a/northd/en-datapath-logical-switch.c > > b/northd/en-datapath-logical-switch.c > > index 0527239ce..12afbb45a 100644 > > --- a/northd/en-datapath-logical-switch.c > > +++ b/northd/en-datapath-logical-switch.c > > @@ -136,6 +136,13 @@ datapath_logical_switch_handler(struct engine_node > > *node, void *data) > > > > const struct nbrec_logical_switch *nbs; > > NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (nbs, nb_ls_table) { > > + /* If the logical switch is added and removed within the same > > + * transaction, then this is a no-op > > + */ > > + if (nbrec_logical_switch_is_new(nbs) && > > + nbrec_logical_switch_is_deleted(nbs)) { > > + continue; > > + } > > struct ovn_unsynced_datapath *udp = > > ovn_unsynced_datapath_find(map, &nbs->header_.uuid); > > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > index b5674a464..44524a57a 100644 > > --- a/tests/ovn-northd.at > > +++ b/tests/ovn-northd.at > > @@ -18069,3 +18069,34 @@ wait_row_count Igmp_Group 0 address=mrouters > > OVN_CLEANUP_NORTHD > > AT_CLEANUP > > ]) > > + > > +OVN_FOR_EACH_NORTHD_NO_HV([ > > +AT_SETUP([Added and Deleted Logical Switch]) > > +ovn_start > > + > > +# For this test, we will pause northd. While northd is paused, we will > > +# add a logical switch, a load balancer, and a load balancer group. The > > load > > +# balancer will be in the load balancer group, and the logical switch will > > use > > +# the load balancer group. We will then delete the logical switch, the load > > +# balancer, and the load balancer group. We then will unpause ovn-northd. > > +# > > +# The test only fails if putting northd to sleep or waking it up fails due > > to > > +# ovn-northd crashing. > > +# > > +# We run this a few times in a loop just to be certain we aren't causing > > any > > +# catastrophic issues. > > +for (( i=0; i<5; i++ )) ; do > > Nit: I'd write this as: > > for i in $(seq 5); do > > > + sleep_northd > > + check ovn-nbctl ls-add ls1 > > + check ovn-nbctl lb-add lb1 192.168.0.1 10.0.0.1 > > + lb_uuid=$(ovn-nbctl --bare --columns=_uuid find load_balancer name=lb1) > > Nit: I'd use fetch_column: > > lb_uuid=$(fetch_column nb:Load_Balancer _uuid name=lb1 > > > + lbg_uuid=$(ovn-nbctl create load_balancer_group name=lbg1 > > load_balancer=$lb_uuid) > > + check ovn-nbctl set logical_switch ls1 load_balancer_group=$lbg_uuid > > + check ovn-nbctl ls-del ls1 > > + check ovn-nbctl lb-del lb1 > > + check ovn-nbctl --all destroy load_balancer_group > > + wake_up_northd > > +done > > +OVN_CLEANUP_NORTHD > > +AT_CLEANUP > > +]) > > With the tiny nits addressed: > > Acked-by: Dumitru Ceara <[email protected]> > > Regards, > Dumitru > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
