If a logical switch or logical router is added and removed very quickly, then it is possible for the IDL to place it in both the "new" and "deleted" tracked maps.
This was not being accounted for in the logical switch and logical router handlers. For logical routers, this was not a huge issue. We would detect the logical router as deleted, then not be able to find the in-memory unsynced datapath. This would result in a recompute. Adding the check for this specific situation allows us to avoid an unnecessary recompute. For logical switches, on the other hand, this was a much worse situation. The code first checks if the logical switch is new. In this case, the added and deleted logical switch matched this check, so we treated it as a new logical switch. This "new" logical switch would then get propagated to all other engine nodes. If this logical switch happened to reference other database rows that also had been deleted, this could cause chaos. One place in particular where this was observed was in the en_lb_data node. If the logical switch referenced a load balancer group that was deleted, then we would try to look up the load balancer group in our in-memory map of load balancer groups. However, since the actual database row had been deleted, it no longer existed in our in-memory map. This resulted in an assertion being triggered and crashing ovn-northd. Added and deleted logical switches could also trigger a crash in the datapath syncing code as well. Since the logical switch was deleted, its memory could be freed by the IDL. This could cause a crash at any point in its processing. The fix here is to detect the situation where the IDL reports the logical router or logical switch as both new and deleted. In this case, we treat it as a no-op and continue. This avoids both the recompute for logical routers, and the crashes with logical switches. Reported-at: https://issues.redhat.com/browse/FDP-2780 Reported-at: https://issues.redhat.com/browse/FDP-2805 Signed-off-by: Mark Michelson <[email protected]> --- northd/en-datapath-logical-router.c | 7 +++++++ northd/en-datapath-logical-switch.c | 7 +++++++ tests/ovn-northd.at | 31 +++++++++++++++++++++++++++++ 3 files changed, 45 insertions(+) diff --git a/northd/en-datapath-logical-router.c b/northd/en-datapath-logical-router.c index 4625edfd0..56785ebfb 100644 --- a/northd/en-datapath-logical-router.c +++ b/northd/en-datapath-logical-router.c @@ -164,6 +164,13 @@ en_datapath_logical_router_logical_router_handler(struct engine_node *node, struct ovn_unsynced_datapath *udp; const struct nbrec_logical_router *nbr; NBREC_LOGICAL_ROUTER_TABLE_FOR_EACH_TRACKED (nbr, nb_lr_table) { + /* If the logical router is added and removed within the same + * transaction, then this is a no-op + */ + if (nbrec_logical_router_is_new(nbr) && + nbrec_logical_router_is_deleted(nbr)) { + continue; + } udp = ovn_unsynced_datapath_find(map, &nbr->header_.uuid); if (nbrec_logical_router_is_deleted(nbr) && !udp) { diff --git a/northd/en-datapath-logical-switch.c b/northd/en-datapath-logical-switch.c index 0527239ce..12afbb45a 100644 --- a/northd/en-datapath-logical-switch.c +++ b/northd/en-datapath-logical-switch.c @@ -136,6 +136,13 @@ datapath_logical_switch_handler(struct engine_node *node, void *data) const struct nbrec_logical_switch *nbs; NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (nbs, nb_ls_table) { + /* If the logical switch is added and removed within the same + * transaction, then this is a no-op + */ + if (nbrec_logical_switch_is_new(nbs) && + nbrec_logical_switch_is_deleted(nbs)) { + continue; + } struct ovn_unsynced_datapath *udp = ovn_unsynced_datapath_find(map, &nbs->header_.uuid); diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index b5674a464..44524a57a 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -18069,3 +18069,34 @@ wait_row_count Igmp_Group 0 address=mrouters OVN_CLEANUP_NORTHD AT_CLEANUP ]) + +OVN_FOR_EACH_NORTHD_NO_HV([ +AT_SETUP([Added and Deleted Logical Switch]) +ovn_start + +# For this test, we will pause northd. While northd is paused, we will +# add a logical switch, a load balancer, and a load balancer group. The load +# balancer will be in the load balancer group, and the logical switch will use +# the load balancer group. We will then delete the logical switch, the load +# balancer, and the load balancer group. We then will unpause ovn-northd. +# +# The test only fails if putting northd to sleep or waking it up fails due to +# ovn-northd crashing. +# +# We run this a few times in a loop just to be certain we aren't causing any +# catastrophic issues. +for (( i=0; i<5; i++ )) ; do + sleep_northd + check ovn-nbctl ls-add ls1 + check ovn-nbctl lb-add lb1 192.168.0.1 10.0.0.1 + lb_uuid=$(ovn-nbctl --bare --columns=_uuid find load_balancer name=lb1) + lbg_uuid=$(ovn-nbctl create load_balancer_group name=lbg1 load_balancer=$lb_uuid) + check ovn-nbctl set logical_switch ls1 load_balancer_group=$lbg_uuid + check ovn-nbctl ls-del ls1 + check ovn-nbctl lb-del lb1 + check ovn-nbctl --all destroy load_balancer_group + wake_up_northd +done +OVN_CLEANUP_NORTHD +AT_CLEANUP +]) -- 2.51.1 _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
