If a logical switch or logical router is added and removed very quickly,
then it is possible for the IDL to place it in both the "new" and
"deleted" tracked maps.

This was not being accounted for in the logical switch and logical
router handlers.

For logical routers, this was not a huge issue. We would detect the
logical router as deleted, then not be able to find the in-memory
unsynced datapath. This would result in a recompute. Adding the check
for this specific situation allows us to avoid an unnecessary recompute.

For logical switches, on the other hand, this was a much worse
situation. The code first checks if the logical switch is new. In this
case, the added and deleted logical switch matched this check, so we
treated it as a new logical switch. This "new" logical switch would then
get propagated to all other engine nodes. If this logical switch
happened to reference other database rows that also had been deleted,
this could cause chaos. One place in particular where this was observed
was in the en_lb_data node. If the logical switch referenced a load
balancer group that was deleted, then we would try to look up the load
balancer group in our in-memory map of load balancer groups. However,
since the actual database row had been deleted, it no longer existed in
our in-memory map. This resulted in an assertion being triggered and
crashing ovn-northd.

Added and deleted logical switches could also trigger a crash in the
datapath syncing code as well. Since the logical switch was deleted, its
memory could be freed by the IDL. This could cause a crash at any point
in its processing.

The fix here is to detect the situation where the IDL reports the
logical router or logical switch as both new and deleted. In this case,
we treat it as a no-op and continue. This avoids both the recompute for
logical routers, and the crashes with logical switches.

Reported-at: https://issues.redhat.com/browse/FDP-2780
Reported-at: https://issues.redhat.com/browse/FDP-2805
Signed-off-by: Mark Michelson <[email protected]>
---
 northd/en-datapath-logical-router.c |  7 +++++++
 northd/en-datapath-logical-switch.c |  7 +++++++
 tests/ovn-northd.at                 | 31 +++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/northd/en-datapath-logical-router.c 
b/northd/en-datapath-logical-router.c
index 4625edfd0..56785ebfb 100644
--- a/northd/en-datapath-logical-router.c
+++ b/northd/en-datapath-logical-router.c
@@ -164,6 +164,13 @@ en_datapath_logical_router_logical_router_handler(struct 
engine_node *node,
     struct ovn_unsynced_datapath *udp;
     const struct nbrec_logical_router *nbr;
     NBREC_LOGICAL_ROUTER_TABLE_FOR_EACH_TRACKED (nbr, nb_lr_table) {
+        /* If the logical router is added and removed within the same
+         * transaction, then this is a no-op
+         */
+        if (nbrec_logical_router_is_new(nbr) &&
+            nbrec_logical_router_is_deleted(nbr)) {
+            continue;
+        }
         udp = ovn_unsynced_datapath_find(map, &nbr->header_.uuid);
 
         if (nbrec_logical_router_is_deleted(nbr) && !udp) {
diff --git a/northd/en-datapath-logical-switch.c 
b/northd/en-datapath-logical-switch.c
index 0527239ce..12afbb45a 100644
--- a/northd/en-datapath-logical-switch.c
+++ b/northd/en-datapath-logical-switch.c
@@ -136,6 +136,13 @@ datapath_logical_switch_handler(struct engine_node *node, 
void *data)
 
     const struct nbrec_logical_switch *nbs;
     NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (nbs, nb_ls_table) {
+        /* If the logical switch is added and removed within the same
+         * transaction, then this is a no-op
+         */
+        if (nbrec_logical_switch_is_new(nbs) &&
+            nbrec_logical_switch_is_deleted(nbs)) {
+            continue;
+        }
         struct ovn_unsynced_datapath *udp =
             ovn_unsynced_datapath_find(map, &nbs->header_.uuid);
 
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index b5674a464..44524a57a 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -18069,3 +18069,34 @@ wait_row_count Igmp_Group 0 address=mrouters
 OVN_CLEANUP_NORTHD
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([Added and Deleted Logical Switch])
+ovn_start
+
+# For this test, we will pause northd. While northd is paused, we will
+# add a logical switch, a load balancer, and a load balancer group. The load
+# balancer will be in the load balancer group, and the logical switch will use
+# the load balancer group. We will then delete the logical switch, the load
+# balancer, and the load balancer group. We then will unpause ovn-northd.
+#
+# The test only fails if putting northd to sleep or waking it up fails due to
+# ovn-northd crashing.
+#
+# We run this a few times in a loop just to be certain we aren't causing any
+# catastrophic issues.
+for (( i=0; i<5; i++ )) ; do
+    sleep_northd
+    check ovn-nbctl ls-add ls1
+    check ovn-nbctl lb-add lb1 192.168.0.1 10.0.0.1
+    lb_uuid=$(ovn-nbctl --bare --columns=_uuid find load_balancer name=lb1)
+    lbg_uuid=$(ovn-nbctl create load_balancer_group name=lbg1 
load_balancer=$lb_uuid)
+    check ovn-nbctl set logical_switch ls1 load_balancer_group=$lbg_uuid
+    check ovn-nbctl ls-del ls1
+    check ovn-nbctl lb-del lb1
+    check ovn-nbctl --all destroy load_balancer_group
+    wake_up_northd
+done
+OVN_CLEANUP_NORTHD
+AT_CLEANUP
+])
-- 
2.51.1

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to