Hello Lucas, thanks for the contribution.

I think this idea makes a lot of sense. The gist of it is:
* Keep lflows in memory between engine runs and only destroy lflows if they are not going to be synced to the SB DB. * Use in-memory lflows to find corresponding SB logical_flows by UUID instead of the other way around.

For the most part, I think this works. However, there are some issues with the patch as it is. I have comments in-line to address these.

On 9/3/25 8:49 AM, Lucas Vargas Dias via dev wrote:
Change the logic to save sbflow uuid and just update if
the lflow is reused. Otherwise, it's removed.
Also, reduce sbflow searching with uuidset instead of
searching through all lflow table.
It generates the following results in a scenario with:
- LSPs: 56548
- LRPs: 27304
- LRs: 7922
- LSs: 28602

Without the commit:
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lr-add lr9-1
Time spent on processing nb_cfg 275438:
        ovn-northd delay before processing:     16069ms
        ovn-northd completion:                  32828ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb ls-add bar9-1
Time spent on processing nb_cfg 275439:
        ovn-northd delay before processing:     15019ms
        ovn-northd completion:                  33207ms

ovn-nbctl --no-leader-only --print-wait-time --wait=sb lrp-add lr9-1 rp9-1-bar 
00:01:ff:22:00:01 192.168.10.1/24 2801:80:3eaf:4401::1/64
Time spent on processing nb_cfg 275440:
        ovn-northd delay before processing:     14784ms
        ovn-northd completion:                  33577ms

ovn-nbctl --no-leader-only --print-wait-time --wait=sb lsp-add bar9-1 bar-rp9-1 
-- set Logical_Switch_Port bar-rp9-1 type=router options:router-port=rp9-1-bar 
-- lsp-set-addresses bar-rp9-1 router
Time spent on processing nb_cfg 275441:
        ovn-northd delay before processing:     14598ms
        ovn-northd completion:                  31942ms

With the commit:
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lr-add lr9-1
Time spent on processing nb_cfg 275401:
        ovn-northd delay before processing:     12602ms
        ovn-northd completion:                  26103ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb ls-add bar9-1
Time spent on processing nb_cfg 275402:
        ovn-northd delay before processing:     12639ms
         ovn-northd completion:                 26759ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lrp-add lr9-1 rp9-1-bar 
00:01:ff:22:00:01 192.168.10.1/24 2801:80:3eaf:4401::1/64
Time spent on processing nb_cfg 275403:
        ovn-northd delay before processing:     11874ms
        ovn-northd completion:                  29733ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lsp-add bar9-1 bar-rp9-1 
-- set Logical_Switch_Port bar-rp9-1 type=router options:router-port=rp9-1-bar 
-- lsp-set-addresses bar-rp9-1 router
Time spent on processing nb_cfg 275404:
        ovn-northd delay before processing:     4058ms
        ovn-northd completion:                  17323ms

Signed-off-by: Lucas Vargas Dias <[email protected]>
---
  northd/en-lflow.c  |  2 +-
  northd/lflow-mgr.c | 91 ++++++++++++++++++++++++++--------------------
  northd/lflow-mgr.h |  2 +-
  3 files changed, 54 insertions(+), 41 deletions(-)

diff --git a/northd/en-lflow.c b/northd/en-lflow.c
index f903f5e3a..4309259f6 100644
--- a/northd/en-lflow.c
+++ b/northd/en-lflow.c
@@ -122,7 +122,7 @@ en_lflow_run(struct engine_node *node, void *data)
      stopwatch_start(BUILD_LFLOWS_STOPWATCH_NAME, time_msec());
struct lflow_data *lflow_data = data;
-    lflow_table_clear(lflow_data->lflow_table);
+    lflow_table_clear(lflow_data->lflow_table, false);
      lflow_reset_northd_refs(&lflow_input);
      lflow_ref_clear(lflow_input.igmp_lflow_ref);
diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
index 31476286a..f772afc87 100644
--- a/northd/lflow-mgr.c
+++ b/northd/lflow-mgr.c
@@ -26,6 +26,7 @@
  #include "lflow-mgr.h"
  #include "lib/ovn-parallel-hmap.h"
  #include "lib/ovn-util.h"
+#include "lib/uuidset.h"
VLOG_DEFINE_THIS_MODULE(lflow_mgr); @@ -37,7 +38,8 @@ static void ovn_lflow_init(struct ovn_lflow *, struct ovn_datapath *od,
                             uint16_t priority, char *match,
                             char *actions, char *io_port,
                             char *ctrl_meter, char *stage_hint,
-                           const char *where, const char *flow_desc);
+                           const char *where, const char *flow_desc,
+                           struct uuid sbuuid);
  static struct ovn_lflow *ovn_lflow_find(const struct hmap *lflows,
                                          enum ovn_stage stage,
                                          uint16_t priority, const char *match,
@@ -181,6 +183,8 @@ struct ovn_lflow {
      struct hmap dp_refcnts_map; /* Maintains the number of times this 
ovn_lflow
                                   * is referenced by a given datapath.
                                   * Contains 'struct dp_refcnt' in the map. */
+    bool is_old;
+    bool is_to_install;

Let's talk through these two booleans. The "is_old" boolean is true for lflows that have been synced to the SB DB. If it is false, then the lflow only exists in memory.

The "is_to_install" boolean is true for lflows that need to be synced to the SB DB. If it is false, then the flow should eventually be deleted since it does not need to be synced to the SB DB.

As such, we end up with the following truth table for these:
is_old == false, is_to_install == false: The lflow has been created in memory and is not synced with the SB DB. It should be deleted since there is no need to sync it with the SB DB. In theory, this combination should never happen.

is_old == false, is_to_install == true: The lflow has been created in memory and is not synced with the SB DB. We need to sync this with the SB DB.

is_old == true, is_to_install == false: The lflow exists in the SB DB, but it should be deleted since it is no longer relevant.

is_old == true, is_to_install == true: The lflow is synced to the SB DB.

Instead of using two booleans, I think this should be an enum that has three possible values.

LFLOW_STALE: The lflow is in the SB DB but is no longer relevant. It needs to be deleted from the SB DB and from the in-memory lflow table. (equivalent to is_old == true, is_to_install == false). This is the initial state of all lflows when beginning a run. LFLOW_TO_SYNC: The lflow is updated in memory and needs to be synced with the SB DB (equivalent to is_old == false, is_to_install == true). This is what lflows get set to when they are inserted into the lflow table. LFLOW_SYNCED: The lflow has been synced with the SB DB fully (equivalent to is_old == true, is_to_install == true). This is what lflows get set to after being synced.

The advantages of the enum are:
* The state names are more explicit than trying to combine two booleans to understand the lflow sync state.
* It prevents the "impossible" combination (both false) from occurring.

When a run starts, all lflows in the table will be set to LFLOW_STALE. As lflows are added, the LFLOW_STALE flows will be replaced with flows in the LFLOW_TO_SYNC state. When it is time to sync with the SB DB, all LFLOW_TO_SYNC flows will be synced and have their state changed to LFLOW_SYNCED. All flows in the LFLOW_STALE state will be deleted.

  };
/* Logical flow table. */
@@ -210,10 +214,14 @@ lflow_table_init(struct lflow_table *lflow_table)
  }
void
-lflow_table_clear(struct lflow_table *lflow_table)
+lflow_table_clear(struct lflow_table *lflow_table, bool destroy_all)
  {
      struct ovn_lflow *lflow;
      HMAP_FOR_EACH_SAFE (lflow, hmap_node, &lflow_table->entries) {
+        if (!destroy_all && lflow->is_old) {
+            lflow->is_to_install = false;

Using the enum suggestion, we would set lflow->sync_state = LFLOW_STALE here.

Also, I don't think we need to check the old lflow state here since in theory, after an lflow run, all lflows in the lflow_table should be in the LFLOW_SYNCED state. If you wanted, you could change the if check to be ovs_assert(lflow->sync_state == LFLOW_SYNCED). But even if, for some reason, the lflow were not in the LFLOW_SYNCED state, we would still want to set the state to LFLOW_STALE here.

+            continue;
+        }
          ovn_lflow_destroy(lflow_table, lflow);
      }
@@ -224,7 +232,7 @@ lflow_table_clear(struct lflow_table *lflow_table)
  void
  lflow_table_destroy(struct lflow_table *lflow_table)
  {
-    lflow_table_clear(lflow_table);
+    lflow_table_clear(lflow_table, true);
      hmap_destroy(&lflow_table->entries);
      ovn_dp_groups_destroy(&lflow_table->ls_dp_groups);
      ovn_dp_groups_destroy(&lflow_table->lr_dp_groups);
@@ -257,16 +265,43 @@ lflow_table_sync_to_sb(struct lflow_table *lflow_table,
                         const struct sbrec_logical_flow_table *sb_flow_table,
                         const struct sbrec_logical_dp_group_table *dpgrp_table)
  {
+    struct uuidset sb_uuid_set = UUIDSET_INITIALIZER(&sb_uuid_set);
      struct hmap lflows_temp = HMAP_INITIALIZER(&lflows_temp);
      struct hmap *lflows = &lflow_table->entries;
      struct ovn_lflow *lflow;
+    const struct sbrec_logical_flow *sbflow;
fast_hmap_size_for(&lflows_temp,
                         lflow_table->max_seen_lflow_size);
+ HMAP_FOR_EACH_SAFE (lflow, hmap_node, lflows) {
+        if (!lflow->is_to_install) {

Using the enum suggestion, we would check lflow->sync_state == LFLOW_STALE here.

+            ovn_lflow_destroy(lflow_table, lflow);
+            continue;
+        }
+        sbflow = NULL;
+        if (!uuid_is_zero(&lflow->sb_uuid)) {
+            sbflow = sbrec_logical_flow_table_get_for_uuid(sb_flow_table,
+                                                           &lflow->sb_uuid);
+        }
+        sync_lflow_to_sb(lflow, ovnsb_txn, lflow_table, ls_datapaths,
+                         lr_datapaths, ovn_internal_version_changed,
+                         sbflow, dpgrp_table);
+        lflow->is_old = true;

Using the enum suggestion, we would set lflow->sync_state = LFLOW_SYNCED here.

Since this is required after all calls to sync_lflow_to_sb(), you can move this assignment to the end of sync_lflow_to_sb() instead.

+        uuidset_insert(&sb_uuid_set, &lflow->sb_uuid);
+        hmap_remove(lflows, &lflow->hmap_node);
+        hmap_insert(&lflows_temp, &lflow->hmap_node,
+                    hmap_node_hash(&lflow->hmap_node));
+    }
      /* Push changes to the Logical_Flow table to database. */
-    const struct sbrec_logical_flow *sbflow;
      SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_SAFE (sbflow, sb_flow_table) {
+        struct uuidset_node *node = uuidset_find(&sb_uuid_set,
+                                                 &sbflow->header_.uuid);
+        if (!node) {
+            sbrec_logical_flow_delete(sbflow);
+            continue;
+        }
+        uuidset_delete(&sb_uuid_set, node);
          struct sbrec_logical_dp_group *dp_group = sbflow->logical_dp_group;
          struct ovn_datapath *logical_datapath_od = NULL;
          size_t i;
@@ -297,38 +332,8 @@ lflow_table_sync_to_sb(struct lflow_table *lflow_table,
              sbrec_logical_flow_delete(sbflow);
              continue;
          }
-
-        enum ovn_pipeline pipeline
-            = !strcmp(sbflow->pipeline, "ingress") ? P_IN : P_OUT;
-
-        lflow = ovn_lflow_find(
-            lflows,
-            ovn_stage_build(ovn_datapath_get_type(logical_datapath_od),
-                            pipeline, sbflow->table_id),
-            sbflow->priority, sbflow->match, sbflow->actions,
-            sbflow->controller_meter, sbflow->hash);
-        if (lflow) {
-            sync_lflow_to_sb(lflow, ovnsb_txn, lflow_table, ls_datapaths,
-                             lr_datapaths, ovn_internal_version_changed,
-                             sbflow, dpgrp_table);
-
-            hmap_remove(lflows, &lflow->hmap_node);
-            hmap_insert(&lflows_temp, &lflow->hmap_node,
-                        hmap_node_hash(&lflow->hmap_node));
-        } else {
-            sbrec_logical_flow_delete(sbflow);
-        }
-    }
-
-    HMAP_FOR_EACH_SAFE (lflow, hmap_node, lflows) {
-        sync_lflow_to_sb(lflow, ovnsb_txn, lflow_table, ls_datapaths,
-                         lr_datapaths, ovn_internal_version_changed,
-                         NULL, dpgrp_table);
-
-        hmap_remove(lflows, &lflow->hmap_node);
-        hmap_insert(&lflows_temp, &lflow->hmap_node,
-                    hmap_node_hash(&lflow->hmap_node));
      }
+    uuidset_destroy(&sb_uuid_set);
      hmap_swap(lflows, &lflows_temp);
      hmap_destroy(&lflows_temp);
  }
@@ -847,7 +852,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath 
*od,
                 size_t dp_bitmap_len, enum ovn_stage stage, uint16_t priority,
                 char *match, char *actions, char *io_port, char *ctrl_meter,
                 char *stage_hint, const char *where,
-               const char *flow_desc)
+               const char *flow_desc, struct uuid sbuuid)
  {
      lflow->dpg_bitmap = bitmap_allocate(dp_bitmap_len);
      lflow->od = od;
@@ -861,7 +866,9 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath 
*od,
      lflow->flow_desc = flow_desc;
      lflow->dpg = NULL;
      lflow->where = where;
-    lflow->sb_uuid = UUID_ZERO;
+    lflow->sb_uuid = sbuuid;
+    lflow->is_old = false;
+    lflow->is_to_install = true;

Using the enum suggestion, we would set lflow->sync_state = LFLOW_TO_SYNC here.

      hmap_init(&lflow->dp_refcnts_map);
      ovs_list_init(&lflow->referenced_by);
  }
@@ -957,13 +964,18 @@ do_ovn_lflow_add(struct lflow_table *lflow_table, size_t 
dp_bitmap_len,
  {
      struct ovn_lflow *old_lflow;
      struct ovn_lflow *lflow;
+    struct uuid sbuuid = UUID_ZERO;
ovs_assert(dp_bitmap_len); old_lflow = ovn_lflow_find(&lflow_table->entries, stage,
                                 priority, match, actions, ctrl_meter, hash);
      if (old_lflow) {
-        return old_lflow;
+        if (old_lflow->is_to_install) {

If using the enum suggestion, then we'd check for old_lflow->sync_state == LFLOW_TO_SYNC

+            return old_lflow;
+        }
+        sbuuid = old_lflow->sb_uuid;
+        ovn_lflow_destroy(lflow_table, old_lflow);
      }

I wasn't sure where to put this, but there is a flaw with the logic here. When running ovn-northd with parallelization enabled, we keep track of the number of lflows added to the lflow_table with a thread-local variable called thread_lflow_counter. When the parallelized portion completes, the lflow_table hmap's size is set to the sum of the thread_lflow_counters on each thread. The reason for this is because the hmap count can potentially get messed up when multiple threads are adding to the hmap simultaneously. So we use this as a means to set the size correctly.

This logic works because the lflow_table is empty at the beginning of the parallelized run, and every hmap_insert_fast() results in the size of the hmap increasing by 1. Therefore, if we count the number of inserts in each thread, we can add those all up and get the correct hmap size after. You can see this being done in northd/northd.c in the fix_flow_table_size() function.

With this new change, the lflow_table is populated with stale lflows at the beginning of the parallelized run. However, the thread_lflow_counter in each thread starts counting from 0. This means that we only increase the thread_lflow_counter each time we replace a stale lflow or add a new lflow to the lflow_table. Any leftover stale lflows in the table after a run will be in the lflow_table but will not be accounted for by the thread_lflow_counter. This means there is a possibility that after a parallelized run, we could set the size of the lflow_table smaller than it should be since we are not accounting for the stale lflows that started off in the table. This could lead to obscure bugs.

Following this algorithm should fix the problem:
* In the main thread, before starting a parallelized run, store the hmap_count() of the lflow_table. * Start the parallelized run. Each thread initializes their thread_lflow_counter to 0. * Each thread should only increase their thread_lflow_counter in the case where old_lflow is not found in the lflow_table. If old_lflow is found and replaced, thread_lflow_counter remains the same. * After parallelization is complete, fix_flow_table_size() should sum all of the thread_lflow_counters from each thread and then add that to the count that was saved in the first step.

This should work since the parallelized run will only ever add new lflows or replace stale lflows.

lflow = xzalloc(sizeof *lflow);
@@ -975,7 +987,7 @@ do_ovn_lflow_add(struct lflow_table *lflow_table, size_t 
dp_bitmap_len,
                     io_port ? xstrdup(io_port) : NULL,
                     nullable_xstrdup(ctrl_meter),
                     ovn_lflow_hint(stage_hint), where,
-                   flow_desc);
+                   flow_desc, sbuuid);
if (parallelization_state != STATE_USE_PARALLELIZATION) {
          hmap_insert(&lflow_table->entries, &lflow->hmap_node, hash);
@@ -1294,6 +1306,7 @@ lflow_ref_sync_lflows__(struct lflow_ref  *lflow_ref,
                                    sblflow, dpgrp_table)) {
                  return false;
              }
+            lflow->is_old = true;

If using the enum suggestion, we would set lflow->sync_state = LFLOW_SYNCED here. See my earlier note about moving this assignment to the end of sync_lflow_to_sb().

          }
if (!lrn->linked) {
diff --git a/northd/lflow-mgr.h b/northd/lflow-mgr.h
index 1521270d6..c25b0185e 100644
--- a/northd/lflow-mgr.h
+++ b/northd/lflow-mgr.h
@@ -29,7 +29,7 @@ struct ovsdb_idl_row;
  struct lflow_table;
  struct lflow_table *lflow_table_alloc(void);
  void lflow_table_init(struct lflow_table *);
-void lflow_table_clear(struct lflow_table *);
+void lflow_table_clear(struct lflow_table *, bool);
  void lflow_table_destroy(struct lflow_table *);
  void lflow_table_expand(struct lflow_table *);
  void lflow_table_set_size(struct lflow_table *, size_t);

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to