Hello Lucas, thanks for the contribution.
I think this idea makes a lot of sense. The gist of it is:
* Keep lflows in memory between engine runs and only destroy lflows if
they are not going to be synced to the SB DB.
* Use in-memory lflows to find corresponding SB logical_flows by UUID
instead of the other way around.
For the most part, I think this works. However, there are some issues
with the patch as it is. I have comments in-line to address these.
On 9/3/25 8:49 AM, Lucas Vargas Dias via dev wrote:
Change the logic to save the sbflow UUID and only update the flow if
the lflow is reused. Otherwise, it is removed.
Also, reduce sbflow searching by using a uuidset instead of
searching through the whole lflow table.
It generates the following results in a scenario with:
- LSPs: 56548
- LRPs: 27304
- LRs: 7922
- LSs: 28602
Without the commit:
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lr-add lr9-1
Time spent on processing nb_cfg 275438:
ovn-northd delay before processing: 16069ms
ovn-northd completion: 32828ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb ls-add bar9-1
Time spent on processing nb_cfg 275439:
ovn-northd delay before processing: 15019ms
ovn-northd completion: 33207ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lrp-add lr9-1 rp9-1-bar
00:01:ff:22:00:01 192.168.10.1/24 2801:80:3eaf:4401::1/64
Time spent on processing nb_cfg 275440:
ovn-northd delay before processing: 14784ms
ovn-northd completion: 33577ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lsp-add bar9-1 bar-rp9-1
-- set Logical_Switch_Port bar-rp9-1 type=router options:router-port=rp9-1-bar
-- lsp-set-addresses bar-rp9-1 router
Time spent on processing nb_cfg 275441:
ovn-northd delay before processing: 14598ms
ovn-northd completion: 31942ms
With the commit:
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lr-add lr9-1
Time spent on processing nb_cfg 275401:
ovn-northd delay before processing: 12602ms
ovn-northd completion: 26103ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb ls-add bar9-1
Time spent on processing nb_cfg 275402:
ovn-northd delay before processing: 12639ms
ovn-northd completion: 26759ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lrp-add lr9-1 rp9-1-bar
00:01:ff:22:00:01 192.168.10.1/24 2801:80:3eaf:4401::1/64
Time spent on processing nb_cfg 275403:
ovn-northd delay before processing: 11874ms
ovn-northd completion: 29733ms
ovn-nbctl --no-leader-only --print-wait-time --wait=sb lsp-add bar9-1 bar-rp9-1
-- set Logical_Switch_Port bar-rp9-1 type=router options:router-port=rp9-1-bar
-- lsp-set-addresses bar-rp9-1 router
Time spent on processing nb_cfg 275404:
ovn-northd delay before processing: 4058ms
ovn-northd completion: 17323ms
Signed-off-by: Lucas Vargas Dias <[email protected]>
---
northd/en-lflow.c | 2 +-
northd/lflow-mgr.c | 91 ++++++++++++++++++++++++++--------------------
northd/lflow-mgr.h | 2 +-
3 files changed, 54 insertions(+), 41 deletions(-)
diff --git a/northd/en-lflow.c b/northd/en-lflow.c
index f903f5e3a..4309259f6 100644
--- a/northd/en-lflow.c
+++ b/northd/en-lflow.c
@@ -122,7 +122,7 @@ en_lflow_run(struct engine_node *node, void *data)
stopwatch_start(BUILD_LFLOWS_STOPWATCH_NAME, time_msec());
struct lflow_data *lflow_data = data;
- lflow_table_clear(lflow_data->lflow_table);
+ lflow_table_clear(lflow_data->lflow_table, false);
lflow_reset_northd_refs(&lflow_input);
lflow_ref_clear(lflow_input.igmp_lflow_ref);
diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
index 31476286a..f772afc87 100644
--- a/northd/lflow-mgr.c
+++ b/northd/lflow-mgr.c
@@ -26,6 +26,7 @@
#include "lflow-mgr.h"
#include "lib/ovn-parallel-hmap.h"
#include "lib/ovn-util.h"
+#include "lib/uuidset.h"
VLOG_DEFINE_THIS_MODULE(lflow_mgr);
@@ -37,7 +38,8 @@ static void ovn_lflow_init(struct ovn_lflow *, struct ovn_datapath *od,
uint16_t priority, char *match,
char *actions, char *io_port,
char *ctrl_meter, char *stage_hint,
- const char *where, const char *flow_desc);
+ const char *where, const char *flow_desc,
+ struct uuid sbuuid);
static struct ovn_lflow *ovn_lflow_find(const struct hmap *lflows,
enum ovn_stage stage,
uint16_t priority, const char *match,
@@ -181,6 +183,8 @@ struct ovn_lflow {
struct hmap dp_refcnts_map; /* Maintains the number of times this
ovn_lflow
* is referenced by a given datapath.
* Contains 'struct dp_refcnt' in the map. */
+ bool is_old;
+ bool is_to_install;
Let's talk through these two booleans. The "is_old" boolean is true for
lflows that have been synced to the SB DB. If it is false, then the
lflow only exists in memory.
The "is_to_install" boolean is true for lflows that need to be synced to
the SB DB. If it is false, then the flow should eventually be deleted
since it does not need to be synced to the SB DB.
As such, we end up with the following truth table for these:
is_old == false, is_to_install == false: The lflow has been created in
memory and is not synced with the SB DB. It should be deleted since
there is no need to sync it with the SB DB. In theory, this combination
should never happen.
is_old == false, is_to_install == true: The lflow has been created in
memory and is not synced with the SB DB. We need to sync this with the
SB DB.
is_old == true, is_to_install == false: The lflow exists in the SB DB,
but it should be deleted since it is no longer relevant.
is_old == true, is_to_install == true: The lflow is synced to the SB DB.
Instead of using two booleans, I think this should be an enum with three
possible values (sketched below).
LFLOW_STALE: The lflow is in the SB DB but is no longer relevant. It
needs to be deleted from the SB DB and from the in-memory lflow table.
(equivalent to is_old == true, is_to_install == false). This is the
initial state of all lflows when beginning a run.
LFLOW_TO_SYNC: The lflow is updated in memory and needs to be synced
with the SB DB (equivalent to is_old == false, is_to_install == true).
This is what lflows get set to when they are inserted into the lflow table.
LFLOW_SYNCED: The lflow has been fully synced with the SB DB (equivalent
to is_old == true, is_to_install == true). This is what lflows get set
to after being synced.
The advantages of the enum are:
* The state names are more explicit than trying to combine two booleans
to understand the lflow sync state.
* It prevents the "impossible" combination (both false) from occurring.
When a run starts, all lflows in the table will be set to LFLOW_STALE.
As lflows are added, the LFLOW_STALE flows will be replaced with flows
in the LFLOW_TO_SYNC state. When it is time to sync with the SB DB, all
LFLOW_TO_SYNC flows will be synced and have their state changed to
LFLOW_SYNCED. All flows in the LFLOW_STALE state will be deleted.
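Roughly, the enum could look like this (just a sketch; the 'sync_state'
field name is my suggestion, not something from the patch):

    enum lflow_sync_state {
        LFLOW_STALE,    /* In the SB DB but no longer relevant; delete it
                         * from the SB DB and from the in-memory table. */
        LFLOW_TO_SYNC,  /* (Re)built in memory; needs to be written to
                         * the SB DB on the next sync. */
        LFLOW_SYNCED,   /* Fully synced with the SB DB. */
    };

struct ovn_lflow would then carry a single 'enum lflow_sync_state
sync_state' member in place of the two booleans.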
};
/* Logical flow table. */
@@ -210,10 +214,14 @@ lflow_table_init(struct lflow_table *lflow_table)
}
void
-lflow_table_clear(struct lflow_table *lflow_table)
+lflow_table_clear(struct lflow_table *lflow_table, bool destroy_all)
{
struct ovn_lflow *lflow;
HMAP_FOR_EACH_SAFE (lflow, hmap_node, &lflow_table->entries) {
+ if (!destroy_all && lflow->is_old) {
+ lflow->is_to_install = false;
Using the enum suggestion, we would set lflow->sync_state = LFLOW_STALE
here.
Also, I don't think we need to check the old lflow state here since in
theory, after an lflow run, all lflows in the lflow_table should be in
the LFLOW_SYNCED state. If you wanted, you could change the if check to
an ovs_assert(lflow->sync_state == LFLOW_SYNCED). But even if, for some
reason, the lflow were not in the LFLOW_SYNCED state, we would still
want to set the state to LFLOW_STALE here. See the sketch after this
hunk.
+ continue;
+ }
ovn_lflow_destroy(lflow_table, lflow);
}
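For example, with the enum the loop in lflow_table_clear() could become
something like this (sketch only):

    HMAP_FOR_EACH_SAFE (lflow, hmap_node, &lflow_table->entries) {
        if (!destroy_all) {
            /* After a full run, everything left here should be synced. */
            ovs_assert(lflow->sync_state == LFLOW_SYNCED);
            lflow->sync_state = LFLOW_STALE;
            continue;
        }
        ovn_lflow_destroy(lflow_table, lflow);
    }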
@@ -224,7 +232,7 @@ lflow_table_clear(struct lflow_table *lflow_table)
void
lflow_table_destroy(struct lflow_table *lflow_table)
{
- lflow_table_clear(lflow_table);
+ lflow_table_clear(lflow_table, true);
hmap_destroy(&lflow_table->entries);
ovn_dp_groups_destroy(&lflow_table->ls_dp_groups);
ovn_dp_groups_destroy(&lflow_table->lr_dp_groups);
@@ -257,16 +265,43 @@ lflow_table_sync_to_sb(struct lflow_table *lflow_table,
const struct sbrec_logical_flow_table *sb_flow_table,
const struct sbrec_logical_dp_group_table *dpgrp_table)
{
+ struct uuidset sb_uuid_set = UUIDSET_INITIALIZER(&sb_uuid_set);
struct hmap lflows_temp = HMAP_INITIALIZER(&lflows_temp);
struct hmap *lflows = &lflow_table->entries;
struct ovn_lflow *lflow;
+ const struct sbrec_logical_flow *sbflow;
fast_hmap_size_for(&lflows_temp,
lflow_table->max_seen_lflow_size);
+ HMAP_FOR_EACH_SAFE (lflow, hmap_node, lflows) {
+ if (!lflow->is_to_install) {
Using the enum suggestion, we would check lflow->sync_state ==
LFLOW_STALE here.
+ ovn_lflow_destroy(lflow_table, lflow);
+ continue;
+ }
+ sbflow = NULL;
+ if (!uuid_is_zero(&lflow->sb_uuid)) {
+ sbflow = sbrec_logical_flow_table_get_for_uuid(sb_flow_table,
+ &lflow->sb_uuid);
+ }
+ sync_lflow_to_sb(lflow, ovnsb_txn, lflow_table, ls_datapaths,
+ lr_datapaths, ovn_internal_version_changed,
+ sbflow, dpgrp_table);
+ lflow->is_old = true;
Using the enum suggestion, we would set lflow->sync_state = LFLOW_SYNCED
here.
Since this is required after all calls to sync_lflow_to_sb(), you can
move this assignment to the end of sync_lflow_to_sb() instead (see the
sketch after this hunk).
+ uuidset_insert(&sb_uuid_set, &lflow->sb_uuid);
+ hmap_remove(lflows, &lflow->hmap_node);
+ hmap_insert(&lflows_temp, &lflow->hmap_node,
+ hmap_node_hash(&lflow->hmap_node));
+ }
/* Push changes to the Logical_Flow table to database. */
- const struct sbrec_logical_flow *sbflow;
SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_SAFE (sbflow, sb_flow_table) {
+ struct uuidset_node *node = uuidset_find(&sb_uuid_set,
+ &sbflow->header_.uuid);
+ if (!node) {
+ sbrec_logical_flow_delete(sbflow);
+ continue;
+ }
+ uuidset_delete(&sb_uuid_set, node);
struct sbrec_logical_dp_group *dp_group = sbflow->logical_dp_group;
struct ovn_datapath *logical_datapath_od = NULL;
size_t i;
@@ -297,38 +332,8 @@ lflow_table_sync_to_sb(struct lflow_table *lflow_table,
sbrec_logical_flow_delete(sbflow);
continue;
}
-
- enum ovn_pipeline pipeline
- = !strcmp(sbflow->pipeline, "ingress") ? P_IN : P_OUT;
-
- lflow = ovn_lflow_find(
- lflows,
- ovn_stage_build(ovn_datapath_get_type(logical_datapath_od),
- pipeline, sbflow->table_id),
- sbflow->priority, sbflow->match, sbflow->actions,
- sbflow->controller_meter, sbflow->hash);
- if (lflow) {
- sync_lflow_to_sb(lflow, ovnsb_txn, lflow_table, ls_datapaths,
- lr_datapaths, ovn_internal_version_changed,
- sbflow, dpgrp_table);
-
- hmap_remove(lflows, &lflow->hmap_node);
- hmap_insert(&lflows_temp, &lflow->hmap_node,
- hmap_node_hash(&lflow->hmap_node));
- } else {
- sbrec_logical_flow_delete(sbflow);
- }
- }
-
- HMAP_FOR_EACH_SAFE (lflow, hmap_node, lflows) {
- sync_lflow_to_sb(lflow, ovnsb_txn, lflow_table, ls_datapaths,
- lr_datapaths, ovn_internal_version_changed,
- NULL, dpgrp_table);
-
- hmap_remove(lflows, &lflow->hmap_node);
- hmap_insert(&lflows_temp, &lflow->hmap_node,
- hmap_node_hash(&lflow->hmap_node));
}
+ uuidset_destroy(&sb_uuid_set);
hmap_swap(lflows, &lflows_temp);
hmap_destroy(&lflows_temp);
}
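With the enum, and with the LFLOW_SYNCED assignment moved into
sync_lflow_to_sb(), the new first loop above would read roughly like
this (sketch only):

    HMAP_FOR_EACH_SAFE (lflow, hmap_node, lflows) {
        if (lflow->sync_state == LFLOW_STALE) {
            ovn_lflow_destroy(lflow_table, lflow);
            continue;
        }

        sbflow = NULL;
        if (!uuid_is_zero(&lflow->sb_uuid)) {
            sbflow = sbrec_logical_flow_table_get_for_uuid(sb_flow_table,
                                                           &lflow->sb_uuid);
        }

        /* sync_lflow_to_sb() sets lflow->sync_state = LFLOW_SYNCED itself,
         * so no explicit assignment is needed here. */
        sync_lflow_to_sb(lflow, ovnsb_txn, lflow_table, ls_datapaths,
                         lr_datapaths, ovn_internal_version_changed,
                         sbflow, dpgrp_table);

        uuidset_insert(&sb_uuid_set, &lflow->sb_uuid);
        hmap_remove(lflows, &lflow->hmap_node);
        hmap_insert(&lflows_temp, &lflow->hmap_node,
                    hmap_node_hash(&lflow->hmap_node));
    }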
@@ -847,7 +852,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath
*od,
size_t dp_bitmap_len, enum ovn_stage stage, uint16_t priority,
char *match, char *actions, char *io_port, char *ctrl_meter,
char *stage_hint, const char *where,
- const char *flow_desc)
+ const char *flow_desc, struct uuid sbuuid)
{
lflow->dpg_bitmap = bitmap_allocate(dp_bitmap_len);
lflow->od = od;
@@ -861,7 +866,9 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath
*od,
lflow->flow_desc = flow_desc;
lflow->dpg = NULL;
lflow->where = where;
- lflow->sb_uuid = UUID_ZERO;
+ lflow->sb_uuid = sbuuid;
+ lflow->is_old = false;
+ lflow->is_to_install = true;
Using the enum suggestion, we would set lflow->sync_state =
LFLOW_TO_SYNC here.
hmap_init(&lflow->dp_refcnts_map);
ovs_list_init(&lflow->referenced_by);
}
@@ -957,13 +964,18 @@ do_ovn_lflow_add(struct lflow_table *lflow_table, size_t
dp_bitmap_len,
{
struct ovn_lflow *old_lflow;
struct ovn_lflow *lflow;
+ struct uuid sbuuid = UUID_ZERO;
ovs_assert(dp_bitmap_len);
old_lflow = ovn_lflow_find(&lflow_table->entries, stage,
priority, match, actions, ctrl_meter, hash);
if (old_lflow) {
- return old_lflow;
+ if (old_lflow->is_to_install) {
If using the enum suggestion, then we'd check for old_lflow->sync_state
== LFLOW_TO_SYNC
+ return old_lflow;
+ }
+ sbuuid = old_lflow->sb_uuid;
+ ovn_lflow_destroy(lflow_table, old_lflow);
}
I wasn't sure where to put this, but there is a flaw with the logic
here. When running ovn-northd with parallelization enabled, we keep
track of the number of lflows added to the lflow_table with a
thread-local variable called thread_lflow_counter. When the parallelized
portion completes, the lflow_table hmap's size is set to the sum of the
thread_lflow_counters on each thread. The reason for this is that the
hmap count can potentially get messed up when multiple threads are
adding to the hmap simultaneously, so we use the counters as a means to
set the
size correctly.
This logic works because the lflow_table is empty at the beginning of
the parallelized run, and every hmap_insert_fast() results in the size
of the hmap increasing by 1. Therefore, if we count the number of
inserts in each thread, we can add them all up and get the correct hmap
size afterward. You can see this being done in northd/northd.c in the
fix_flow_table_size() function.
With this new change, the lflow_table is populated with stale lflows at
the beginning of the parallelized run. However, the thread_lflow_counter
in each thread starts counting from 0. This means that we only increase
the thread_lflow_counter each time we replace a stale lflow or add a new
lflow to the lflow_table. Any leftover stale lflows in the table after a
run will be in the lflow_table but will not be accounted for by the
thread_lflow_counter. This means there is a possibility that after a
parallelized run, we could set the size of the lflow_table smaller than
it should be since we are not accounting for the stale lflows that
started off in the table. This could lead to obscure bugs.
Following this algorithm should fix the problem (see the sketch after
this list):
* In the main thread, before starting a parallelized run, store the
hmap_count() of the lflow_table.
* Start the parallelized run. Each thread initializes their
thread_lflow_counter to 0.
* Each thread should only increase their thread_lflow_counter in the
case where old_lflow is not found in the lflow_table. If old_lflow is
found and replaced, thread_lflow_counter remains the same.
* After parallelization is complete, fix_flow_table_size() should sum
all of the thread_lflow_counters from each thread and then add that to
the count that was saved in the first step.
This should work since the parallelized run will only ever add new
lflows or replace stale lflows.
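In rough pseudo-C (names such as 'pre_parallel_count', 'n_threads' and
'per_thread_lflow_count' are only for illustration and do not match the
actual variables in northd.c):

    /* Main thread, before starting the parallelized lflow build. */
    size_t pre_parallel_count = hmap_count(&lflow_table->entries);

    /* In each worker thread: only count lflows that are genuinely new;
     * replacing a stale lflow does not change the table size. */
    if (!old_lflow) {
        thread_lflow_counter++;
    }

    /* In fix_flow_table_size(), after the parallelized run. */
    size_t total = pre_parallel_count;
    for (size_t i = 0; i < n_threads; i++) {
        total += per_thread_lflow_count[i];
    }
    lflow_table_set_size(lflow_table, total);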
lflow = xzalloc(sizeof *lflow);
@@ -975,7 +987,7 @@ do_ovn_lflow_add(struct lflow_table *lflow_table, size_t
dp_bitmap_len,
io_port ? xstrdup(io_port) : NULL,
nullable_xstrdup(ctrl_meter),
ovn_lflow_hint(stage_hint), where,
- flow_desc);
+ flow_desc, sbuuid);
if (parallelization_state != STATE_USE_PARALLELIZATION) {
hmap_insert(&lflow_table->entries, &lflow->hmap_node, hash);
@@ -1294,6 +1306,7 @@ lflow_ref_sync_lflows__(struct lflow_ref *lflow_ref,
sblflow, dpgrp_table)) {
return false;
}
+ lflow->is_old = true;
If using the enum suggestion, we would set lflow->sync_state =
LFLOW_SYNCED here. See my earlier note about moving this assignment to
the end of sync_lflow_to_sb().
}
if (!lrn->linked) {
diff --git a/northd/lflow-mgr.h b/northd/lflow-mgr.h
index 1521270d6..c25b0185e 100644
--- a/northd/lflow-mgr.h
+++ b/northd/lflow-mgr.h
@@ -29,7 +29,7 @@ struct ovsdb_idl_row;
struct lflow_table;
struct lflow_table *lflow_table_alloc(void);
void lflow_table_init(struct lflow_table *);
-void lflow_table_clear(struct lflow_table *);
+void lflow_table_clear(struct lflow_table *, bool);
void lflow_table_destroy(struct lflow_table *);
void lflow_table_expand(struct lflow_table *);
void lflow_table_set_size(struct lflow_table *, size_t);