Hi Aleksandr, thanks for the patch.
This patch needs a rebase because it conflicts with tests/ovn-northd.at
in current main.
See below for my findings.
On 2/5/25 10:21, Aleksandr Smirnov wrote:
According to the ovn-architecture(7) sb_cfg counter should be
propagated at the moment northd completed transaction of related changes
to the southbound db. However, a processing engine call happened right
between two events, a sb transaction commitment, and initiating
the sb_cfg write. Normally, that processing engine call has nothing
to do, because it has only update from sb db that fires back result
of recent transaction from northd. But if, by some reason, engine
change handler falls into full recompute there will be big delay
before sb_cfg propagated to nb db, and in fact it really happened
in old release, f.e. 22.09
The fix proposed defers engine run for one iteration (of main loop)
if we have sb_cfg ready to propagate.
Signed-off-by: Aleksandr Smirnov <[email protected]>
---
northd/inc-proc-northd.c | 1 +
northd/ovn-northd.c | 22 ++++++++++++++++++++--
tests/ovn-northd.at | 23 +++++++++++++++++++++++
3 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
index 4a1d4eac9..6f348e610 100644
--- a/northd/inc-proc-northd.c
+++ b/northd/inc-proc-northd.c
@@ -480,6 +480,7 @@ bool inc_proc_northd_run(struct ovsdb_idl_txn *ovnnb_txn,
VLOG_DBG("engine was canceled, force recompute next time.");
engine_set_force_recompute_immediate();
} else {
+ VLOG_DBG("Engine has ran successfully.");
I think you can test your change without the need for this new debug
message.
In lib/inc-proc-eng.c, there is a debug message that prints for every
run of the engine: "Initializing new run". I think it makes more sense
to use the existing message since it prints for all engine runs, not
just successful ones. This is important since your test is trying to
ensure the engine does not attempt to run when we have a pending change
to NB_Global:sb_cfg .
engine_clear_force_recompute();
}
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index d3470d94c..e6adb76d7 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -798,6 +798,16 @@ run_memory_trimmer(struct ovsdb_idl *ovnnb_idl, bool
activity)
memory_trimmer_wait(mt);
}
+static bool
+sb_cfg_is_updated(struct ovsdb_idl *ovnnb_idl,
+ struct ovsdb_idl_loop *sb_loop)
+{
+ const struct nbrec_nb_global *nb = nbrec_nb_global_first(ovnnb_idl);
+
+ return (nb && sb_loop->cur_cfg && nb->sb_cfg != sb_loop->cur_cfg)
+ ? true : false;
+}
+
int
main(int argc, char *argv[])
{
@@ -1052,8 +1062,16 @@ main(int argc, char *argv[])
if (ovnnb_txn && ovnsb_txn &&
inc_proc_northd_can_run(&eng_ctx)) {
int64_t loop_start_time = time_wall_msec();
- activity = inc_proc_northd_run(ovnnb_txn, ovnsb_txn,
- &eng_ctx);
+
+ if (!sb_cfg_is_updated(ovnnb_idl_loop.idl,
+ &ovnsb_idl_loop)) {
+ activity = inc_proc_northd_run(ovnnb_txn,
+ ovnsb_txn,
+ &eng_ctx);
+ } else {
+ clear_idl_track = false;
I think you should add a call to poll_immediate_wake() here. In theory,
we should write the new sb_cfg value to the northbound database. Then
that should trigger an update from the northbound database that will
wake up northd and process what was skipped in this loop. However, since
we also know that we definitely have data at hand that the incremental
engine needs to process, we should not rely on the database transactions
to work as expected before we process that data. Calling
poll_immediate_wake() will ensure the loop wakes up immediately and
performs an incremental engine run.
+ }
+
check_and_add_supported_dhcp_opts_to_sb_db(
ovnsb_txn, ovnsb_idl_loop.idl);
check_and_add_supported_dhcpv6_opts_to_sb_db(
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 7abd4d3f3..ae6c999f9 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -14579,3 +14579,26 @@ wait_row_count Multicast_Group 0 name=239.0.1.68
OVN_CLEANUP([hv1], [hv2])
AT_CLEANUP
])
+
+
+
+###################################333
+#####################################33333
Please use a single blank line to separate test cases. There is no need
for custom separators.
+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([sb_cfg propagation])
+
This test could use a comment explaining what its purpose is, because
the code itself is fairly esoteric without the accompanying commit message.
+ovn_start
+
+rm -f northd/ovn-northd.log
+check as northd ovn-appctl -t ovn-northd vlog/reopen
+check as northd ovn-appctl -t ovn-northd vlog/set jsonrpc:dbg
+check as northd ovn-appctl -t ovn-northd vlog/set inc_proc_northd:dbg
+
+check ovn-nbctl ls-add sw1 -- set nb_global . nb_cfg=1
+wait_column 1 nb:nb_global sb_cfg
Instead of using wait_column here, you should be able to use the
--wait=sb option in the previous ovn-nbctl command.
check ovn-nbctl --wait=sb ls-add sw1 -- set nb_global . nb_cfg=1
Also, out of curiosity, why are you adding a switch?
+
+call_seq=`grep -E "(transact.*sb_cfg)|(transact.*Southbound)|(has ran)"
northd/ovn-northd.log | cut -d "|" -f 5- | cut -b 1 | tr -d '\n'`
+AT_CHECK([test x`echo $call_seq` = xEEuuE])
I'm worried that the "E's" in this string could end up changing due to
unrelated code changes that happen either in northd or in the
incremental engine. It's possible we could end up seeing more or fewer
engine runs than what the test is checking for.
What's most important is that there are no "E's" between the two "u's".
So maybe instead of looking for an exact match of "EEuuE", you could
simply ensure that you see "uu" together in the output string?
+
+AT_CLEANUP
+])
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev