Hi Mark, I posted version 2 with all your comments addressed https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
Also, please find my answers to your questions below. best regards, Alexander On 2/17/25 11:33 PM, Mark Michelson wrote: > > > On Mon, Feb 10, 2025 at 4:40 AM Aleksandr Smirnov (K2 Cloud) > <[email protected]> wrote: > > Hi Mark, > > Thank you for your valuable comments! > Could you please answer my main question that I originally raised to > Dumitru : > > Do you want to have this fix in main at all? > > I mean this is a real problem in the branch 22 but mostly is not > affected the main branch. > So, I will provide this fix for our running environment (which is > v22) > but not sure you want to have it in main. > > > I think to answer your question, I need to know why this is not a > problem in the main branch, currently. If the reason is that engine > runs tend to be quicker, then it may be worth puting in main anyway. > If there are other mitigating circumstances, then I'm not as sure. Yes, the reason is that engine runs quicker because change handlers are cutting off a unnecessary work. Thus, I decide to present my fix as candidate to put to main. > > best regards, > > Alexander > > On 2/7/25 9:19 PM, Mark Michelson wrote: > > Hi Aleksandr, thanks for the patch. > > > > This patch needs a rebase because it conflicts with > > tests/ovn-northd.at <http://ovn-northd.at> in current main. > > > > See below for my findings. > > > > On 2/5/25 10:21, Aleksandr Smirnov wrote: > >> According to the ovn-architecture(7) sb_cfg counter should be > >> propagated at the moment northd completed transaction of > related changes > >> to the southbound db. However, a processing engine call > happened right > >> between two events, a sb transaction commitment, and initiating > >> the sb_cfg write. Normally, that processing engine call has nothing > >> to do, because it has only update from sb db that fires back result > >> of recent transaction from northd. But if, by some reason, engine > >> change handler falls into full recompute there will be big delay > >> before sb_cfg propagated to nb db, and in fact it really happened > >> in old release, f.e. 22.09 > >> > >> The fix proposed defers engine run for one iteration (of main loop) > >> if we have sb_cfg ready to propagate. > >> > >> Signed-off-by: Aleksandr Smirnov <[email protected]> > >> --- > >> northd/inc-proc-northd.c | 1 + > >> northd/ovn-northd.c | 22 ++++++++++++++++++++-- > >> tests/ovn-northd.at <http://ovn-northd.at> | 23 > +++++++++++++++++++++++ > >> 3 files changed, 44 insertions(+), 2 deletions(-) > >> > >> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c > >> index 4a1d4eac9..6f348e610 100644 > >> --- a/northd/inc-proc-northd.c > >> +++ b/northd/inc-proc-northd.c > >> @@ -480,6 +480,7 @@ bool inc_proc_northd_run(struct ovsdb_idl_txn > >> *ovnnb_txn, > >> VLOG_DBG("engine was canceled, force recompute next > time."); > >> engine_set_force_recompute_immediate(); > >> } else { > >> + VLOG_DBG("Engine has ran successfully."); > > > > I think you can test your change without the need for this new > debug > > message. > > > > In lib/inc-proc-eng.c, there is a debug message that prints for > every > > run of the engine: "Initializing new run". I think it makes more > sense > > to use the existing message since it prints for all engine runs, > not > > just successful ones. This is important since your test is > trying to > > ensure the engine does not attempt to run when we have a pending > > change to NB_Global:sb_cfg . > > > >> engine_clear_force_recompute(); > >> } > >> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > >> index d3470d94c..e6adb76d7 100644 > >> --- a/northd/ovn-northd.c > >> +++ b/northd/ovn-northd.c > >> @@ -798,6 +798,16 @@ run_memory_trimmer(struct ovsdb_idl > *ovnnb_idl, > >> bool activity) > >> memory_trimmer_wait(mt); > >> } > >> +static bool > >> +sb_cfg_is_updated(struct ovsdb_idl *ovnnb_idl, > >> + struct ovsdb_idl_loop *sb_loop) > >> +{ > >> + const struct nbrec_nb_global *nb = > >> nbrec_nb_global_first(ovnnb_idl); > >> + > >> + return (nb && sb_loop->cur_cfg && nb->sb_cfg != > sb_loop->cur_cfg) > >> + ? true : false; > >> +} > >> + > >> int > >> main(int argc, char *argv[]) > >> { > >> @@ -1052,8 +1062,16 @@ main(int argc, char *argv[]) > >> if (ovnnb_txn && ovnsb_txn && > >> inc_proc_northd_can_run(&eng_ctx)) { > >> int64_t loop_start_time = time_wall_msec(); > >> - activity = inc_proc_northd_run(ovnnb_txn, > >> ovnsb_txn, > >> - &eng_ctx); > >> + > >> + if (!sb_cfg_is_updated(ovnnb_idl_loop.idl, > >> + &ovnsb_idl_loop)) { > >> + activity = inc_proc_northd_run(ovnnb_txn, > >> + ovnsb_txn, > >> + &eng_ctx); > >> + } else { > >> + clear_idl_track = false; > > > > I think you should add a call to poll_immediate_wake() here. In > > theory, we should write the new sb_cfg value to the northbound > > database. Then that should trigger an update from the northbound > > database that will wake up northd and process what was skipped > in this > > loop. However, since we also know that we definitely have data > at hand > > that the incremental engine needs to process, we should not rely on > > the database transactions to work as expected before we process > that > > data. Calling poll_immediate_wake() will ensure the loop wakes up > > immediately and performs an incremental engine run. > > > >> + } > >> + > >> check_and_add_supported_dhcp_opts_to_sb_db( > >> ovnsb_txn, ovnsb_idl_loop.idl); > >> check_and_add_supported_dhcpv6_opts_to_sb_db( > >> diff --git a/tests/ovn-northd.at <http://ovn-northd.at> > b/tests/ovn-northd.at <http://ovn-northd.at> > >> index 7abd4d3f3..ae6c999f9 100644 > >> --- a/tests/ovn-northd.at <http://ovn-northd.at> > >> +++ b/tests/ovn-northd.at <http://ovn-northd.at> > >> @@ -14579,3 +14579,26 @@ wait_row_count Multicast_Group 0 > >> name=239.0.1.68 > >> OVN_CLEANUP([hv1], [hv2]) > >> AT_CLEANUP > >> ]) > >> + > >> + > >> + > >> +###################################333 > >> +#####################################33333 > > > > Please use a single blank line to separate test cases. There is no > > need for custom separators. > > > >> +OVN_FOR_EACH_NORTHD_NO_HV([ > >> +AT_SETUP([sb_cfg propagation]) > >> + > > > > This test could use a comment explaining what its purpose is, > because > > the code itself is fairly esoteric without the accompanying commit > > message. > > > >> +ovn_start > >> + > >> +rm -f northd/ovn-northd.log > >> +check as northd ovn-appctl -t ovn-northd vlog/reopen > >> +check as northd ovn-appctl -t ovn-northd vlog/set jsonrpc:dbg > >> +check as northd ovn-appctl -t ovn-northd vlog/set > inc_proc_northd:dbg > >> + > >> +check ovn-nbctl ls-add sw1 -- set nb_global . nb_cfg=1 > >> +wait_column 1 nb:nb_global sb_cfg > > > > Instead of using wait_column here, you should be able to use the > > --wait=sb option in the previous ovn-nbctl command. > > > > check ovn-nbctl --wait=sb ls-add sw1 -- set nb_global . nb_cfg=1 > > > > Also, out of curiosity, why are you adding a switch? > There is no solid reason to use merely logical switch, it just close to our internal problem description that says 'adding logical switch port takes enormous time'. > > > >> + > >> +call_seq=`grep -E "(transact.*sb_cfg)|(transact.*Southbound)|(has > >> ran)" northd/ovn-northd.log | cut -d "|" -f 5- | cut -b 1 | tr > -d '\n'` > >> +AT_CHECK([test x`echo $call_seq` = xEEuuE]) > > > > I'm worried that the "E's" in this string could end up changing > due to > > unrelated code changes that happen either in northd or in the > > incremental engine. It's possible we could end up seeing more or > fewer > > engine runs than what the test is checking for. > > > > What's most important is that there are no "E's" between the two > > "u's". So maybe instead of looking for an exact match of > "EEuuE", you > > could simply ensure that you see "uu" together in the output string? > > > >> + > >> +AT_CLEANUP > >> +]) > > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
