Hi Mark,

I posted version 2 with all your comments addressed 
https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/

Also, please find my answers to your questions below.

best regards,

Alexander

On 2/17/25 11:33 PM, Mark Michelson wrote:
>
>
> On Mon, Feb 10, 2025 at 4:40 AM Aleksandr Smirnov (K2 Cloud) 
> <[email protected]> wrote:
>
>     Hi Mark,
>
>     Thank you for your valuable comments!
>     Could you please answer my main question that I originally raised to
>     Dumitru :
>
>     Do you want to have this fix in main at all?
>
>     I mean this is a real problem in the branch 22 but mostly is not
>     affected the main branch.
>     So, I will provide this fix for our running environment (which is
>     v22)
>     but not sure you want to have it in main.
>
>
> I think to answer your question, I need to know why this is not a 
> problem in the main branch, currently. If the reason is that engine 
> runs tend to be quicker, then it may be worth puting in main anyway. 
> If there are other mitigating circumstances, then I'm not as sure.

Yes, the reason is that engine runs quicker because change handlers are 
cutting off a unnecessary work.
Thus, I decide to present my fix as candidate to put to main.

>
>     best regards,
>
>     Alexander
>
>     On 2/7/25 9:19 PM, Mark Michelson wrote:
>     > Hi Aleksandr, thanks for the patch.
>     >
>     > This patch needs a rebase because it conflicts with
>     > tests/ovn-northd.at <http://ovn-northd.at> in current main.
>     >
>     > See below for my findings.
>     >
>     > On 2/5/25 10:21, Aleksandr Smirnov wrote:
>     >> According to the ovn-architecture(7) sb_cfg counter should be
>     >> propagated at the moment northd completed transaction of
>     related changes
>     >> to the southbound db. However, a processing engine call
>     happened right
>     >> between two events, a sb transaction commitment, and initiating
>     >> the sb_cfg write. Normally, that processing engine call has nothing
>     >> to do, because it has only update from sb db that fires back result
>     >> of recent transaction from northd. But if, by some reason, engine
>     >> change handler falls into full recompute there will be big delay
>     >> before sb_cfg propagated to nb db, and in fact it really happened
>     >> in old release, f.e. 22.09
>     >>
>     >> The fix proposed defers engine run for one iteration (of main loop)
>     >> if we have sb_cfg ready to propagate.
>     >>
>     >> Signed-off-by: Aleksandr Smirnov <[email protected]>
>     >> ---
>     >>   northd/inc-proc-northd.c |  1 +
>     >>   northd/ovn-northd.c      | 22 ++++++++++++++++++++--
>     >>   tests/ovn-northd.at <http://ovn-northd.at> | 23
>     +++++++++++++++++++++++
>     >>   3 files changed, 44 insertions(+), 2 deletions(-)
>     >>
>     >> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
>     >> index 4a1d4eac9..6f348e610 100644
>     >> --- a/northd/inc-proc-northd.c
>     >> +++ b/northd/inc-proc-northd.c
>     >> @@ -480,6 +480,7 @@ bool inc_proc_northd_run(struct ovsdb_idl_txn
>     >> *ovnnb_txn,
>     >>           VLOG_DBG("engine was canceled, force recompute next
>     time.");
>     >>           engine_set_force_recompute_immediate();
>     >>       } else {
>     >> +        VLOG_DBG("Engine has ran successfully.");
>     >
>     > I think you can test your change without the need for this new
>     debug
>     > message.
>     >
>     > In lib/inc-proc-eng.c, there is a debug message that prints for
>     every
>     > run of the engine: "Initializing new run". I think it makes more
>     sense
>     > to use the existing message since it prints for all engine runs,
>     not
>     > just successful ones. This is important since your test is
>     trying to
>     > ensure the engine does not attempt to run when we have a pending
>     > change to NB_Global:sb_cfg .
>     >
>     >>           engine_clear_force_recompute();
>     >>       }
>     >>   diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>     >> index d3470d94c..e6adb76d7 100644
>     >> --- a/northd/ovn-northd.c
>     >> +++ b/northd/ovn-northd.c
>     >> @@ -798,6 +798,16 @@ run_memory_trimmer(struct ovsdb_idl
>     *ovnnb_idl,
>     >> bool activity)
>     >>       memory_trimmer_wait(mt);
>     >>   }
>     >>   +static bool
>     >> +sb_cfg_is_updated(struct ovsdb_idl *ovnnb_idl,
>     >> +                  struct ovsdb_idl_loop *sb_loop)
>     >> +{
>     >> +    const struct nbrec_nb_global *nb =
>     >> nbrec_nb_global_first(ovnnb_idl);
>     >> +
>     >> +    return (nb && sb_loop->cur_cfg && nb->sb_cfg !=
>     sb_loop->cur_cfg)
>     >> +           ? true : false;
>     >> +}
>     >> +
>     >>   int
>     >>   main(int argc, char *argv[])
>     >>   {
>     >> @@ -1052,8 +1062,16 @@ main(int argc, char *argv[])
>     >>                   if (ovnnb_txn && ovnsb_txn &&
>     >> inc_proc_northd_can_run(&eng_ctx)) {
>     >>                       int64_t loop_start_time = time_wall_msec();
>     >> -                    activity = inc_proc_northd_run(ovnnb_txn,
>     >> ovnsb_txn,
>     >> - &eng_ctx);
>     >> +
>     >> +                    if (!sb_cfg_is_updated(ovnnb_idl_loop.idl,
>     >> + &ovnsb_idl_loop)) {
>     >> +                        activity = inc_proc_northd_run(ovnnb_txn,
>     >> + ovnsb_txn,
>     >> + &eng_ctx);
>     >> +                    } else {
>     >> +                        clear_idl_track = false;
>     >
>     > I think you should add a call to poll_immediate_wake() here. In
>     > theory, we should write the new sb_cfg value to the northbound
>     > database. Then that should trigger an update from the northbound
>     > database that will wake up northd and process what was skipped
>     in this
>     > loop. However, since we also know that we definitely have data
>     at hand
>     > that the incremental engine needs to process, we should not rely on
>     > the database transactions to work as expected before we process
>     that
>     > data. Calling poll_immediate_wake() will ensure the loop wakes up
>     > immediately and performs an incremental engine run.
>     >
>     >> +                    }
>     >> +
>     >> check_and_add_supported_dhcp_opts_to_sb_db(
>     >>                                    ovnsb_txn, ovnsb_idl_loop.idl);
>     >> check_and_add_supported_dhcpv6_opts_to_sb_db(
>     >> diff --git a/tests/ovn-northd.at <http://ovn-northd.at>
>     b/tests/ovn-northd.at <http://ovn-northd.at>
>     >> index 7abd4d3f3..ae6c999f9 100644
>     >> --- a/tests/ovn-northd.at <http://ovn-northd.at>
>     >> +++ b/tests/ovn-northd.at <http://ovn-northd.at>
>     >> @@ -14579,3 +14579,26 @@ wait_row_count Multicast_Group 0
>     >> name=239.0.1.68
>     >>   OVN_CLEANUP([hv1], [hv2])
>     >>   AT_CLEANUP
>     >>   ])
>     >> +
>     >> +
>     >> +
>     >> +###################################333
>     >> +#####################################33333
>     >
>     > Please use a single blank line to separate test cases. There is no
>     > need for custom separators.
>     >
>     >> +OVN_FOR_EACH_NORTHD_NO_HV([
>     >> +AT_SETUP([sb_cfg propagation])
>     >> +
>     >
>     > This test could use a comment explaining what its purpose is,
>     because
>     > the code itself is fairly esoteric without the accompanying commit
>     > message.
>     >
>     >> +ovn_start
>     >> +
>     >> +rm -f northd/ovn-northd.log
>     >> +check as northd ovn-appctl -t ovn-northd vlog/reopen
>     >> +check as northd ovn-appctl -t ovn-northd vlog/set jsonrpc:dbg
>     >> +check as northd ovn-appctl -t ovn-northd vlog/set
>     inc_proc_northd:dbg
>     >> +
>     >> +check ovn-nbctl ls-add sw1 -- set nb_global . nb_cfg=1
>     >> +wait_column 1 nb:nb_global sb_cfg
>     >
>     > Instead of using wait_column here, you should be able to use the
>     > --wait=sb option in the previous ovn-nbctl command.
>     >
>     > check ovn-nbctl --wait=sb ls-add sw1 -- set nb_global . nb_cfg=1
>     >
>     > Also, out of curiosity, why are you adding a switch?
>
There is no solid reason to use merely logical switch, it just close to our
internal problem description that says
'adding logical switch port takes enormous time'.


>     >
>     >> +
>     >> +call_seq=`grep -E "(transact.*sb_cfg)|(transact.*Southbound)|(has
>     >> ran)" northd/ovn-northd.log | cut -d "|" -f 5- | cut -b 1 | tr
>     -d '\n'`
>     >> +AT_CHECK([test x`echo $call_seq` = xEEuuE])
>     >
>     > I'm worried that the "E's" in this string could end up changing
>     due to
>     > unrelated code changes that happen either in northd or in the
>     > incremental engine. It's possible we could end up seeing more or
>     fewer
>     > engine runs than what the test is checking for.
>     >
>     > What's most important is that there are no "E's" between the two
>     > "u's". So maybe instead of looking for an exact match of
>     "EEuuE", you
>     > could simply ensure that you see "uu" together in the output string?
>     >
>     >> +
>     >> +AT_CLEANUP
>     >> +])
>     >
>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to