On 3/10/25 11:40 AM, Aleksandr Smirnov wrote:
> According to the ovn-architecture(7) sb_cfg counter should be
> propagated at the moment northd completed transaction of related changes
> to the southbound db. However, a processing engine call happened right
> between two events, a sb transaction commitment, and initiating
> the sb_cfg write. Normally, that processing engine call has nothing
> to do, because it has only update from sb db that fires back result
> of recent transaction from northd. But if, by some reason, engine
> change handler falls into full recompute there will be big delay
> before sb_cfg propagated to nb db, and in fact it really happened
> in old release, for example 22.09.
>
> The fix defers engine run for one iteration (of main loop)
> if we have sb_cfg ready to propagate.
>
> Signed-off-by: Aleksandr Smirnov <[email protected]>
> ---
Hi Aleksandr,
Thanks for the patch and sorry for the delay in reviewing.
> v2: Address Mark's comments.
> v3: Address Vladislav's comments.
> ---
> northd/ovn-northd.c | 22 ++++++++++++++++++++--
> tests/ovn-northd.at | 37 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 72eedbfdb..68a31d0b0 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -798,6 +798,15 @@ run_memory_trimmer(struct ovsdb_idl *ovnnb_idl, bool
> activity)
> memory_trimmer_wait(mt);
> }
>
> +static bool
> +sb_cfg_is_in_sync(struct ovsdb_idl *ovnnb_idl,
> + struct ovsdb_idl_loop *sb_loop)
> +{
> + const struct nbrec_nb_global *nb = nbrec_nb_global_first(ovnnb_idl);
> +
> + return !(nb && sb_loop->cur_cfg && nb->sb_cfg != sb_loop->cur_cfg);
> +}
> +
> int
> main(int argc, char *argv[])
> {
> @@ -1064,8 +1073,17 @@ main(int argc, char *argv[])
> if (ovnnb_txn && ovnsb_txn &&
> inc_proc_northd_can_run(&eng_ctx)) {
> int64_t loop_start_time = time_wall_msec();
> - activity = inc_proc_northd_run(ovnnb_txn, ovnsb_txn,
> - &eng_ctx);
> +
> + if (sb_cfg_is_in_sync(ovnnb_idl_loop.idl,
> + &ovnsb_idl_loop)) {
> + activity = inc_proc_northd_run(ovnnb_txn,
> + ovnsb_txn,
> + &eng_ctx);
> + } else {
> + poll_immediate_wake();
I guess this call to poll_immediate_wake() was added after Mark's
comment on v1 [0]:
> I think you should add a call to poll_immediate_wake() here. In
> theory, we should write the new sb_cfg value to the northbound
> database. Then that should trigger an update from the northbound
> database that will wake up northd and process what was skipped in this
> loop. However, since we also know that we definitely have data at hand
> that the incremental engine needs to process, we should not rely on
> the database transactions to work as expected before we process that
> data. Calling poll_immediate_wake() will ensure the loop wakes up
> immediately and performs an incremental engine run.
[0] https://mail.openvswitch.org/pipermail/ovs-dev/2025-February/420812.html
However, even without it we should wake up for the next incoming event,
whichever happens first; that is one of:
- northd's SB transaction to bump SB_Global.nb_cfg (done by
update_sequence_numbers() executed just below) completes
- a different NB/SB update wakes us
- a different scheduled poll wake timeout (e.g., mac binding refresh)
expires
Calling poll_immediate_wake() seems a little excessive but I might be
wrong. Am I missing a case here?
Regards,
Dumitru
> + clear_idl_track = false;
> + }
> +
> check_and_add_supported_dhcp_opts_to_sb_db(
> ovnsb_txn, ovnsb_idl_loop.idl);
> check_and_add_supported_dhcpv6_opts_to_sb_db(
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index cfaba19bf..5db6d4984 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -16748,3 +16748,40 @@ check grep -q "Bad configuration: The peer of the
> switch port 'ls-lrp1' (LRP pee
>
> AT_CLEANUP
> ])
> +
> +OVN_FOR_EACH_NORTHD_NO_HV([
> +AT_SETUP([sb_cfg propagation])
> +ovn_start
> +
> +#
> +# Test engine call does not happen between sb db transaction
> +# commitment and sb_cfg write to the nb db (if have to)
> +#
> +> northd/ovn-northd.log
> +check as northd ovn-appctl -t ovn-northd vlog/set jsonrpc:dbg
> +check as northd ovn-appctl -t ovn-northd vlog/set inc_proc_eng:dbg
> +# Any change that invoke engine processing, for example add logical switch.
> +# nb_cfg bump must be present in order to get feedback as sb_cfg.
> +check ovn-nbctl --wait=sb ls-add sw1
> +#
> +# Get following messages from log file:
> +# 'unix... transact ... Southbound' --> Indicates the pack of updates has
> been
> +# committed to the sb db.
> +# 'unix... transact ... sb_cfg' --> Indicates the sb_cfg has been committed
> to
> +# the sb db.
> +# 'Initializing new run' --> Indicates the engine has been called.
> +#
> +# Then, take first letter of messages, here 'u' or 'I' and form them into one
> +# string.
> +#
> +# Finally, a 'good' string should have two 'u' conjuncted with several 'I'
> +# both as prefix and suffix, while 'bad' string will have 'I' between two
> 'u'.
> +#
> +call_seq=`grep -E \
> + "(transact.*sb_cfg)|(transact.*Southbound)|(Initializing new run)" \
> + northd/ovn-northd.log | cut -d "|" -f 5- | cut -b 1 | tr -d '\n'`
> +
> +AT_CHECK([echo $call_seq | grep -qE "^I*uuI*$"], [0])
> +
> +AT_CLEANUP
> +])
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev