On Fri, May 23, 2025 at 10:02:59AM +0000, Smirnov Aleksandr (K2 Cloud) wrote:
> Hi Felix,
> 
> Answering your proposal: yes, I can reuse this solution instead of
> directly checking seqno vs. expected seqno.
> However, I have inspected what you are doing in your patch and want to
> point out one mistake (at least in my opinion).

Hi Alexander,

thanks for the review. 

> When I tried to play with seqno/expected seqno numbers I realized that
> they are only set to meaningful values BELOW the line
> 
>     ovnsb_expected_cond_seqno =
>         update_sb_monitors(ovnsb_idl_loop.idl, chassis,
>                            &runtime_data->local_lports,
>                            &runtime_data->lbinding_data.bindings,
>                            &runtime_data->local_datapaths,
>                            sb_monitor_all);

So by default ovnsb_expected_cond_seqno is initialized to UINT_MAX.
Then I see two locations where it might be set to an actual value:

1. At the start of the main loop in update_sb_db, but only if
   monitor_all is set.
2. The point you mentioned, which happens after the engine has
   computed completely.

> 
> So, if you want to make a meaningful comparison of the seqnos to answer
> the question 'are there upcoming updates?' you have to do that check
> after the update_sb_monitors() call and before the next iteration of
> the main loop.

Or at the start of the main loop, if we are sure that ovnsb_expected_cond_seqno
is valid. I did this in my patch with an
`ovnsb_expected_cond_seqno != UINT_MAX` check.
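
Roughly like this (a simplified sketch of the guard, not the exact code
from my patch):

    /* Start of the main loop.  UINT_MAX is the "not yet set" marker,
     * so the comparison is only meaningful once update_sb_monitors()
     * has returned a real expected seqno. */
    if (ovnsb_expected_cond_seqno != UINT_MAX
        && ovnsb_cond_seqno == ovnsb_expected_cond_seqno) {
        /* All requested monitor condition changes have been acked by
         * the SB server; no more updates for them are in flight. */
    }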

> Otherwise (again, in my opinion) your code will always end up hitting
> the guard checks (time, max number, etc.).

I am not sure if I get that point correctly. I want to share an example
case based on my understanding of what you mean. You can tell me if I got
it right:

We assume a case where monitor-all=false.
When starting ovn-controller we will connect to the local ovsdb and
to southbound. We then run the engine for the first time and set the
initial monitoring conditions.
At this point ovnsb_expected_cond_seqno is valid and larger than
ovnsb_cond_seqno.

Before we now get the initial monitor results from southbound, something
happens that prevents the engine from running (e.g. northd_version_match
is false).
At some point the initial monitor updates arrive, which will make
ovnsb_expected_cond_seqno == ovnsb_cond_seqno. At least in my patch, any
further main loop iteration would now cause daemon_started_recently_countdown
to be triggered, even though the engine never had the chance to update
and define a new ovnsb_expected_cond_seqno.

If the engine can then start running again, daemon_started_recently()
might already return false, even though we did not yet allow for up to
20 rounds of initial condition changes.
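
To make the race explicit, the ordering I mean looks roughly like this
(hypothetical pseudo-C sketch; treating daemon_started_recently_countdown()
as a function and simplifying the call sites):

    /* First engine run: conditions requested, expected seqno set. */
    ovnsb_expected_cond_seqno = update_sb_monitors(...);

    for (;;) {  /* main loop */
        /* Engine blocked, e.g. because northd_version_match is false,
         * so it never requests new conditions and
         * ovnsb_expected_cond_seqno keeps its old value. */

        /* Initial monitor replies arrive; from the moment
         * ovnsb_cond_seqno == ovnsb_expected_cond_seqno holds, EVERY
         * iteration counts down, although no new condition change
         * was requested in between. */
        if (ovnsb_cond_seqno == ovnsb_expected_cond_seqno) {
            daemon_started_recently_countdown();
        }
    }

So the countdown probably needs to be coupled to the engine actually
having requested a new set of conditions, not merely to the seqnos
being equal.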

> This is still a dark area for me, but at least I need to share my
> understanding of the seqno life cycle.

Same for me, and I think you found a valid point.

However, it would be great if you could confirm the above understanding.
If you meant a different point, then maybe I need to fix something else
as well.

I will try to fix the above and send a new version of the patchset if
you confirm that this was the point you meant.

So thank you very much for finding that.

Thanks,
Felix

> 
> Thank you,
> 
> Alexander
> 
> On 5/7/25 5:26 PM, Felix Huettner wrote:
> > On Tue, Mar 18, 2025 at 01:08:12PM +0300, Aleksandr Smirnov wrote:
> >> Delay the initial flow upload to OVS until all monitored updates have
> >> been received.
> >> This is a kind of replacement for the wait-before-clear parameter.
> >> Instead of waiting a certain amount of time, we wait
> >> until the final monitored update comes from the SB DB.
> >> The event we watch in the controller's main loop is the current
> >> condition sequence number compared with the expected condition
> >> sequence number. If they are not equal, we still have to receive
> >> updates in response to the most recent monitor condition change
> >> request. This check makes sense only in code that lies after the
> >> update_sb_monitors() function call.
> >>
> >> Note, this update will only take effect if wait-before-clear == 0;
> >> otherwise you can still rely on the wait-before-clear behavior.
> >>
> >> Signed-off-by: Aleksandr Smirnov <[email protected]>
> >> ---
> >>   controller/ofctrl.c         | 29 ++++++++++++++++-
> >>   controller/ofctrl.h         |  3 +-
> >>   controller/ovn-controller.c |  4 ++-
> >>   tests/ovn-controller.at     | 62 +++++++++++++++++++++++++++++++++++++
> >>   4 files changed, 95 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> >> index 4a3d35b97..304b9bbc8 100644
> >> --- a/controller/ofctrl.c
> >> +++ b/controller/ofctrl.c
> >> @@ -349,6 +349,9 @@ static uint64_t cur_cfg;
> >>   /* Current state. */
> >>   static enum ofctrl_state state;
> >>   
> >> +/* Whether to release the wait-before-clear stage. */
> >> +static bool wait_before_clear_proceed = false;
> >> +
> >>   /* The time (ms) to stay in the state S_WAIT_BEFORE_CLEAR. Read from
> >>    * external_ids: ovn-ofctrl-wait-before-clear. */
> >>   static unsigned int wait_before_clear_time = 0;
> >> @@ -446,6 +449,7 @@ run_S_NEW(void)
> >>       struct ofpbuf *buf = ofpraw_alloc(OFPRAW_NXT_TLV_TABLE_REQUEST,
> >>                                         rconn_get_version(swconn), 0);
> >>       xid = queue_msg(buf);
> >> +    wait_before_clear_proceed = false;
> >>       state = S_TLV_TABLE_REQUESTED;
> >>   }
> >>   
> >> @@ -638,6 +642,14 @@ error:
> >>   static void
> >>   run_S_WAIT_BEFORE_CLEAR(void)
> >>   {
> >> +    if (wait_before_clear_time == 0) {
> >> +        if (wait_before_clear_proceed) {
> >> +            state = S_CLEAR_FLOWS;
> >> +        }
> >> +
> >> +        return;
> >> +    }
> >> +
> >>       if (!wait_before_clear_time ||
> >>           (wait_before_clear_expire &&
> >>            time_msec() >= wait_before_clear_expire)) {
> >> @@ -2695,11 +2707,26 @@ ofctrl_put(struct ovn_desired_flow_table *lflow_table,
> >>              uint64_t req_cfg,
> >>              bool lflows_changed,
> >>              bool pflows_changed,
> >> -           struct tracked_acl_ids *tracked_acl_ids)
> >> +           struct tracked_acl_ids *tracked_acl_ids,
> >> +           bool monitor_cond_complete)
> >>   {
> >>       static bool skipped_last_time = false;
> >>       static uint64_t old_req_cfg = 0;
> >>       bool need_put = false;
> >> +
> >> +    if (state == S_WAIT_BEFORE_CLEAR) {
> >> +        /* If no more monitored condition changes are expected,
> >> +         * release the wait-before-clear stage and skip over
> >> +         * the poll wait. */
> >> +        if (monitor_cond_complete) {
> >> +            wait_before_clear_proceed = true;
> >> +            poll_immediate_wake();
> >> +        }
> >> +
> >> +        skipped_last_time = true;
> >> +        return;
> >> +    }
> >> +
> >>       if (lflows_changed || pflows_changed || skipped_last_time ||
> >>           ofctrl_initial_clear) {
> >>           need_put = true;
> >> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> >> index d1ee69cb0..76e2fbece 100644
> >> --- a/controller/ofctrl.h
> >> +++ b/controller/ofctrl.h
> >> @@ -68,7 +68,8 @@ void ofctrl_put(struct ovn_desired_flow_table *lflow_table,
> >>                   uint64_t nb_cfg,
> >>                   bool lflow_changed,
> >>                   bool pflow_changed,
> >> -                struct tracked_acl_ids *tracked_acl_ids);
> >> +                struct tracked_acl_ids *tracked_acl_ids,
> >> +                bool monitor_cond_complete);
> >>   bool ofctrl_has_backlog(void);
> >>   void ofctrl_wait(void);
> >>   void ofctrl_destroy(void);
> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >> index bc8acddf1..2623fc758 100644
> >> --- a/controller/ovn-controller.c
> >> +++ b/controller/ovn-controller.c
> >> @@ -6384,7 +6384,9 @@ main(int argc, char *argv[])
> >>                                      ofctrl_seqno_get_req_cfg(),
> >>                                      engine_node_changed(&en_lflow_output),
> >>                                      engine_node_changed(&en_pflow_output),
> >> -                                   tracked_acl_ids);
> >> +                                   tracked_acl_ids,
> >> +                                   ovnsb_cond_seqno
> >> +                                   == ovnsb_expected_cond_seqno);
> > Hi Aleksandr,
> >
> > i just wanted to let you know that we could probably combine this with
> > https://patchwork.ozlabs.org/project/ovn/patch/67010dca0a8dcc390e873349d662c9163e01050f.1746623091.git.felix.huettner@stackit.cloud/
> > then you could use "!daemon_started_recently()" here.
> >
> > But that is just for information.
> >
> > Thanks a lot,
> > Felix
> >
> >>                           stopwatch_stop(OFCTRL_PUT_STOPWATCH_NAME, time_msec());
> >>                       }
> >>                       stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME,
> >> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> >> index efb0a1741..812616711 100644
> >> --- a/tests/ovn-controller.at
> >> +++ b/tests/ovn-controller.at
> >> @@ -2463,6 +2463,68 @@ AT_CHECK([ovs-ofctl dump-flows br-int | grep group -c], [0], [3
> >>   OVN_CLEANUP([hv1])
> >>   AT_CLEANUP
> >>   
> >> +OVN_FOR_EACH_NORTHD([
> >> +AT_SETUP([ovn-controller - ofctrl delay until all monitored updates come])
> >> +
> >> +# Prepare testing configuration
> >> +ovn_start
> >> +
> >> +net_add n1
> >> +sim_add hv1
> >> +as hv1
> >> +check ovs-vsctl add-br br-phys
> >> +ovn_attach n1 br-phys 192.168.0.1
> >> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
> >> +    set interface hv1-vif1 external-ids:iface-id=ls1-lp1
> >> +
> >> +check ovn-nbctl ls-add ls1
> >> +
> >> +check ovn-nbctl lsp-add ls1 ls1-lp1 \
> >> +-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01 10.1.2.3"
> >> +
> >> +check ovn-nbctl lb-add lb1 1.1.1.1 10.1.2.3 \
> >> +-- ls-lb-add ls1 lb1
> >> +
> >> +check ovn-nbctl lb-add lb2 2.2.2.2 10.1.2.4 \
> >> +-- ls-lb-add ls1 lb2
> >> +
> >> +check ovn-nbctl --wait=hv sync
> >> +
> >> +# Stop ovn-controller
> >> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >> +
> >> +# Since we test the initial flow upload, the controller is restarted.
> >> +# Clear log file and start controller.
> >> +rm -f hv1/ovn-controller.log
> >> +start_daemon ovn-controller -vfile:jsonrpc:dbg -vfile:ofctrl:dbg
> >> +
> >> +# Monitor the log file until the flows are finally uploaded to OVS
> >> +OVS_WAIT_UNTIL([grep -q 'Setting lport.*in OVS' hv1/ovn-controller.log])
> >> +
> >> +# Analyse log file, select records about:
> >> +# 1. monitor_cond changes made for SB DB (message class is 'jsonrpc')
> >> +# 2. 'clearing all flows' message, which is issued after the 'wait
> >> +#    before clear' stage is released (message class is 'ofctrl')
> >> +#
> >> +# We expect here that all monitor condition changes are made before
> >> +# the OVS flows are cleared / uploaded.
> >> +# For now all monitor updates come in three iterations (initial,
> >> +# direct dps, indirect dps), i.e. three messages of type 1 followed
> >> +# by one message of type 2.
> >> +#
> >> +# For monitor-all=true: one message of type 1, then one of type 2.
> >> +#
> >> +# Then we cut off message class and take first letter
> >> +# (j for jsonrpc and o for ofctrl)
> >> +#
> >> +call_seq=`grep -E \
> >> + "(clearing all flows)|(monitor_cond.*South)" \
> >> + hv1/ovn-controller.log | cut -d "|" -f 3- | cut -b 1 | tr -d '\n'`
> >> +AT_CHECK([echo $call_seq | grep -qE "^j+o$"], [0])
> >> +
> >> +OVN_CLEANUP([hv1])
> >> +AT_CLEANUP
> >> +])
> >>   
> >>   AT_SETUP([ovn-controller - check ovn-chassis-mac-mappings])
> >>   
> >> -- 
> >> 2.47.0
> >>