On 5/27/25 5:54 PM, Felix Huettner wrote:

Hi Felix,
I am sorry, I misunderstood your plan for the countdown. If you want to
trigger it after the first update is utilized, all is correct. So I withdraw
my review complaint and will use the suggested function in my code instead
of directly comparing ovnsb_expected_cond_seqno vs ovnsb_cond_seqno.

Thank you,
Alexander

> On Fri, May 23, 2025 at 10:02:59AM +0000, Smirnov Aleksandr (K2 Cloud) wrote:
>> Hi Felix,
>>
>> Answering your proposal: yes, I can reuse this solution instead of a
>> direct check of seqno vs expected seqno.
>> However, I have inspected what you are doing in your patch and want to
>> point out one mistake (at least in my opinion).
> Hi Alexander,
>
> thanks for the review.
>
>> When I tried to play with the seqno/expected seqno numbers I realized that
>> they are only set to meaningful values BELOW the line
>>
>>     ovnsb_expected_cond_seqno =
>>         update_sb_monitors(
>>             ovnsb_idl_loop.idl, chassis,
>>             &runtime_data->local_lports,
>>             &runtime_data->lbinding_data.bindings,
>>             &runtime_data->local_datapaths,
>>             sb_monitor_all);
>>
> So per default ovnsb_expected_cond_seqno is initialized to UINT_MAX.
> Then I see two locations where it might be set to an actual value:
>
> 1. At the start of the main loop in update_sb_db, but only if
>    monitor_all is set.
> 2. The point you mentioned, which happens after the engine computed
>    completely.
>
>> So, if you want to make a meaningful comparison of the seqno(s) to answer
>> the question 'are there upcoming updates', you have to do that check after
>> the update_sb_monitors() call and before the next iteration of the main
>> loop.
> Or at the start of the main loop if we are sure that ovnsb_expected_cond_seqno
> is valid. I did this in my patch with a
> `ovnsb_expected_cond_seqno != UINT_MAX` check.
>
>> Otherwise (again, in my opinion) your code will always hit the guard
>> checks (time, max number, etc).
> I am not sure if I get that point correctly. I want to share an example
> case based on my understanding of what you mean.
> You can tell me if I got it right:
>
> We assume a case where monitor-all=false.
> When starting ovn-controller we will connect to the local ovsdb and
> southbound. We then run the engine the first time and set the initial
> monitoring conditions.
> At this point ovnsb_expected_cond_seqno is valid and larger than
> ovnsb_cond_seqno.
>
> Before we now get the initial monitor results from southbound, something
> happens that prevents the engine from running (e.g. northd_version_match
> is false).
> At some point the initial monitor updates arrive, which will cause
> ovnsb_expected_cond_seqno == ovnsb_cond_seqno. At least in my patch any
> further main loop iteration would now cause daemon_started_recently_countdown
> to be triggered, even though the engine never had the chance to update
> and define a new ovnsb_expected_cond_seqno.
>
> If the engine can then start running again, daemon_started_recently()
> might return true, even though we did not yet allow for up to 20
> initial condition changes.
>
>> This is still a dark area for me, but at least I need to share my
>> understanding of the seqno life cycle.
> Same for me, and I think you found a valid point.
>
> However, it would be great if you could confirm the above understanding.
> If you meant a different point then maybe I need to fix something else
> as well.
>
> I will try to fix the above and send a new version of the patchset if
> you confirm that this was the point you meant.
>
> So thank you very much for finding that.
>
> Thanks,
> Felix
>
>> Thank you,
>>
>> Alexander
>>
>> On 5/7/25 5:26 PM, Felix Huettner wrote:
>>> On Tue, Mar 18, 2025 at 01:08:12PM +0300, Aleksandr Smirnov wrote:
>>>> Delay initial flow upload to OVS until all monitored updates are received.
>>>> This is a kind of replacement for the wait-before-clear parameter.
>>>> Instead of waiting a certain amount of time we will wait
>>>> until the final monitored update comes from the SB DB.
>>>> An event we watch in the controller's main loop is the current condition
>>>> sequence number compared to the expected condition sequence number.
>>>> If they are not equal, we still have to receive updates in response
>>>> to the most recent monitor condition change request. This check makes
>>>> sense only in code that lies after the update_sb_monitors() function call.
>>>>
>>>> Note, this update will only take effect if wait-before-clear == 0,
>>>> i.e. you can still rely on the wait-before-clear behavior.
>>>>
>>>> Signed-off-by: Aleksandr Smirnov <[email protected]>
>>>> ---
>>>>  controller/ofctrl.c         | 29 ++++++++++++++++-
>>>>  controller/ofctrl.h         |  3 +-
>>>>  controller/ovn-controller.c |  4 ++-
>>>>  tests/ovn-controller.at     | 62 +++++++++++++++++++++++++++++++++++++
>>>>  4 files changed, 95 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>>>> index 4a3d35b97..304b9bbc8 100644
>>>> --- a/controller/ofctrl.c
>>>> +++ b/controller/ofctrl.c
>>>> @@ -349,6 +349,9 @@ static uint64_t cur_cfg;
>>>>  /* Current state. */
>>>>  static enum ofctrl_state state;
>>>>
>>>> +/* Release the wait-before-clear stage. */
>>>> +static bool wait_before_clear_proceed = false;
>>>> +
>>>>  /* The time (ms) to stay in the state S_WAIT_BEFORE_CLEAR. Read from
>>>>   * external_ids: ovn-ofctrl-wait-before-clear.
>>>>   */
>>>>  static unsigned int wait_before_clear_time = 0;
>>>> @@ -446,6 +449,7 @@ run_S_NEW(void)
>>>>      struct ofpbuf *buf = ofpraw_alloc(OFPRAW_NXT_TLV_TABLE_REQUEST,
>>>>                                        rconn_get_version(swconn), 0);
>>>>      xid = queue_msg(buf);
>>>> +    wait_before_clear_proceed = false;
>>>>      state = S_TLV_TABLE_REQUESTED;
>>>>  }
>>>>
>>>> @@ -638,6 +642,14 @@ error:
>>>>  static void
>>>>  run_S_WAIT_BEFORE_CLEAR(void)
>>>>  {
>>>> +    if (wait_before_clear_time == 0) {
>>>> +        if (wait_before_clear_proceed) {
>>>> +            state = S_CLEAR_FLOWS;
>>>> +        }
>>>> +
>>>> +        return;
>>>> +    }
>>>> +
>>>>      if (!wait_before_clear_time ||
>>>>          (wait_before_clear_expire &&
>>>>           time_msec() >= wait_before_clear_expire)) {
>>>> @@ -2695,11 +2707,26 @@ ofctrl_put(struct ovn_desired_flow_table *lflow_table,
>>>>             uint64_t req_cfg,
>>>>             bool lflows_changed,
>>>>             bool pflows_changed,
>>>> -           struct tracked_acl_ids *tracked_acl_ids)
>>>> +           struct tracked_acl_ids *tracked_acl_ids,
>>>> +           bool monitor_cond_complete)
>>>>  {
>>>>      static bool skipped_last_time = false;
>>>>      static uint64_t old_req_cfg = 0;
>>>>      bool need_put = false;
>>>> +
>>>> +    if (state == S_WAIT_BEFORE_CLEAR) {
>>>> +        /* If no more monitored condition changes are expected,
>>>> +         * release the wait-before-clear stage and skip
>>>> +         * over the poll wait.
>>>> +         */
>>>> +        if (monitor_cond_complete) {
>>>> +            wait_before_clear_proceed = true;
>>>> +            poll_immediate_wake();
>>>> +        }
>>>> +
>>>> +        skipped_last_time = true;
>>>> +        return;
>>>> +    }
>>>> +
>>>>      if (lflows_changed || pflows_changed || skipped_last_time ||
>>>>          ofctrl_initial_clear) {
>>>>          need_put = true;
>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
>>>> index d1ee69cb0..76e2fbece 100644
>>>> --- a/controller/ofctrl.h
>>>> +++ b/controller/ofctrl.h
>>>> @@ -68,7 +68,8 @@ void ofctrl_put(struct ovn_desired_flow_table *lflow_table,
>>>>                  uint64_t nb_cfg,
>>>>                  bool lflow_changed,
>>>>                  bool pflow_changed,
>>>> -                struct tracked_acl_ids *tracked_acl_ids);
>>>> +                struct tracked_acl_ids *tracked_acl_ids,
>>>> +                bool monitor_cond_complete);
>>>>  bool ofctrl_has_backlog(void);
>>>>  void ofctrl_wait(void);
>>>>  void ofctrl_destroy(void);
>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>> index bc8acddf1..2623fc758 100644
>>>> --- a/controller/ovn-controller.c
>>>> +++ b/controller/ovn-controller.c
>>>> @@ -6384,7 +6384,9 @@ main(int argc, char *argv[])
>>>>                        ofctrl_seqno_get_req_cfg(),
>>>>                        engine_node_changed(&en_lflow_output),
>>>>                        engine_node_changed(&en_pflow_output),
>>>> -                      tracked_acl_ids);
>>>> +                      tracked_acl_ids,
>>>> +                      ovnsb_cond_seqno
>>>> +                      == ovnsb_expected_cond_seqno);
>>> Hi Aleksandr,
>>>
>>> I just wanted to let you know that we could probably combine this with
>>> https://patchwork.ozlabs.org/project/ovn/patch/67010dca0a8dcc390e873349d662c9163e01050f.1746623091.git.felix.huettner@stackit.cloud/
>>> then you could use "!daemon_started_recently()" here.
>>>
>>> But that is just for information.
>>>
>>> Thanks a lot,
>>> Felix
>>>
>>>>              stopwatch_stop(OFCTRL_PUT_STOPWATCH_NAME, time_msec());
>>>>          }
>>>>          stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME,
>>>> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
>>>> index efb0a1741..812616711 100644
>>>> --- a/tests/ovn-controller.at
>>>> +++ b/tests/ovn-controller.at
>>>> @@ -2463,6 +2463,68 @@ AT_CHECK([ovs-ofctl dump-flows br-int | grep group -c], [0], [3
>>>>  OVN_CLEANUP([hv1])
>>>>  AT_CLEANUP
>>>>
>>>> +OVN_FOR_EACH_NORTHD([
>>>> +AT_SETUP([ovn-controller - ofctrl delay until all monitored updates come])
>>>> +
>>>> +# Prepare the testing configuration
>>>> +ovn_start
>>>> +
>>>> +net_add n1
>>>> +sim_add hv1
>>>> +as hv1
>>>> +check ovs-vsctl add-br br-phys
>>>> +ovn_attach n1 br-phys 192.168.0.1
>>>> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
>>>> +    set interface hv1-vif1 external-ids:iface-id=ls1-lp1
>>>> +
>>>> +check ovn-nbctl ls-add ls1
>>>> +
>>>> +check ovn-nbctl lsp-add ls1 ls1-lp1 \
>>>> +-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01 10.1.2.3"
>>>> +
>>>> +check ovn-nbctl lb-add lb1 1.1.1.1 10.1.2.3 \
>>>> +-- ls-lb-add ls1 lb1
>>>> +
>>>> +check ovn-nbctl lb-add lb2 2.2.2.2 10.1.2.4 \
>>>> +-- ls-lb-add ls1 lb2
>>>> +
>>>> +check ovn-nbctl --wait=hv sync
>>>> +
>>>> +# Stop ovn-controller
>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
>>>> +
>>>> +# Since we test the initial flow upload, the controller is restarted.
>>>> +# Clear the log file and start the controller.
>>>> +rm -f hv1/ovn-controller.log
>>>> +start_daemon ovn-controller -vfile:jsonrpc:dbg -vfile:ofctrl:dbg
>>>> +
>>>> +# Monitor the log file until flows are finally uploaded to OVS
>>>> +OVS_WAIT_UNTIL([grep -q 'Setting lport.*in OVS' hv1/ovn-controller.log])
>>>> +
>>>> +# Analyse the log file, selecting records about:
>>>> +# 1. monitor_cond changes made for the SB DB (message class is 'jsonrpc')
>>>> +# 2.
>>>> +#    the 'clearing all flows' message, which is issued after the 'wait
>>>> +#    before clear' stage is released (message class is 'ofctrl')
>>>> +#
>>>> +# We expect here that all monitoring condition changes are made before
>>>> +# OVS flows are cleared / uploaded.
>>>> +# For now all monitoring updates come in three iterations: initial,
>>>> +# direct dps, indirect dps, which corresponds to
>>>> +# three messages of type 1 followed by one message of type 2.
>>>> +#
>>>> +# For monitor-all=true: one message of type 1 followed by one message
>>>> +# of type 2.
>>>> +#
>>>> +# Then we cut off the message class and take its first letter
>>>> +# (j for jsonrpc and o for ofctrl).
>>>> +#
>>>> +call_seq=`grep -E \
>>>> +    "(clearing all flows)|(monitor_cond.*South)" \
>>>> +    hv1/ovn-controller.log | cut -d "|" -f 3- | cut -b 1 | tr -d '\n'`
>>>> +AT_CHECK([echo $call_seq | grep -qE "^j+o$"], [0])
>>>> +
>>>> +OVN_CLEANUP([hv1])
>>>> +AT_CLEANUP
>>>> +])
>>>>
>>>>  AT_SETUP([ovn-controller - check ovn-chassis-mac-mappings])
>>>>
>>>> --
>>>> 2.47.0
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> [email protected]
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
