On Wed, Mar 20, 2024 at 4:07 PM Han Zhou <hz...@ovn.org> wrote: > > > > On Mon, Mar 18, 2024 at 11:27 AM Mark Michelson <mmich...@redhat.com> wrote: > > > > Hi Han, > > > > I have a comment below > > > > On 3/5/24 01:27, Han Zhou wrote: > > > The ovn-ofctrl-wait-before-clear setting is designed to minimize > > > downtime during the initial start-up of the ovn-controller. For this > > > purpose, the ovn-controller should wait only once upon entering the > > > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections > > > to the OVS, such as those occurring during an OVS restart/upgrade, > > > should not trigger this wait. However, the current implemention always > > > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which > > > can inadvertently delay flow installations during OVS restart/upgrade, > > > potentially causing more harm than good. (The extent of the impact > > > varies based on the method used to restart OVS, including whether flow > > > save/restore tools and the flow-restore-wait feature are employed.) > > > > > > This patch avoids the unnecessary wait after the initial one. > > > > > > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down time during upgrade.") > > > Signed-off-by: Han Zhou <hz...@ovn.org> > > > --- > > > controller/ofctrl.c | 1 - > > > tests/ovn-controller.at | 9 +++++++-- > > > 2 files changed, 7 insertions(+), 3 deletions(-) > > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > > index f14cd79a8dbb..0d72ecbaa167 100644 > > > --- a/controller/ofctrl.c > > > +++ b/controller/ofctrl.c > > > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void) > > > if (!wait_before_clear_time || > > > (wait_before_clear_expire && > > > time_msec() >= wait_before_clear_expire)) { > > > - wait_before_clear_expire = 0; > > > state = S_CLEAR_FLOWS; > > > return; > > > } > > > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at > > > index 37f1ded1bd26..b65e11722cbb 100644 > > > --- a/tests/ovn-controller.at > > > +++ b/tests/ovn-controller.at > > > @@ -2284,10 +2284,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > > > AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2 > > > ]) > > > > > > -# Restart OVS this time, and wait until flows are reinstalled > > > +# Restart OVS this time. Flows should be reinstalled without waiting. > > > OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > > > start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl > > > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2]) > > > + > > > +# Sleep for 3s, which is long enough for the flows to be installed, but > > > +# shorter than the wait-before-clear (5s), to make sure the flows are installed > > > +# without waiting. > > > +sleep 3 > > > > This change makes me nervous. The comment makes sense. However, I worry > > that on slow or loaded systems, relying on the flows to be written > > within 3 seconds may not always work out. > > > > If there were a way to peek into the ofctrl state machine and check that > > we have moved off of S_WAIT_BEFORE_CLEAR by this point, that might work > > better. But that is something that is hard to justify exposing. > > > > I came up with this possible idea: > > * set wait-before-clear to a time longer than OVS_CTL_TIMEOUT (e.g. 60 > > seconds) > > * Restart ovs > > * Use OVS_WAIT_UNTIL(...), just like the test used to do. > > > > This way, we get plenty of opportunities to ensure the flows were > > written. In most cases, this probably will actually be quicker than the > > 3 second sleep added in this patch. However, if it takes longer than 3 > > seconds, then the test can still pass. If the flows get written > > properly, then we know ovn-controller did not wait for the > > wait-before-clear time. > > > > Hi Mark, thanks for your comment! I agree with you that sleep for 3s is not very reliable. Your suggestion looks better, but I think there is still a potential problem. The approach assumes that ovn-controller will always apply the new settings of ofctrl-wait-before-clear. It is true for the current implementation, but there is nothing preventing us from removing this logic, so that ovn-controller ignores any ofctrl-wait-before-clear setting changes after startup. In fact, it is more reasonable to ignore it. So, let's assume ovn-controller doesn't take care of the changes of the settings. In this test case, the setting is initially 5s when ovn-controller starts, and later changing it to 60s doesn't take effect and ovn-controller still uses the 5s value. And then let's assume ovn-controller still waits for the 5s after OVS is restarted, and the test case will pass because OVS_WAIT_UNTIL can wait for much longer than 5s. So the test will be incorrect. > > To avoid this potential problem, I tried another approach. I figured that simply adding a "ovn-nbctl --wait=hv sync" is sufficient to ensure ovn-controller had the chance to install flows, without the need to sleep. So please see if the below diff looks good: > > --------------------------------------------------------------------------------------------------------------------------- > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at > index 0da6570c5882..3202f0beff46 100644 > --- a/tests/ovn-controller.at > +++ b/tests/ovn-controller.at > @@ -2328,10 +2328,10 @@ AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2 > OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl > > -# Sleep for 3s, which is long enough for the flows to be installed, but > -# shorter than the wait-before-clear (5s), to make sure the flows are installed > -# without waiting. > -sleep 3 > +# Sync to make sure ovn-controller is given enough time to install the flows. > +check ovn-nbctl --wait=hv sync > + > +# Flow should be installed without any extra waiting. > AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], [0], [ignore]) > > check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \ > --------------------------------------------------------------------------------------------------------------------------- > > Thanks, > Han > I sent the above change as v2:
https://patchwork.ozlabs.org/project/ovn/patch/20240328065835.789596-1-hz...@ovn.org/ Thanks, Han > > > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], [0], [ignore]) > > > > > > check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \ > > > -- ls-lb-add ls1 lb3 > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev