On Wed, Mar 20, 2024 at 4:07 PM Han Zhou <hz...@ovn.org> wrote:
>
>
>
> On Mon, Mar 18, 2024 at 11:27 AM Mark Michelson <mmich...@redhat.com>
wrote:
> >
> > Hi Han,
> >
> > I have a comment below
> >
> > On 3/5/24 01:27, Han Zhou wrote:
> > > The ovn-ofctrl-wait-before-clear setting is designed to minimize
> > > downtime during the initial start-up of the ovn-controller. For this
> > > purpose, the ovn-controller should wait only once upon entering the
> > > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections
> > > to the OVS, such as those occurring during an OVS restart/upgrade,
> > > should not trigger this wait. However, the current implemention always
> > > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which
> > > can inadvertently delay flow installations during OVS restart/upgrade,
> > > potentially causing more harm than good. (The extent of the impact
> > > varies based on the method used to restart OVS, including whether flow
> > > save/restore tools and the flow-restore-wait feature are employed.)
> > >
> > > This patch avoids the unnecessary wait after the initial one.
> > >
> > > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to
reduce down time during upgrade.")
> > > Signed-off-by: Han Zhou <hz...@ovn.org>
> > > ---
> > >   controller/ofctrl.c     | 1 -
> > >   tests/ovn-controller.at | 9 +++++++--
> > >   2 files changed, 7 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > index f14cd79a8dbb..0d72ecbaa167 100644
> > > --- a/controller/ofctrl.c
> > > +++ b/controller/ofctrl.c
> > > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
> > >       if (!wait_before_clear_time ||
> > >           (wait_before_clear_expire &&
> > >            time_msec() >= wait_before_clear_expire)) {
> > > -        wait_before_clear_expire = 0;
> > >           state = S_CLEAR_FLOWS;
> > >           return;
> > >       }
> > > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > > index 37f1ded1bd26..b65e11722cbb 100644
> > > --- a/tests/ovn-controller.at
> > > +++ b/tests/ovn-controller.at
> > > @@ -2284,10 +2284,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller
coverage/read-counter lflow_run)
> > >   AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
> > >   ])
> > >
> > > -# Restart OVS this time, and wait until flows are reinstalled
> > > +# Restart OVS this time. Flows should be reinstalled without waiting.
> > >   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> > >   start_daemon ovs-vswitchd --enable-dummy=system -vvconn
-vofproto_dpif -vunixctl
> > > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 |
grep -vF 2.2.2.2])
> > > +
> > > +# Sleep for 3s, which is long enough for the flows to be installed,
but
> > > +# shorter than the wait-before-clear (5s), to make sure the flows
are installed
> > > +# without waiting.
> > > +sleep 3
> >
> > This change makes me nervous. The comment makes sense. However, I worry
> > that on slow or loaded systems, relying on the flows to be written
> > within 3 seconds may not always work out.
> >
> > If there were a way to peek into the ofctrl state machine and check that
> > we have moved off of S_WAIT_BEFORE_CLEAR by this point, that might work
> > better. But that is something that is hard to justify exposing.
> >
> > I came up with this possible idea:
> >   * set wait-before-clear to a time longer than OVS_CTL_TIMEOUT (e.g. 60
> > seconds)
> >   * Restart ovs
> >   * Use OVS_WAIT_UNTIL(...), just like the test used to do.
> >
> > This way, we get plenty of opportunities to ensure the flows were
> > written. In most cases, this probably will actually be quicker than the
> > 3 second sleep added in this patch. However, if it takes longer than 3
> > seconds, then the test can still pass. If the flows get written
> > properly, then we know ovn-controller did not wait for the
> > wait-before-clear time.
> >
>
> Hi Mark, thanks for your comment! I agree with you that sleep for 3s is
not very reliable. Your suggestion looks better, but I think there is still
a potential problem. The approach assumes that ovn-controller will always
apply the new settings of ofctrl-wait-before-clear. It is true for the
current implementation, but there is nothing preventing us from removing
this logic, so that ovn-controller ignores any ofctrl-wait-before-clear
setting changes after startup. In fact, it is more reasonable to ignore it.
So, let's assume ovn-controller doesn't take care of the changes of the
settings. In this test case, the setting is initially 5s when
ovn-controller starts, and later changing it to 60s doesn't take effect and
ovn-controller still uses the 5s value. And then let's assume
ovn-controller still waits for the 5s after OVS is restarted, and the test
case will pass because OVS_WAIT_UNTIL can wait for much longer than 5s. So
the test will be incorrect.
>
> To avoid this potential problem, I tried another approach. I figured that
simply adding a "ovn-nbctl --wait=hv sync" is sufficient to ensure
ovn-controller had the chance to install flows, without the need to sleep.
So please see if the below diff looks good:
>
>
---------------------------------------------------------------------------------------------------------------------------
> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> index 0da6570c5882..3202f0beff46 100644
> --- a/tests/ovn-controller.at
> +++ b/tests/ovn-controller.at
> @@ -2328,10 +2328,10 @@ AT_CHECK_UNQUOTED([echo $lflow_run_1], [0],
[$lflow_run_2
>  OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>  start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif
-vunixctl
>
> -# Sleep for 3s, which is long enough for the flows to be installed, but
> -# shorter than the wait-before-clear (5s), to make sure the flows are
installed
> -# without waiting.
> -sleep 3
> +# Sync to make sure ovn-controller is given enough time to install the
flows.
> +check ovn-nbctl --wait=hv sync
> +
> +# Flow should be installed without any extra waiting.
>  AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF
2.2.2.2], [0], [ignore])
>
>  check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
>
---------------------------------------------------------------------------------------------------------------------------
>
> Thanks,
> Han
>
I sent the above change as v2:

https://patchwork.ozlabs.org/project/ovn/patch/20240328065835.789596-1-hz...@ovn.org/

Thanks,
Han

> > > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF
2.2.2.2], [0], [ignore])
> > >
> > >   check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
> > >   -- ls-lb-add ls1 lb3
> >
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to