On Thu, Apr 29, 2021 at 9:25 AM Numan Siddique <num...@ovn.org> wrote:
>
> On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <hz...@ovn.org> wrote:
> >
> > On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <num...@ovn.org> wrote:
> > >
> > > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hz...@ovn.org> wrote:
> > > >
> > > > Thanks Numan for working on this issue!
> > > >
> > > > On Mon, Apr 26, 2021 at 3:53 AM <num...@ovn.org> wrote:
> > > > >
> > > > > From: Numan Siddique <num...@ovn.org>
> > > > >
> > > > > Presently we do ovs_assert(del_f->installed_flows) in
> > > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are
equal
> > > > > and if 'del_f' is not installed.  But there are a couple of
scenarios
> > > > > this can happen:
> > > > >
> > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put()
> > couldn't
> > > > > install the flow (because of the rconn queue is full) and an SB
update
> > > > > caused 'F1' to be removed and re-added as 'F2'.
> > > > >
> > > >
> > > > This seems not possible, because del_f->installed_flow is set
whenever a
> > > > "installed_flow" is added to the installed_flows table, regardless
of
> > > > whether it is sent to OVS or not.
> > >
> > > This can happen if ofctrl_can_put() returns false in which case
> > > ovn-controller will
> > > not call ofctrl_put() at all. There is a theoretical  possibility here
> > for sure.
> > >
> > > Let me know if you think otherwise.
> > >
> >
> > Oh, sorry that I misunderstood. You are right that ofctrl_put() may
return
> > immediately without installing the desired flow F1, but if that happens,
> > F1->installed_flow should be NULL, and then it is the same case as in
> > scenario 2) below: when F1 is deleted, it would be destroyed in
> > track_flow_del(). So we should never end up with a tracked deleted flow
> > with installed_flow being NULL, right? The logic is, in theory, when a
> > "desired but not yet installed" flow is being deleted, we should simply
> > destroy it and remove it from tracking. Maybe we should check if there
are
> > other possibilities.
>
> You're right.  F1 should be destroyed in track_flow_del().
>
> I got access to the core dump and I can see below details of "f" and
"del_f".
>
> Please let me know if this rings any bell
>
> -----
> (gdb) frame 6
> #6  0x00005569459bfbaa in merge_tracked_flows
> (flow_table=0x5569470c5b40) at
>
/usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
> 283             hmap_expand_at(hmap, where);
> (gdb) print del_f
> $1 = (struct desired_flow *) 0x556948c670b0
> (gdb) print *del_f
> $2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
> 0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540,
> 0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
> 0x556948d96460, ofpacts_len = 432, cookie = 1418516517},
>   match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev
> = 0x556948c67100, next = 0x556948c67100}, references = {prev =
> 0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0,
> installed_ref_list_node = {prev = 0x5569495bcd90,
>     next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88,
> next = 0x556948d2bfc8}, is_deleted = true}
>
Looking at the content of the del_f, the installed_ref_list_node is not
pointing to the field of the flow itself, which means at least it was in
some installed_flow's desired_refs list, which means it was installed at
least at some point before. Now that the "installed_flow" is NULL, it is
possible that the desired flow was installed but then unlinked from the
installed flow. But by reviewing the code I didn't find any path that
"unlink_installed_to_desired()" or "replace_installed_to_desired()" could
happen to this flow before this point. If it is unlinked it should also be
removed from the track list and also from the deleted_flows hmap and
destroyed so it should never be found again. Of course it is also possible
that this desired_flow data structure is corrupted so the fields
nstalled_ref_list_node and installed_flow just have inconsistent data. I
will keep looking into the code. At the same time, is this reproducible? We
could add some more logs to help debugging if this happens again.

>
> (gdb) print f
> $7 = (struct desired_flow *) 0x556948d2bf40
>
> (gdb) print *f
> $6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
> 0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0,
> 0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
> 0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517},
>   match_hmap_node = {hash = 2338012780, next = 0x556948da8070},
> list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references
> = {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow =
> 0x0, installed_ref_list_node = {prev = 0x556948d2bfb8,
>     next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138,
> next = 0x556948e68658}, is_deleted = false}
>
> ------
>
> Looks like there are some issues with physical flow handling.  Seems
> to me an installed flow F1 is deleted and in the same loop, F2 is
> added
> to the tracked flow which has the same match, actions and cookie.
>
> Table id is 37 which is OFTABLE_REMOTE_OUTPUT.  The SB resource can be
> either port binding or multicast group.
>
> Thanks for the comments so far. I will continue debugging.  This patch
> can be dropped.
>
> Numan
>
> >
> > > We observed this assertion during the upgrades of openshift to an ovn
> > version
> > > and there were a lot of "unreasonable timeout" warnings.
> > >
> > > The updated OVN version had the commit [1] which led me to think that
> > > commit [1] can also cause this.
> > >
> > > >
> > > > > 2. In a single engine run, a flow 'F1' was added to the desired
flow,
> > > > > removed from the desired flow and re-added as 'F2'.  This could
> > > > > happen after the commit [1].
> > > >
> > > > In theory this should not happen either, because if F1 was added and
> > then
> > > > removed before it was installed, it would have been removed in
> > > > track_flow_del():
> > > >
> > > >         if (!f->installed_flow) {
> > > >             /* If it is not installed yet, simply destroy it. */
> > > >             desired_flow_destroy(f);
> > > >             return;
> > > >         }
> > > >
> > > > In fact this is what the comment is supposed to say, but the comment
> > had a
> > > > typo (my bad):
> > > >
> > > >             /* del_f must have been installed, otherwise it should
have
> > been
> > > >             * removed during track_flow_add_or_modify. */
> > > >
> > > > s/track_flow_add_or_modify/track_flow_del
> > >
> > > This typo really confused me :). From your comments the scenario 2
seems
> > > not possible.
> > >
> > > But I still think scenario 1 can happen if ofctrl_can_put() returns
FALSE.
> > > What do you think ?
> > >
> > > Thanks
> > > Numan
> > >
> > > >
> > > > In practice, if commit [1] did trigger this bt, then I'd look deeper
> > into
> > > > the logic, but I think removing the assert may just hide the
problem.
> > > >
> > > > >
> > > > > The likelihood of (2) seems to be higher with datapath groups
enabled.
> > > > >
> > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in
> > > > > few deployments after the commit [1].  This patch addressed this
issue
> > > > > by removing that assertion.  Removing this assertion should not
have
> > > > > any side effects as the flow 'F2' will be installed.
> > > > >
> > > > > This patch is proposed based on the code inspection and the
> > possibility
> > > > > that the above mentioned scenarios can happen.  The patch doesn't
have
> > > > > any test cases to reproduce these scenarios.
> > > > >
> > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood
removing
> > flows
> > > > with conj actions.")
> > > > >
> > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
> > > > > Signed-off-by: Numan Siddique <num...@ovn.org>
> > > > > ---
> > > > >  controller/ofctrl.c | 26 ++++++++++++++++++++++----
> > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > > > index c29c3d180..9e960b034 100644
> > > > > --- a/controller/ofctrl.c
> > > > > +++ b/controller/ofctrl.c
> > > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct
> > ovn_desired_flow_table
> > > > *flow_table)
> > > > >                  continue;
> > > > >              }
> > > > >
> > > > > -            /* del_f must have been installed, otherwise it
should
> > have
> > > > been
> > > > > -             * removed during track_flow_add_or_modify. */
> > > > > -            ovs_assert(del_f->installed_flow);
> > > > > +            if (!del_f->installed_flow) {
> > > > > +                /* del_f must have been installed, otherwise it
> > should
> > > > have
> > > > > +                 * been removed during track_flow_add_or_modify.
> > > > > +                 *
> > > > > +                 * But however there are a couple of scenarios
this
> > may
> > > > not
> > > > > +                 * happen.
> > > > > +                 *
> > > > > +                 * Scenario 1:  A flow was added to the desired
> > flows,
> > > > but
> > > > > +                 *              ofctrl_put() couldn't install the
> > flow
> > > > and
> > > > > +                 *              an SB update caused the 'del_f'
to be
> > > > removed
> > > > > +                 *              and re-added as 'f'.
> > > > > +                 *
> > > > > +                 * Scenario 2:  In a  single engine run, a flow
> > 'del_f'
> > > > was
> > > > > +                 *              added to the desired flow,
removed
> > from
> > > > the
> > > > > +                 *              desired flow and re-added as 'f'.
> > This
> > > > could
> > > > > +                 *              happen after the commit
c6c61b4e3462.
> > > > > +                 *
> > > > > +                 * Treat both the scenarios as valid scenarios
and
> > just
> > > > remove
> > > > > +                 * 'del_f' from the hmap - deleted_flows.
> > > > > +                 * update_installed_flows_by_track() will install
> > 'f'.
> > > > > +                 */
> > > > >
> > > > > -            if (!f->installed_flow) {
> > > > > +            } else if (!f->installed_flow) {
> > > > >                  /* f is not installed yet. */
> > > > >
 replace_installed_to_desired(del_f->installed_flow,
> > > > del_f, f);
> > > > >              } else {
> > > > > --
> > > > > 2.29.2
> > > > >
> > > > > _______________________________________________
> > > > > dev mailing list
> > > > > d...@openvswitch.org
> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > _______________________________________________
> > > > dev mailing list
> > > > d...@openvswitch.org
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > >
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to