On Thu, Apr 29, 2021 at 9:25 AM Numan Siddique <num...@ovn.org> wrote: > > On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <hz...@ovn.org> wrote: > > > > On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <num...@ovn.org> wrote: > > > > > > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hz...@ovn.org> wrote: > > > > > > > > Thanks Numan for working on this issue! > > > > > > > > On Mon, Apr 26, 2021 at 3:53 AM <num...@ovn.org> wrote: > > > > > > > > > > From: Numan Siddique <num...@ovn.org> > > > > > > > > > > Presently we do ovs_assert(del_f->installed_flows) in > > > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal > > > > > and if 'del_f' is not installed. But there are a couple of scenarios > > > > > this can happen: > > > > > > > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put() > > couldn't > > > > > install the flow (because of the rconn queue is full) and an SB update > > > > > caused 'F1' to be removed and re-added as 'F2'. > > > > > > > > > > > > > This seems not possible, because del_f->installed_flow is set whenever a > > > > "installed_flow" is added to the installed_flows table, regardless of > > > > whether it is sent to OVS or not. > > > > > > This can happen if ofctrl_can_put() returns false in which case > > > ovn-controller will > > > not call ofctrl_put() at all. There is a theoretical possibility here > > for sure. > > > > > > Let me know if you think otherwise. > > > > > > > Oh, sorry that I misunderstood. You are right that ofctrl_put() may return > > immediately without installing the desired flow F1, but if that happens, > > F1->installed_flow should be NULL, and then it is the same case as in > > scenario 2) below: when F1 is deleted, it would be destroyed in > > track_flow_del(). So we should never end up with a tracked deleted flow > > with installed_flow being NULL, right? The logic is, in theory, when a > > "desired but not yet installed" flow is being deleted, we should simply > > destroy it and remove it from tracking. Maybe we should check if there are > > other possibilities. > > You're right. F1 should be destroyed in track_flow_del(). > > I got access to the core dump and I can see below details of "f" and "del_f". > > Please let me know if this rings any bell > > ----- > (gdb) frame 6 > #6 0x00005569459bfbaa in merge_tracked_flows > (flow_table=0x5569470c5b40) at > /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283 > 283 hmap_expand_at(hmap, where); > (gdb) print del_f > $1 = (struct desired_flow *) 0x556948c670b0 > (gdb) print *del_f > $2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow = > 0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540, > 0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts = > 0x556948d96460, ofpacts_len = 432, cookie = 1418516517}, > match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev > = 0x556948c67100, next = 0x556948c67100}, references = {prev = > 0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0, > installed_ref_list_node = {prev = 0x5569495bcd90, > next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88, > next = 0x556948d2bfc8}, is_deleted = true} > Looking at the content of the del_f, the installed_ref_list_node is not pointing to the field of the flow itself, which means at least it was in some installed_flow's desired_refs list, which means it was installed at least at some point before. Now that the "installed_flow" is NULL, it is possible that the desired flow was installed but then unlinked from the installed flow. But by reviewing the code I didn't find any path that "unlink_installed_to_desired()" or "replace_installed_to_desired()" could happen to this flow before this point. If it is unlinked it should also be removed from the track list and also from the deleted_flows hmap and destroyed so it should never be found again. Of course it is also possible that this desired_flow data structure is corrupted so the fields nstalled_ref_list_node and installed_flow just have inconsistent data. I will keep looking into the code. At the same time, is this reproducible? We could add some more logs to help debugging if this happens again.
> > (gdb) print f > $7 = (struct desired_flow *) 0x556948d2bf40 > > (gdb) print *f > $6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow = > 0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0, > 0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts = > 0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517}, > match_hmap_node = {hash = 2338012780, next = 0x556948da8070}, > list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references > = {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow = > 0x0, installed_ref_list_node = {prev = 0x556948d2bfb8, > next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138, > next = 0x556948e68658}, is_deleted = false} > > ------ > > Looks like there are some issues with physical flow handling. Seems > to me an installed flow F1 is deleted and in the same loop, F2 is > added > to the tracked flow which has the same match, actions and cookie. > > Table id is 37 which is OFTABLE_REMOTE_OUTPUT. The SB resource can be > either port binding or multicast group. > > Thanks for the comments so far. I will continue debugging. This patch > can be dropped. > > Numan > > > > > > We observed this assertion during the upgrades of openshift to an ovn > > version > > > and there were a lot of "unreasonable timeout" warnings. > > > > > > The updated OVN version had the commit [1] which led me to think that > > > commit [1] can also cause this. > > > > > > > > > > > > 2. In a single engine run, a flow 'F1' was added to the desired flow, > > > > > removed from the desired flow and re-added as 'F2'. This could > > > > > happen after the commit [1]. > > > > > > > > In theory this should not happen either, because if F1 was added and > > then > > > > removed before it was installed, it would have been removed in > > > > track_flow_del(): > > > > > > > > if (!f->installed_flow) { > > > > /* If it is not installed yet, simply destroy it. */ > > > > desired_flow_destroy(f); > > > > return; > > > > } > > > > > > > > In fact this is what the comment is supposed to say, but the comment > > had a > > > > typo (my bad): > > > > > > > > /* del_f must have been installed, otherwise it should have > > been > > > > * removed during track_flow_add_or_modify. */ > > > > > > > > s/track_flow_add_or_modify/track_flow_del > > > > > > This typo really confused me :). From your comments the scenario 2 seems > > > not possible. > > > > > > But I still think scenario 1 can happen if ofctrl_can_put() returns FALSE. > > > What do you think ? > > > > > > Thanks > > > Numan > > > > > > > > > > > In practice, if commit [1] did trigger this bt, then I'd look deeper > > into > > > > the logic, but I think removing the assert may just hide the problem. > > > > > > > > > > > > > > The likelihood of (2) seems to be higher with datapath groups enabled. > > > > > > > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in > > > > > few deployments after the commit [1]. This patch addressed this issue > > > > > by removing that assertion. Removing this assertion should not have > > > > > any side effects as the flow 'F2' will be installed. > > > > > > > > > > This patch is proposed based on the code inspection and the > > possibility > > > > > that the above mentioned scenarios can happen. The patch doesn't have > > > > > any test cases to reproduce these scenarios. > > > > > > > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing > > flows > > > > with conj actions.") > > > > > > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502 > > > > > Signed-off-by: Numan Siddique <num...@ovn.org> > > > > > --- > > > > > controller/ofctrl.c | 26 ++++++++++++++++++++++---- > > > > > 1 file changed, 22 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > > > > index c29c3d180..9e960b034 100644 > > > > > --- a/controller/ofctrl.c > > > > > +++ b/controller/ofctrl.c > > > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct > > ovn_desired_flow_table > > > > *flow_table) > > > > > continue; > > > > > } > > > > > > > > > > - /* del_f must have been installed, otherwise it should > > have > > > > been > > > > > - * removed during track_flow_add_or_modify. */ > > > > > - ovs_assert(del_f->installed_flow); > > > > > + if (!del_f->installed_flow) { > > > > > + /* del_f must have been installed, otherwise it > > should > > > > have > > > > > + * been removed during track_flow_add_or_modify. > > > > > + * > > > > > + * But however there are a couple of scenarios this > > may > > > > not > > > > > + * happen. > > > > > + * > > > > > + * Scenario 1: A flow was added to the desired > > flows, > > > > but > > > > > + * ofctrl_put() couldn't install the > > flow > > > > and > > > > > + * an SB update caused the 'del_f' to be > > > > removed > > > > > + * and re-added as 'f'. > > > > > + * > > > > > + * Scenario 2: In a single engine run, a flow > > 'del_f' > > > > was > > > > > + * added to the desired flow, removed > > from > > > > the > > > > > + * desired flow and re-added as 'f'. > > This > > > > could > > > > > + * happen after the commit c6c61b4e3462. > > > > > + * > > > > > + * Treat both the scenarios as valid scenarios and > > just > > > > remove > > > > > + * 'del_f' from the hmap - deleted_flows. > > > > > + * update_installed_flows_by_track() will install > > 'f'. > > > > > + */ > > > > > > > > > > - if (!f->installed_flow) { > > > > > + } else if (!f->installed_flow) { > > > > > /* f is not installed yet. */ > > > > > replace_installed_to_desired(del_f->installed_flow, > > > > del_f, f); > > > > > } else { > > > > > -- > > > > > 2.29.2 > > > > > > > > > > _______________________________________________ > > > > > dev mailing list > > > > > d...@openvswitch.org > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > _______________________________________________ > > > > dev mailing list > > > > d...@openvswitch.org > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > _______________________________________________ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev