On Tue, Oct 25, 2022 at 6:20 AM Mark Michelson <mmich...@redhat.com> wrote:
>
> On 10/24/22 23:51, Han Zhou wrote:
> >
> >
> > On Sun, Oct 23, 2022 at 10:21 PM Ales Musil <amu...@redhat.com
> > <mailto:amu...@redhat.com>> wrote:
> >  >
> >  >
> >  >
> >  > On Sun, Oct 23, 2022 at 10:16 PM Han Zhou <zhou...@gmail.com
> > <mailto:zhou...@gmail.com>> wrote:
> >  >>
> >  >>
> >  >>
> >  >> On Fri, Oct 21, 2022 at 1:37 AM Ales Musil <amu...@redhat.com
> > <mailto:amu...@redhat.com>> wrote:
> >  >> >
> >  >> > Run ovs_list_init on list_node from desired flow.
> >  >> > Not doing that can potentially cause crash on
> >  >> > ovs_assert(ovs_list_is_empty(&f->list_node))
> >  >> > this might happen when we have conditional monitoring
> >  >> > enabled and have some flow like below being processed
> >  >> > by ofctrl in a single run.
> >  >> >
> >  >> > flowA is removed, flowA is added, flowA is removed.
> >  >> >
> >  >>
> >  >
> >  > Hi Han,
> >  >
> >  >>
> >  >> Thanks Ales for the patch. First of all I agree this patch is not
> > harmful and is the right thing to do, so I am going to apply it.
> >  >> At the same time, I still don't understand the root cause analysis
> > above and like you said, I am not sure if it would fix the bug. Two
> > questions here:
> >  >
> >  >
> >  > The description is based on some assumptions and might not be the
> > correct way to get into that state.
> >  >
> >  >>
> >  >> 1. How would it relate to conditional monitoring?
> >  >
> >  >
> >  > It is more likely to receive multiple updates at once with
> > conditional monitoring or am I wrong?
> >  >
> > I don't see a reason for this. I think it only depends on what changes
> > happened in SB and notified and pending to ovn-controller during an
> > iteration of ovn-controller control loop, which doesn't seem to have
> > relationship with conditional monitoring. Do you have more details on
> > the reasoning?
> >
> >  >>
> >  >> 2. For the example "flowA is removed, flowA is added, flowA is
> > removed", I assume flowA is "a desired OVS flow", not a logical flow,
> > correct? So when flowA is removed, it should be removed from the desired
> > flow table and added to the "tracked_flows". Then when flowA is added,
> > it would in fact be a newly allocated desired_flow, and no chance to
> > find the earlier tracked deleted flowA from the desired flow table. The
> > 2nd flowA deletion would find the 2nd flowA instance and destroy it, but
> > the first deleted desired_flow instance of flowA would not be referenced
> > any more and have no chance to use the list_node before it is finally
> > destroyed, right?
> >  >
> >  >
> >  > That makes sense, as I said, I was not able to reproduce it, the
> > issue seemed to make sense under those conditions. Anyway I can update
> > the commit message leaving the flows part out.
> >
> > Understand, please update the commit message.
> > I tend to believe that the core dump was caused by some other bug that
> > may result in corruption of the list/memory (in such case this change
> > won't help), and also not sure if it is a problem only with this old
> > version v20.12 or still a problem of the latest code.
> >
> >  >
> >  > Also one thing to keep in mind, it was originally reported against
> > v20.12. We should also backport it if possible.
> >
> > I am not sure if it is a candidate for backporting. Usually we backport
> > critical bug fixes only, especially for such an old version. But in this
> > case the change itself is so simple and we are almost sure it won't
> > cause any harm, although no evidence of any benefit, either.
> >
> > What's your opinion of backporting this? Dumitru, Mark, Numan?
>
> Currently, 21.12 and forward are supported for all bug fixes, but I
> agree that for anything older, it should probably be for
> security/critical fixes only. I think the question is whether a crash
> fix is considered critical. Personally, in this particular case, I'd be
> fine with doing backports further back since
>
> 1) It's fixing a crash.
> 2) The patch is minimally invasive.
>
> That's just my opinion. Others can feel free to counter.

Thanks Mark for the input.
Just to clarify that there is no evidence or solid theory that this is
going to fix a crash.
Anyway, given the feedback so far, there is no objection to backporting it.
I will wait for updates from Ales.

Thanks,
Han

>
> >
> > Thanks,
> > Han
> >  >
> >  > Thanks,
> >  > Ales
> >  >
> >  >>
> >  >>
> >  >> Thanks,
> >  >> Han
> >  >>
> >  >> > This is most likely related to BZ[0].
> >  >> >
> >  >> > [0] https://bugzilla.redhat.com/2125723
> > <https://bugzilla.redhat.com/2125723>
> >  >> > Suggested-By: Dumitru Ceara <dce...@redhat.com
> > <mailto:dce...@redhat.com>>
> >  >> > Signed-off-by: Ales Musil <amu...@redhat.com
> > <mailto:amu...@redhat.com>>
> >  >> > ---
> >  >> > I'm not adding the BZ as Reported-At because this might
> >  >> > not fix the issue they are facing. It seems to be very hard
> >  >> > to reproduce, however it still is a good idea to apply this
> >  >> > fix as in the context it's the right to do and shouldn't cause
> >  >> > any harm.
> >  >> > ---
> >  >> >  controller/ofctrl.c | 1 +
> >  >> >  1 file changed, 1 insertion(+)
> >  >> >
> >  >> > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> >  >> > index c77991258..fb0e0869c 100644
> >  >> > --- a/controller/ofctrl.c
> >  >> > +++ b/controller/ofctrl.c
> >  >> > @@ -1535,6 +1535,7 @@ remove_flows_from_sb_to_flow(struct
> > ovn_desired_flow_table *flow_table,
> >  >> >              free(sfr);
> >  >> >          }
> >  >> >          ovs_list_remove(&f->list_node);
> >  >> > +        ovs_list_init(&f->list_node);
> >  >> >          if (log_msg) {
> >  >> >              ovn_flow_log(&f->flow, log_msg);
> >  >> >          }
> >  >> > --
> >  >> > 2.37.3
> >  >> >
> >  >> > _______________________________________________
> >  >> > dev mailing list
> >  >> > d...@openvswitch.org <mailto:d...@openvswitch.org>
> >  >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> >  >
> >  >
> >  >
> >  > --
> >  >
> >  > Ales Musil
> >  >
> >  > Senior Software Engineer - OVN Core
> >  >
> >  > Red Hat EMEA
> >  >
> >  > amu...@redhat.com <mailto:amu...@redhat.com>    IM: amusil
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to