On Tue, Oct 25, 2022 at 2:53 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 10/25/22 07:38, Ales Musil wrote:
> > On Tue, Oct 25, 2022 at 5:51 AM Han Zhou <zhou...@gmail.com> wrote:
> >
> >>
> >>
> >> On Sun, Oct 23, 2022 at 10:21 PM Ales Musil <amu...@redhat.com> wrote:
> >>>
> >>>
> >>>
> >>> On Sun, Oct 23, 2022 at 10:16 PM Han Zhou <zhou...@gmail.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Oct 21, 2022 at 1:37 AM Ales Musil <amu...@redhat.com> wrote:
> >>>>>
> >>>>> Run ovs_list_init on list_node from desired flow.
> >>>>> Not doing that can potentially cause crash on
> >>>>> ovs_assert(ovs_list_is_empty(&f->list_node))
> >>>>> this might happen when we have conditional monitoring
> >>>>> enabled and have some flow like below being processed
> >>>>> by ofctrl in a single run.
> >>>>>
> >>>>> flowA is removed, flowA is added, flowA is removed.
> >>>>>
> >>>>
> >>>
> >>> Hi Han,
> >>>
> >>>>
> >>>> Thanks Ales for the patch. First of all I agree this patch is not
> >> harmful and is the right thing to do, so I am going to apply it.
> >>>> At the same time, I still don't understand the root cause analysis
> >> above and like you said, I am not sure if it would fix the bug. Two
> >> questions here:
> >>>
> >>>
> >>> The description is based on some assumptions and might not be the
> >> correct way to get into that state.
> >>>
> >>>>
> >>>> 1. How would it relate to conditional monitoring?
> >>>
> >>>
> >>> It is more likely to receive multiple updates at once with conditional
> >> monitoring or am I wrong?
> >>>
> >> I don't see a reason for this. I think it only depends on what changes
> >> happened in SB and notified and pending to ovn-controller during an
> >> iteration of ovn-controller control loop, which doesn't seem to have
> >> relationship with conditional monitoring. Do you have more details on
the
> >> reasoning?
> >>
> >
> > It was based on the configuration, however it was confirmed that it
still
> > happens even with conditional monitoring disabled.
> >
> >
> >>
> >>>>
> >>>> 2. For the example "flowA is removed, flowA is added, flowA is
> >> removed", I assume flowA is "a desired OVS flow", not a logical flow,
> >> correct? So when flowA is removed, it should be removed from the
desired
> >> flow table and added to the "tracked_flows". Then when flowA is added,
it
> >> would in fact be a newly allocated desired_flow, and no chance to find
the
> >> earlier tracked deleted flowA from the desired flow table. The 2nd
flowA
> >> deletion would find the 2nd flowA instance and destroy it, but the
first
> >> deleted desired_flow instance of flowA would not be referenced any
more and
> >> have no chance to use the list_node before it is finally destroyed,
right?
> >>>
> >>>
> >>> That makes sense, as I said, I was not able to reproduce it, the issue
> >> seemed to make sense under those conditions. Anyway I can update the
commit
> >> message leaving the flows part out.
> >>
> >> Understand, please update the commit message.
> >> I tend to believe that the core dump was caused by some other bug that
may
> >> result in corruption of the list/memory (in such case this change won't
> >> help), and also not sure if it is a problem only with this old version
> >> v20.12 or still a problem of the latest code.
> >>
> >
> > Ack, I will update the commit message in  v2. We have also provided a
build
> > so we will get feedback if it still happens with this patch or not. I
will
> > postpone v2 until then.
> >
> >
> >>
> >>>
> >>> Also one thing to keep in mind, it was originally reported against
> >> v20.12. We should also backport it if possible.
> >>
> >> I am not sure if it is a candidate for backporting. Usually we backport
> >> critical bug fixes only, especially for such an old version. But in
this
> >> case the change itself is so simple and we are almost sure it won't
cause
> >> any harm, although no evidence of any benefit, either.
> >>
> >> What's your opinion of backporting this? Dumitru, Mark, Numan?
> >>
>
> Hi Ales, Han,
>
> Regardless if we think this is the root cause of the reported crash or
> not, I still think there's a possibility of failing the assertion
> because we didn't reinitialize the list_node field while computing
> incremental OF updates.

Thanks Dumitru for the feedback. Maybe I am missing something, but from the
code I didn't find any possibility that the list_node would be accessed
again after the node is removed from the to_be_removed list. So I didn't
see the possibility of failing the assertion, unless there is a bug that
would cause the list_node to be accessed, probably after it is freed.

>
> As you said, it won't cause any harm but it hardens the ovn-controller
> code so I would backport it.
>
Ack.

> Regards,
> Dumitru
>
> >> Thanks,
> >> Han
> >>>
> >>> Thanks,
> >>> Ales
> >>>
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Han
> >>>>
> >>>>> This is most likely related to BZ[0].
> >>>>>
> >>>>> [0] https://bugzilla.redhat.com/2125723
> >>>>> Suggested-By: Dumitru Ceara <dce...@redhat.com>
> >>>>> Signed-off-by: Ales Musil <amu...@redhat.com>
> >>>>> ---
> >>>>> I'm not adding the BZ as Reported-At because this might
> >>>>> not fix the issue they are facing. It seems to be very hard
> >>>>> to reproduce, however it still is a good idea to apply this
> >>>>> fix as in the context it's the right to do and shouldn't cause
> >>>>> any harm.
> >>>>> ---
> >>>>>  controller/ofctrl.c | 1 +
> >>>>>  1 file changed, 1 insertion(+)
> >>>>>
> >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> >>>>> index c77991258..fb0e0869c 100644
> >>>>> --- a/controller/ofctrl.c
> >>>>> +++ b/controller/ofctrl.c
> >>>>> @@ -1535,6 +1535,7 @@ remove_flows_from_sb_to_flow(struct
> >> ovn_desired_flow_table *flow_table,
> >>>>>              free(sfr);
> >>>>>          }
> >>>>>          ovs_list_remove(&f->list_node);
> >>>>> +        ovs_list_init(&f->list_node);
> >>>>>          if (log_msg) {
> >>>>>              ovn_flow_log(&f->flow, log_msg);
> >>>>>          }
> >>>>> --
> >>>>> 2.37.3
> >>>>>
> >>>>> _______________________________________________
> >>>>> dev mailing list
> >>>>> d...@openvswitch.org
> >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Ales Musil
> >>>
> >>> Senior Software Engineer - OVN Core
> >>>
> >>> Red Hat EMEA
> >>>
> >>> amu...@redhat.com    IM: amusil
> >>
> >
> >
> > Thanks,
> > Ales
> >
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to