On Tue, Oct 25, 2022 at 2:53 AM Dumitru Ceara <dce...@redhat.com> wrote: > > On 10/25/22 07:38, Ales Musil wrote: > > On Tue, Oct 25, 2022 at 5:51 AM Han Zhou <zhou...@gmail.com> wrote: > > > >> > >> > >> On Sun, Oct 23, 2022 at 10:21 PM Ales Musil <amu...@redhat.com> wrote: > >>> > >>> > >>> > >>> On Sun, Oct 23, 2022 at 10:16 PM Han Zhou <zhou...@gmail.com> wrote: > >>>> > >>>> > >>>> > >>>> On Fri, Oct 21, 2022 at 1:37 AM Ales Musil <amu...@redhat.com> wrote: > >>>>> > >>>>> Run ovs_list_init on list_node from desired flow. > >>>>> Not doing that can potentially cause crash on > >>>>> ovs_assert(ovs_list_is_empty(&f->list_node)) > >>>>> this might happen when we have conditional monitoring > >>>>> enabled and have some flow like below being processed > >>>>> by ofctrl in a single run. > >>>>> > >>>>> flowA is removed, flowA is added, flowA is removed. > >>>>> > >>>> > >>> > >>> Hi Han, > >>> > >>>> > >>>> Thanks Ales for the patch. First of all I agree this patch is not > >> harmful and is the right thing to do, so I am going to apply it. > >>>> At the same time, I still don't understand the root cause analysis > >> above and like you said, I am not sure if it would fix the bug. Two > >> questions here: > >>> > >>> > >>> The description is based on some assumptions and might not be the > >> correct way to get into that state. > >>> > >>>> > >>>> 1. How would it relate to conditional monitoring? > >>> > >>> > >>> It is more likely to receive multiple updates at once with conditional > >> monitoring or am I wrong? > >>> > >> I don't see a reason for this. I think it only depends on what changes > >> happened in SB and notified and pending to ovn-controller during an > >> iteration of ovn-controller control loop, which doesn't seem to have > >> relationship with conditional monitoring. Do you have more details on the > >> reasoning? > >> > > > > It was based on the configuration, however it was confirmed that it still > > happens even with conditional monitoring disabled. > > > > > >> > >>>> > >>>> 2. For the example "flowA is removed, flowA is added, flowA is > >> removed", I assume flowA is "a desired OVS flow", not a logical flow, > >> correct? So when flowA is removed, it should be removed from the desired > >> flow table and added to the "tracked_flows". Then when flowA is added, it > >> would in fact be a newly allocated desired_flow, and no chance to find the > >> earlier tracked deleted flowA from the desired flow table. The 2nd flowA > >> deletion would find the 2nd flowA instance and destroy it, but the first > >> deleted desired_flow instance of flowA would not be referenced any more and > >> have no chance to use the list_node before it is finally destroyed, right? > >>> > >>> > >>> That makes sense, as I said, I was not able to reproduce it, the issue > >> seemed to make sense under those conditions. Anyway I can update the commit > >> message leaving the flows part out. > >> > >> Understand, please update the commit message. > >> I tend to believe that the core dump was caused by some other bug that may > >> result in corruption of the list/memory (in such case this change won't > >> help), and also not sure if it is a problem only with this old version > >> v20.12 or still a problem of the latest code. > >> > > > > Ack, I will update the commit message in v2. We have also provided a build > > so we will get feedback if it still happens with this patch or not. I will > > postpone v2 until then. > > > > > >> > >>> > >>> Also one thing to keep in mind, it was originally reported against > >> v20.12. We should also backport it if possible. > >> > >> I am not sure if it is a candidate for backporting. Usually we backport > >> critical bug fixes only, especially for such an old version. But in this > >> case the change itself is so simple and we are almost sure it won't cause > >> any harm, although no evidence of any benefit, either. > >> > >> What's your opinion of backporting this? Dumitru, Mark, Numan? > >> > > Hi Ales, Han, > > Regardless if we think this is the root cause of the reported crash or > not, I still think there's a possibility of failing the assertion > because we didn't reinitialize the list_node field while computing > incremental OF updates.
Thanks Dumitru for the feedback. Maybe I am missing something, but from the code I didn't find any possibility that the list_node would be accessed again after the node is removed from the to_be_removed list. So I didn't see the possibility of failing the assertion, unless there is a bug that would cause the list_node to be accessed, probably after it is freed. > > As you said, it won't cause any harm but it hardens the ovn-controller > code so I would backport it. > Ack. > Regards, > Dumitru > > >> Thanks, > >> Han > >>> > >>> Thanks, > >>> Ales > >>> > >>>> > >>>> > >>>> Thanks, > >>>> Han > >>>> > >>>>> This is most likely related to BZ[0]. > >>>>> > >>>>> [0] https://bugzilla.redhat.com/2125723 > >>>>> Suggested-By: Dumitru Ceara <dce...@redhat.com> > >>>>> Signed-off-by: Ales Musil <amu...@redhat.com> > >>>>> --- > >>>>> I'm not adding the BZ as Reported-At because this might > >>>>> not fix the issue they are facing. It seems to be very hard > >>>>> to reproduce, however it still is a good idea to apply this > >>>>> fix as in the context it's the right to do and shouldn't cause > >>>>> any harm. > >>>>> --- > >>>>> controller/ofctrl.c | 1 + > >>>>> 1 file changed, 1 insertion(+) > >>>>> > >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c > >>>>> index c77991258..fb0e0869c 100644 > >>>>> --- a/controller/ofctrl.c > >>>>> +++ b/controller/ofctrl.c > >>>>> @@ -1535,6 +1535,7 @@ remove_flows_from_sb_to_flow(struct > >> ovn_desired_flow_table *flow_table, > >>>>> free(sfr); > >>>>> } > >>>>> ovs_list_remove(&f->list_node); > >>>>> + ovs_list_init(&f->list_node); > >>>>> if (log_msg) { > >>>>> ovn_flow_log(&f->flow, log_msg); > >>>>> } > >>>>> -- > >>>>> 2.37.3 > >>>>> > >>>>> _______________________________________________ > >>>>> dev mailing list > >>>>> d...@openvswitch.org > >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >>> > >>> > >>> > >>> -- > >>> > >>> Ales Musil > >>> > >>> Senior Software Engineer - OVN Core > >>> > >>> Red Hat EMEA > >>> > >>> amu...@redhat.com IM: amusil > >> > > > > > > Thanks, > > Ales > > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev