On 10/25/22 20:13, Han Zhou wrote:
> On Tue, Oct 25, 2022 at 2:53 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>
>> On 10/25/22 07:38, Ales Musil wrote:
>>> On Tue, Oct 25, 2022 at 5:51 AM Han Zhou <zhou...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Sun, Oct 23, 2022 at 10:21 PM Ales Musil <amu...@redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Oct 23, 2022 at 10:16 PM Han Zhou <zhou...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 21, 2022 at 1:37 AM Ales Musil <amu...@redhat.com> wrote:
>>>>>>>
>>>>>>> Run ovs_list_init on list_node from desired flow.
>>>>>>> Not doing that can potentially cause crash on
>>>>>>> ovs_assert(ovs_list_is_empty(&f->list_node))
>>>>>>> this might happen when we have conditional monitoring
>>>>>>> enabled and have some flow like below being processed
>>>>>>> by ofctrl in a single run.
>>>>>>>
>>>>>>> flowA is removed, flowA is added, flowA is removed.
>>>>>>>
>>>>>>
>>>>>
>>>>> Hi Han,
>>>>>
>>>>>>
>>>>>> Thanks Ales for the patch. First of all I agree this patch is not
>>>> harmful and is the right thing to do, so I am going to apply it.
>>>>>> At the same time, I still don't understand the root cause analysis
>>>> above and like you said, I am not sure if it would fix the bug. Two
>>>> questions here:
>>>>>
>>>>>
>>>>> The description is based on some assumptions and might not be the
>>>> correct way to get into that state.
>>>>>
>>>>>>
>>>>>> 1. How would it relate to conditional monitoring?
>>>>>
>>>>>
>>>>> It is more likely to receive multiple updates at once with conditional
>>>> monitoring or am I wrong?
>>>>>
>>>> I don't see a reason for this. I think it only depends on what changes
>>>> happened in SB and notified and pending to ovn-controller during an
>>>> iteration of ovn-controller control loop, which doesn't seem to have
>>>> relationship with conditional monitoring. Do you have more details on
> the
>>>> reasoning?
>>>>
>>>
>>> It was based on the configuration, however it was confirmed that it
> still
>>> happens even with conditional monitoring disabled.
>>>
>>>
>>>>
>>>>>>
>>>>>> 2. For the example "flowA is removed, flowA is added, flowA is
>>>> removed", I assume flowA is "a desired OVS flow", not a logical flow,
>>>> correct? So when flowA is removed, it should be removed from the
> desired
>>>> flow table and added to the "tracked_flows". Then when flowA is added,
> it
>>>> would in fact be a newly allocated desired_flow, and no chance to find
> the
>>>> earlier tracked deleted flowA from the desired flow table. The 2nd
> flowA
>>>> deletion would find the 2nd flowA instance and destroy it, but the
> first
>>>> deleted desired_flow instance of flowA would not be referenced any
> more and
>>>> have no chance to use the list_node before it is finally destroyed,
> right?
>>>>>
>>>>>
>>>>> That makes sense, as I said, I was not able to reproduce it, the issue
>>>> seemed to make sense under those conditions. Anyway I can update the
> commit
>>>> message leaving the flows part out.
>>>>
>>>> Understand, please update the commit message.
>>>> I tend to believe that the core dump was caused by some other bug that
> may
>>>> result in corruption of the list/memory (in such case this change won't
>>>> help), and also not sure if it is a problem only with this old version
>>>> v20.12 or still a problem of the latest code.
>>>>
>>>
>>> Ack, I will update the commit message in  v2. We have also provided a
> build
>>> so we will get feedback if it still happens with this patch or not. I
> will
>>> postpone v2 until then.
>>>
>>>
>>>>
>>>>>
>>>>> Also one thing to keep in mind, it was originally reported against
>>>> v20.12. We should also backport it if possible.
>>>>
>>>> I am not sure if it is a candidate for backporting. Usually we backport
>>>> critical bug fixes only, especially for such an old version. But in
> this
>>>> case the change itself is so simple and we are almost sure it won't
> cause
>>>> any harm, although no evidence of any benefit, either.
>>>>
>>>> What's your opinion of backporting this? Dumitru, Mark, Numan?
>>>>
>>
>> Hi Ales, Han,
>>
>> Regardless if we think this is the root cause of the reported crash or
>> not, I still think there's a possibility of failing the assertion
>> because we didn't reinitialize the list_node field while computing
>> incremental OF updates.
> 
> Thanks Dumitru for the feedback. Maybe I am missing something, but from the
> code I didn't find any possibility that the list_node would be accessed
> again after the node is removed from the to_be_removed list. So I didn't
> see the possibility of failing the assertion, unless there is a bug that
> would cause the list_node to be accessed, probably after it is freed.
> 

I spent some more time looking into this and I realized I had missed the
fact that when remove_flows_from_sb_to_flow() empties to_be_removed also
removes all the items from the flow_table->match_flow_table.  So I think
you're right, at this point there should be no way to access list_node
anymore.

We probably need to get some more data.  Ales, would it be an option to
run an address-sanitizer enabled build for a test?

>>
>> As you said, it won't cause any harm but it hardens the ovn-controller
>> code so I would backport it.
>>
> Ack.
> 
>> Regards,
>> Dumitru
>>
>>>> Thanks,
>>>> Han
>>>>>
>>>>> Thanks,
>>>>> Ales
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Han
>>>>>>
>>>>>>> This is most likely related to BZ[0].
>>>>>>>
>>>>>>> [0] https://bugzilla.redhat.com/2125723
>>>>>>> Suggested-By: Dumitru Ceara <dce...@redhat.com>
>>>>>>> Signed-off-by: Ales Musil <amu...@redhat.com>
>>>>>>> ---
>>>>>>> I'm not adding the BZ as Reported-At because this might
>>>>>>> not fix the issue they are facing. It seems to be very hard
>>>>>>> to reproduce, however it still is a good idea to apply this
>>>>>>> fix as in the context it's the right to do and shouldn't cause
>>>>>>> any harm.
>>>>>>> ---
>>>>>>>  controller/ofctrl.c | 1 +
>>>>>>>  1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>>>>>>> index c77991258..fb0e0869c 100644
>>>>>>> --- a/controller/ofctrl.c
>>>>>>> +++ b/controller/ofctrl.c
>>>>>>> @@ -1535,6 +1535,7 @@ remove_flows_from_sb_to_flow(struct
>>>> ovn_desired_flow_table *flow_table,
>>>>>>>              free(sfr);
>>>>>>>          }
>>>>>>>          ovs_list_remove(&f->list_node);
>>>>>>> +        ovs_list_init(&f->list_node);
>>>>>>>          if (log_msg) {
>>>>>>>              ovn_flow_log(&f->flow, log_msg);
>>>>>>>          }
>>>>>>> --
>>>>>>> 2.37.3
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> d...@openvswitch.org
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Ales Musil
>>>>>
>>>>> Senior Software Engineer - OVN Core
>>>>>
>>>>> Red Hat EMEA
>>>>>
>>>>> amu...@redhat.com    IM: amusil
>>>>
>>>
>>>
>>> Thanks,
>>> Ales
>>>
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to