On 18 Jun 2024, at 8:05, Roi Dayan wrote:

> On 03/06/2024 16:29, Eelco Chaudron wrote:
>>
>>
>> On 3 Jun 2024, at 10:07, Roi Dayan wrote:
>>
>>> On 03/06/2024 10:18, Roi Dayan wrote:
>>>>
>>>>
>>>> On 30/05/2024 18:48, Eelco Chaudron wrote:
>>>>>
>>>>>
>>>>> On 23 May 2024, at 12:46, Roi Dayan via dev wrote:
>>>>>
>>>>>> It is observed in some environments that there are much more ukeys than
>>>>>> actual DP flows. For example:
>>>>>>
>>>>>> $ ovs-appctl upcall/show
>>>>>> system@ovs-system:
>>>>>> flows : (current 7) (avg 6) (max 117) (limit 2125)
>>>>>> offloaded flows : 525
>>>>>> dump duration : 1063ms
>>>>>> ufid enabled : true
>>>>>>
>>>>>> 23: (keys 3612)
>>>>>> 24: (keys 3625)
>>>>>> 25: (keys 3485)
>>>>>>
>>>>>> The revalidator threads are busy revalidating the stale ukeys leading to
>>>>>> high CPU and long dump duration.
>>>>>>
>>>>>> There are some possible situations that may result in stale ukeys that
>>>>>> have no corresponding DP flows.
>>>>>>
>>>>>> In revalidator, push_dp_ops() doesn't check error if the op type is not
>>>>>> DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload
>>>>>> case, where the old flow is deleted first and then the new one is
>>>>>> created. If the creation fails, the ukey will be stale (no corresponding
>>>>>> DP flow). This patch adds a warning in such case.
>>>>>
>>>>> Not sure I understand, this behavior should be captured by the 
>>>>> UKEY_INCONSISTENT state.
>>>>
>>>> Hi Eelco,
>>>>
>>>> thanks for reviewing.
>>>> We started the patch on older branch as we didn't rebase for some time
>>>> and got a little later on sending it.
>>>> I see the addition now for UKEY_INCOSISTENT in the following patch:
>>>>
>>>> cf11766cbcf1 ofproto-dpif-upcall: Fix push_dp_ops to handle all errors.
>>>>
>>>> Looks like it handles the same situation we tried to handle in this patch.
>>>>
>>>>>
>>>>>> Another possible scenario is in handle_upcalls() if a PUT operation did
>>>>>> not succeed and op->error attribute was not set correctly it can lead to
>>>>>> stale ukey in operational state.
>>>>>
>>>>>
>>>>> Guess we need to figure out these cases, as these are the root cause of 
>>>>> your problem.
>>>>
>>>> right. maybe. but this could help keep the system alive/clean while 
>>>> logging the
>>>> bad situation instead of having increase in those stale/inconsistent ukeys.
>>>> I understand if it's not nice to handle it like this.
>>>>
>>>
>>> Hi Eelco,
>>>
>>> I remember now one of the reproduction scenarios we did is do some traffic
>>> to get rules added using TC and then delete those from tc command line
>>> and it got to stale ukeys.
>>> The revalidator dump thread not seeing the rules so not updating anything
>>> and sweep over the ukeys skipping them as well.
>>> This is why we checked against the timing stats of the ukey.
>>>
>>> I remember I tested on the upstream code and not our development branch
>>> and it reproduces. I didn't notice if the commit adding UKEY_INCONSISTENT
>>> existed but it handles errors from adding flows so I dont think it matters.
>>>
>>> I'll try to check and verify again but I think it's still here.
>>> So while cases getting dop.error handled with UKEY_INCONSISTENT,
>>> this case I think is not.
>>
>> I think you are right this case is not covered by the UKEY_INCONSISTENT test 
>> below. See my suggestion on fixing this in revalidate_ukey().
>>
>> Maybe you could also add a test case for this in the offload test suite.
>>
>> //Eelco
>
> Hi Eelco,
>
> Thanks for the review. I didn't have time to check this but wanted to
> reply that it's in my queue to verify your suggestion and add a test.

Thanks for letting me know, and I understand as we are all busy :)

//Eelco

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to