On 03/07/2024 13:16, Eelco Chaudron wrote:
> 
> 
> On 3 Jul 2024, at 9:33, Roi Dayan wrote:
> 
>> It is observed in some environments that there are much more ukeys than
>> actual DP flows. For example:
>>
>> $ ovs-appctl upcall/show
>> system@ovs-system:
>> flows : (current 7) (avg 6) (max 117) (limit 2125)
>> offloaded flows : 525
>> dump duration : 1063ms
>> ufid enabled : true
>>
>> 23: (keys 3612)
>> 24: (keys 3625)
>> 25: (keys 3485)
>>
>> The revalidator threads are busy revalidating the stale ukeys leading to
>> high CPU and long dump duration.
>>
>> This patch adds checks in the sweep phase for such ukeys and move them
>> to DELETE so that they can be cleared eventually.
> 
> Thank Roi for the patch update, one small issue below. Let’s also discuss a 
> bit more a potential testcase before sending a v3.
> 
> Cheers,
> 
> Eelco
> 
> 

I also replied in v0 but replying here so we can continue the discussion here 
in v2.

I did this for testing this case:

- create bridge with 2 veth ports. configure ips.
- ping between the ports to have tc rules.
- repeated few times: clear the tc rules like this: tc filter del dev veth1 
ingress and also on the 2nd port.
- set max-idle to 1 and remove it to cause a flush of the rules.
- create another set of veth ports. add/del veth4 from the bridge a few times 
to cause a sweep.
- before the fix: ovs-appctl upcall/show will show ukeys.
- after the fix upcall/show will show 0 ukeys.


what do you think?
I think if there is a cleaner way to do a sweep with purge=false as just
will just skip seq mismatch check and just set every ukey to delete.



>> Co-authored-by: Han Zhou <hz...@ovn.org>
>> Signed-off-by: Han Zhou <hz...@ovn.org>
>> Signed-off-by: Roi Dayan <r...@nvidia.com>
>> ---
>>  ofproto/ofproto-dpif-upcall.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index 83609ec62b63..08b3c70411aa 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -278,6 +278,7 @@ enum flow_del_reason {
>>      FDR_BAD_ODP_FIT,        /* Bad ODP flow fit. */
>>      FDR_FLOW_IDLE,          /* Flow idle timeout. */
>>      FDR_FLOW_LIMIT,         /* Kill all flows condition reached. */
>> +    FDR_FLOW_NO_STATS_IDLE, /* Flow idled out without receiving statistics. 
>> */
> 
> Guess you missed the part at the beginning on updating the related USDT 
> script:
> 
> ```
> /* Ukey delete reasons used by USDT probes.  Please keep in sync with the
>  * definition in utilities/usdt-scripts/flow_reval_monitor.py.  */
> enum flow_del_reason {
> ```
> 

right. thanks.

>>      FDR_FLOW_WILDCARDED,    /* Flow needs a narrower wildcard mask. */
>>      FDR_NO_OFPROTO,         /* Bridge not found. */
>>      FDR_PURGE,              /* User requested flow deletion. */
>> @@ -2450,7 +2451,14 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
>> *ukey,
>>          log_unexpected_stats_jump(ukey, stats);
>>      }
>>
>> -    if (need_revalidate) {
>> +    if (!ukey->stats.used
>> +        && ukey->created < udpif->dpif->current_ms - ofproto_max_idle) {
>> +        /* If the flow has a 'used' value of 0, meaning no stats were 
>> received
>> +         * for this flow, and the configured idle time has elapsed, it might
>> +         * indicates a stale flow (i.e., a flow without an installed 
>> datapath
>> +         * rule). In this case, it is safe to mark this ukey for deletion. 
>> */
>> +        *del_reason = FDR_FLOW_NO_STATS_IDLE;
>> +    } else if (need_revalidate) {
>>          if (should_revalidate(udpif, ukey, push.n_packets)) {
>>              if (!ukey->xcache) {
>>                  ukey->xcache = xlate_cache_new();
>> -- 
>> 2.21.0
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to