On 03/07/2024 13:16, Eelco Chaudron wrote: > > > On 3 Jul 2024, at 9:33, Roi Dayan wrote: > >> It is observed in some environments that there are much more ukeys than >> actual DP flows. For example: >> >> $ ovs-appctl upcall/show >> system@ovs-system: >> flows : (current 7) (avg 6) (max 117) (limit 2125) >> offloaded flows : 525 >> dump duration : 1063ms >> ufid enabled : true >> >> 23: (keys 3612) >> 24: (keys 3625) >> 25: (keys 3485) >> >> The revalidator threads are busy revalidating the stale ukeys leading to >> high CPU and long dump duration. >> >> This patch adds checks in the sweep phase for such ukeys and move them >> to DELETE so that they can be cleared eventually. > > Thank Roi for the patch update, one small issue below. Let’s also discuss a > bit more a potential testcase before sending a v3. > > Cheers, > > Eelco > >
I also replied in v0 but replying here so we can continue the discussion here in v2. I did this for testing this case: - create bridge with 2 veth ports. configure ips. - ping between the ports to have tc rules. - repeated few times: clear the tc rules like this: tc filter del dev veth1 ingress and also on the 2nd port. - set max-idle to 1 and remove it to cause a flush of the rules. - create another set of veth ports. add/del veth4 from the bridge a few times to cause a sweep. - before the fix: ovs-appctl upcall/show will show ukeys. - after the fix upcall/show will show 0 ukeys. what do you think? I think if there is a cleaner way to do a sweep with purge=false as just will just skip seq mismatch check and just set every ukey to delete. >> Co-authored-by: Han Zhou <hz...@ovn.org> >> Signed-off-by: Han Zhou <hz...@ovn.org> >> Signed-off-by: Roi Dayan <r...@nvidia.com> >> --- >> ofproto/ofproto-dpif-upcall.c | 10 +++++++++- >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c >> index 83609ec62b63..08b3c70411aa 100644 >> --- a/ofproto/ofproto-dpif-upcall.c >> +++ b/ofproto/ofproto-dpif-upcall.c >> @@ -278,6 +278,7 @@ enum flow_del_reason { >> FDR_BAD_ODP_FIT, /* Bad ODP flow fit. */ >> FDR_FLOW_IDLE, /* Flow idle timeout. */ >> FDR_FLOW_LIMIT, /* Kill all flows condition reached. */ >> + FDR_FLOW_NO_STATS_IDLE, /* Flow idled out without receiving statistics. >> */ > > Guess you missed the part at the beginning on updating the related USDT > script: > > ``` > /* Ukey delete reasons used by USDT probes. Please keep in sync with the > * definition in utilities/usdt-scripts/flow_reval_monitor.py. */ > enum flow_del_reason { > ``` > right. thanks. >> FDR_FLOW_WILDCARDED, /* Flow needs a narrower wildcard mask. */ >> FDR_NO_OFPROTO, /* Bridge not found. */ >> FDR_PURGE, /* User requested flow deletion. */ >> @@ -2450,7 +2451,14 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key >> *ukey, >> log_unexpected_stats_jump(ukey, stats); >> } >> >> - if (need_revalidate) { >> + if (!ukey->stats.used >> + && ukey->created < udpif->dpif->current_ms - ofproto_max_idle) { >> + /* If the flow has a 'used' value of 0, meaning no stats were >> received >> + * for this flow, and the configured idle time has elapsed, it might >> + * indicates a stale flow (i.e., a flow without an installed >> datapath >> + * rule). In this case, it is safe to mark this ukey for deletion. >> */ >> + *del_reason = FDR_FLOW_NO_STATS_IDLE; >> + } else if (need_revalidate) { >> if (should_revalidate(udpif, ukey, push.n_packets)) { >> if (!ukey->xcache) { >> ukey->xcache = xlate_cache_new(); >> -- >> 2.21.0 > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev