On 10/07/2024 18:35, Eelco Chaudron wrote: > > > On 9 Jul 2024, at 23:38, Ilya Maximets wrote: > >> On 7/3/24 12:28, Roi Dayan via dev wrote: >>> >>> >>> On 03/07/2024 13:16, Eelco Chaudron wrote: >>>> >>>> >>>> On 3 Jul 2024, at 9:33, Roi Dayan wrote: >>>> >>>>> It is observed in some environments that there are much more ukeys than >>>>> actual DP flows. For example: >>>>> >>>>> $ ovs-appctl upcall/show >>>>> system@ovs-system: >>>>> flows : (current 7) (avg 6) (max 117) (limit 2125) >>>>> offloaded flows : 525 >>>>> dump duration : 1063ms >>>>> ufid enabled : true >>>>> >>>>> 23: (keys 3612) >>>>> 24: (keys 3625) >>>>> 25: (keys 3485) >>>>> >>>>> The revalidator threads are busy revalidating the stale ukeys leading to >>>>> high CPU and long dump duration. >>>>> >>>>> This patch adds checks in the sweep phase for such ukeys and move them >>>>> to DELETE so that they can be cleared eventually. >>>> >>>> Thank Roi for the patch update, one small issue below. Let’s also discuss >>>> a bit more a potential testcase before sending a v3. >>>> >>>> Cheers, >>>> >>>> Eelco >>>> >>>> >>> >>> I also replied in v0 but replying here so we can continue the discussion >>> here in v2. >>> >>> I did this for testing this case: >>> >>> - create bridge with 2 veth ports. configure ips. >>> - ping between the ports to have tc rules. >>> - repeated few times: clear the tc rules like this: tc filter del dev veth1 >>> ingress and also on the 2nd port. >>> - set max-idle to 1 and remove it to cause a flush of the rules. >>> - create another set of veth ports. add/del veth4 from the bridge a few >>> times to cause a sweep. >>> - before the fix: ovs-appctl upcall/show will show ukeys. >>> - after the fix upcall/show will show 0 ukeys. >>> >>> >>> what do you think? >>> I think if there is a cleaner way to do a sweep with purge=false as just >>> will just skip seq mismatch check and just set every ukey to delete. >> >> One problem here is that flow dumps are inherently racy. >> We can dump the same flow multiple times as well as not >> receive some flows that are actually in the datapath. >> So, we can't just remove ukeys if we missed a flow once >> or twice. If we didn't see the flow in the dumps for the >> max-idle interval, that might be a better indicator that >> something is wrong. >> >> One more problem here however is that ukey statistics can >> keep increasing if there is some traffic that matches. >> Since the datapath flow doesn't exist, all this traffic >> will go to userspace updating the ukey stats, but the >> flow will never be installed again. And we will not be >> able to fix that in this patch since the stats will grow. > > Yes, I noticed this when trying to add a quick testcase for this. The entries > were used, so it would never hit the code path. > > >> We need a way to tell that we haven't seen this flow in >> the dumps for a long time and stats are not a good indicator >> in this case. > > Agreed, I made a rough testcase and a patch that does this. Please take a > look and use whatever you need for your patch. I can also try to work on this > if you want, but it will be another three weeks as I’m about to go on PTO. > > Here are the patches (diffs): > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 83609ec62..8b449537d 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow); > COVERAGE_DEFINE(dumped_new_flow); > COVERAGE_DEFINE(handler_duplicate_upcall); > COVERAGE_DEFINE(revalidate_missed_dp_flow); > +COVERAGE_DEFINE(revalidate_missing_dp_flow); > COVERAGE_DEFINE(ukey_dp_change); > COVERAGE_DEFINE(ukey_invalid_stat_reset); > COVERAGE_DEFINE(ukey_replace_contention); > @@ -278,6 +279,7 @@ enum flow_del_reason { > FDR_BAD_ODP_FIT, /* Bad ODP flow fit. */ > FDR_FLOW_IDLE, /* Flow idle timeout. */ > FDR_FLOW_LIMIT, /* Kill all flows condition reached. */ > + FDR_FLOW_MISSING_DP, /* Flow is missing from the datapath. */ > FDR_FLOW_WILDCARDED, /* Flow needs a narrower wildcard mask. */ > FDR_NO_OFPROTO, /* Bridge not found. */ > FDR_PURGE, /* User requested flow deletion. */ > @@ -315,6 +317,7 @@ struct udpif_key { > struct dpif_flow_stats stats OVS_GUARDED; /* Last known stats.*/ > const char *dp_layer OVS_GUARDED; /* Last known dp_layer. */ > long long int created OVS_GUARDED; /* Estimate of creation time. > */ > + long long int last_dumped OVS_GUARDED; /* Flow last dump time. */ > uint64_t dump_seq OVS_GUARDED; /* Tracks udpif->dump_seq. */ > uint64_t reval_seq OVS_GUARDED; /* Tracks udpif->reval_seq. */ > enum ukey_state state OVS_GUARDED; /* Tracks ukey lifetime. */ > @@ -1819,6 +1822,7 @@ ukey_create__(const struct nlattr *key, size_t key_len, > ukey->state_thread = ovsthread_id_self(); > ukey->state_where = OVS_SOURCE_LOCATOR; > ukey->created = ukey->flow_time = time_msec(); > + ukey->last_dumped = 0; > memset(&ukey->stats, 0, sizeof ukey->stats); > ukey->stats.used = used; > ukey->dp_layer = NULL; > @@ -2450,7 +2454,14 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key > *ukey, > log_unexpected_stats_jump(ukey, stats); > } > > - if (need_revalidate) { > + if ((ukey->last_dumped ? ukey->last_dumped : ukey->created) > + < udpif->dpif->current_ms - (2 * ofproto_max_idle)) { > + /* If the flow was not dumped for at least twice the idle time, > + * we can assume the datapath flow now longer exists and the ukey > + * should be deleted. */ > + COVERAGE_INC(revalidate_missing_dp_flow); > + *del_reason = FDR_FLOW_MISSING_DP; > + } else if (need_revalidate) { > if (should_revalidate(udpif, ukey, push.n_packets)) { > if (!ukey->xcache) { > ukey->xcache = xlate_cache_new(); > @@ -2884,6 +2895,7 @@ revalidate(struct revalidator *revalidator) > continue; > } > > + ukey->last_dumped = now; > ukey->offloaded = f->attrs.offloaded; > if (!ukey->dp_layer > || (!dpif_synced_dp_layers(udpif->dpif) > diff --git a/tests/system-offloads-traffic.at > b/tests/system-offloads-traffic.at > index 7dbf0e97c..87a64d453 100644 > --- a/tests/system-offloads-traffic.at > +++ b/tests/system-offloads-traffic.at > @@ -132,8 +132,8 @@ dnl Wait until the ARP flow has timed out. > OVS_WAIT_UNTIL([test `ovs-appctl dpctl/dump-flows type=tc,offloaded | \ > grep "arp" | wc -l` -eq 0]) > > -dnl Set max-idle to 200 ms, so we are sure the max-idle * 2 has been reached. > -AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=200]) > +dnl Set max-idle to 10 ms, so we are sure the max-idle * 2 has been reached. > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=10]) > > dnl Add a port to force the revalidation. > REV_RECONFIGURE=$(ovs-appctl coverage/read-counter rev_reconfigure) > diff --git a/utilities/usdt-scripts/flow_reval_monitor.py > b/utilities/usdt-scripts/flow_reval_monitor.py > index 28479a565..1de06c73b 100755 > --- a/utilities/usdt-scripts/flow_reval_monitor.py > +++ b/utilities/usdt-scripts/flow_reval_monitor.py > @@ -249,6 +249,7 @@ FdrReasons = IntEnum( > "FDR_BAD_ODP_FIT", > "FDR_FLOW_IDLE", > "FDR_FLOW_LIMIT", > + "FDR_FLOW_MISSING_DP", > "FDR_FLOW_WILDCARDED", > "FDR_NO_OFPROTO", > "FDR_PURGE", > @@ -265,6 +266,7 @@ FdrReasonStrings = { > FdrReasons.FDR_BAD_ODP_FIT: "Bad ODP flow fit", > FdrReasons.FDR_FLOW_IDLE: "Flow idle timeout", > FdrReasons.FDR_FLOW_LIMIT: "Kill all flows condition reached", > + FdrReasons.FDR_FLOW_MISSING_DP: "Flow is missing from the datapath", > FdrReasons.FDR_FLOW_WILDCARDED: "Flow needs a narrower wildcard mask", > FdrReasons.FDR_NO_OFPROTO: "Bridge not found", > FdrReasons.FDR_PURGE: "User requested flow deletion", > > > diff --git a/tests/system-offloads-traffic.at > b/tests/system-offloads-traffic.at > index d1da33d96..7dbf0e97c 100644 > --- a/tests/system-offloads-traffic.at > +++ b/tests/system-offloads-traffic.at > @@ -93,6 +93,66 @@ AT_CHECK([ovs-appctl upcall/show | grep -E "offloaded > flows : [[1-9]]"], [0], [i > OVS_TRAFFIC_VSWITCHD_STOP > AT_CLEANUP > > +AT_SETUP([offloads - Forced removed datapath entries]) > +OVS_TRAFFIC_VSWITCHD_START([], [], [-- set Open_vSwitch . > other_config:hw-offload=true]) > + > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) > + > +ADD_NAMESPACES(at_ns0, at_ns1) > + > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") > +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [ignore]) > + > +AT_CHECK([ovs-appctl vlog/set ofproto_dpif_upcall:dbg]) > + > +dnl Set idle timeout to 1 second. > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=1000]) > + > +NS_CHECK_EXEC([at_ns0], [ping -q -c 10 -i 0.1 -W 2 10.1.1.2 | FORMAT_PING], > [0], [dnl > +10 packets transmitted, 10 received, 0% packet loss, time 0ms > +]) > + > +AT_CHECK([ovs-appctl dpctl/dump-flows type=tc,offloaded | grep > "eth_type(0x0800)" | DUMP_CLEAN_SORTED], [0], [dnl > +in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, > used:0.001s, actions:output > +in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, > used:0.001s, actions:output > +]) > + > +AT_CHECK([ovs-appctl dpctl/dump-flows type=tc,offloaded | grep -q "arp" ], > [0]) > + > +dnl Delete IPv4 entries, but keep the ARP ones. > +AT_CHECK([tc filter del dev ovs-p0 ingress protocol ip pref 2]) > +AT_CHECK([tc filter del dev ovs-p1 ingress protocol ip pref 2]) > + > +dnl Bring the remote ports down to avoid traffic. > +AT_CHECK([ip -n at_ns0 link set p0 down]) > +AT_CHECK([ip -n at_ns1 link set p1 down]) > + > +dnl Wait until the ARP flow has timed out. > +OVS_WAIT_UNTIL([test `ovs-appctl dpctl/dump-flows type=tc,offloaded | \ > + grep "arp" | wc -l` -eq 0]) > + > +dnl Set max-idle to 200 ms, so we are sure the max-idle * 2 has been reached. > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=200]) > + > +dnl Add a port to force the revalidation. > +REV_RECONFIGURE=$(ovs-appctl coverage/read-counter rev_reconfigure) > +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 > type=internal]) > +OVS_WAIT_UNTIL([test `ovs-appctl coverage/read-counter rev_reconfigure` \ > + -gt $REV_RECONFIGURE]) > + > +dnl Wait for another full round of revalidation. After this all ukeys should > +dnl be gone. > +AT_CHECK([ovs-appctl revalidator/wait], [0]) > +AT_CHECK([ovs-appctl revalidator/wait], [0]) > + > +dnl Make sure no more ukeys exists. > +AT_CHECK([ovs-appctl upcall/show | grep '(keys' | awk '{print $3}' | \ > + grep -qv '0)'], [1]) > + > +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to flow_del (No such file or > directory)/d"]) > +AT_CLEANUP > + > AT_SETUP([offloads - set ingress_policing_rate and ingress_policing_burst - > offloads disabled]) > AT_KEYWORDS([ingress_policing]) > OVS_CHECK_TC_QDISC() > > > >
Hi Eelco, Thanks for the patch. it works fine. But the test fails with ukeys still exists. ./system-offloads-traffic.at:150: ovs-appctl upcall/show | grep '(keys' | awk '{print $3}' | \ grep -qv '0)' ./system-offloads-traffic.at:150: exit code was 0, expected 1 If I'm changing the test a bit to do add-port del-port one more time then it works. I think the seq doesn't change and doing this multiple times is helping. You already did the entire patch with a test :) do you want to submit ? its fine by me. Thanks, Roi +++ b/tests/system-offloads-traffic.at @@ -138,8 +138,13 @@ AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=200]) dnl Add a port to force the revalidation. REV_RECONFIGURE=$(ovs-appctl coverage/read-counter rev_reconfigure) AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 type=internal]) +AT_CHECK([ovs-vsctl del-port ovs-p2 ]) OVS_WAIT_UNTIL([test `ovs-appctl coverage/read-counter rev_reconfigure` \ -gt $REV_RECONFIGURE]) +sleep 1 +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 type=internal]) +AT_CHECK([ovs-vsctl del-port ovs-p2 ]) +sleep 1 dnl Wait for another full round of revalidation. After this all ukeys should _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev