It is observed in some environments that there are many more ukeys than
actual DP flows. For example:

  $ ovs-appctl upcall/show
  system@ovs-system:
    flows : (current 7) (avg 6) (max 117) (limit 2125)
    offloaded flows : 525
    dump duration : 1063ms
    ufid enabled : true

    23: (keys 3612)
    24: (keys 3625)
    25: (keys 3485)

The revalidator threads are busy revalidating the stale ukeys, leading
to high CPU usage and long dump durations.

There are several situations that may result in stale ukeys that have
no corresponding DP flows.

In the revalidator, push_dp_ops() doesn't check for errors if the op
type is not DEL. It is possible that a PUT(MODIFY) fails, especially in
the tc offload case, where the old flow is deleted first and then the
new one is created. If the creation fails, the ukey will be stale (no
corresponding DP flow). This patch adds a warning in such a case.

Another possible scenario is in handle_upcalls(): if a PUT operation
did not succeed and the op->error attribute was not set correctly, it
can lead to a stale ukey in the operational state.

This patch adds checks in the sweep phase for such ukeys and moves them
to DELETE so that they can be cleared eventually.

Co-authored-by: Han Zhou <hz...@ovn.org>
Signed-off-by: Han Zhou <hz...@ovn.org>
Signed-off-by: Roi Dayan <r...@nvidia.com>
---
 ofproto/ofproto-dpif-upcall.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 83609ec62b63..e9520ebdf910 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow);
 COVERAGE_DEFINE(dumped_new_flow);
 COVERAGE_DEFINE(handler_duplicate_upcall);
 COVERAGE_DEFINE(revalidate_missed_dp_flow);
+COVERAGE_DEFINE(revalidate_missed_dp_flow_del);
 COVERAGE_DEFINE(ukey_dp_change);
 COVERAGE_DEFINE(ukey_invalid_stat_reset);
 COVERAGE_DEFINE(ukey_replace_contention);
@@ -278,6 +279,7 @@ enum flow_del_reason {
     FDR_BAD_ODP_FIT,        /* Bad ODP flow fit. */
     FDR_FLOW_IDLE,          /* Flow idle timeout. */
     FDR_FLOW_LIMIT,         /* Kill all flows condition reached. */
+    FDR_FLOW_STALE,         /* Flow stale detected. */
     FDR_FLOW_WILDCARDED,    /* Flow needs a narrower wildcard mask. */
     FDR_NO_OFPROTO,         /* Bridge not found. */
     FDR_PURGE,              /* User requested flow deletion. */
@@ -2557,6 +2559,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops)
 
         if (op->dop.type != DPIF_OP_FLOW_DEL) {
             /* Only deleted flows need their stats pushed. */
+            if (op->dop.error) {
+                VLOG_WARN_RL(&rl, "push_dp_ops: error %d in op type %d, ukey "
+                             "%p", op->dop.error, op->dop.type, op->ukey);
+            }
             continue;
         }
 
@@ -3027,6 +3033,15 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge)
                     del_reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
                 } else if (!seq_mismatch) {
                     result = UKEY_KEEP;
+                } else if (!ukey->stats.used &&
+                           udpif_flow_time_delta(udpif, ukey) * 1000 >
+                           ofproto_max_idle) {
+                    COVERAGE_INC(revalidate_missed_dp_flow_del);
+                    VLOG_WARN_RL(&rl, "revalidator_sweep__: Remove stale ukey "
+                                 "%p delta %llus", ukey,
+                                 udpif_flow_time_delta(udpif, ukey));
+                    result = UKEY_DELETE;
+                    del_reason = FDR_FLOW_STALE;
                 } else {
                     struct dpif_flow_stats stats;
                     COVERAGE_INC(revalidate_missed_dp_flow);
-- 
2.21.0
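
Note for reviewers: the stale-ukey condition added to revalidator_sweep__() can be modelled in isolation, which may help when reasoning about the units involved. The sketch below is not OVS code. The struct ukey_model type, its fields, and the ukey_is_stale() helper are hypothetical names introduced only for illustration. It assumes, as the "* 1000" and the "%llus" format in the patch suggest, that udpif_flow_time_delta() returns whole seconds and that ofproto_max_idle is in milliseconds.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the ukey fields the check reads. */
struct ukey_model {
    long long used_ms;   /* Last "used" time from stats; 0 means never used. */
    long long delta_s;   /* Assumed result of udpif_flow_time_delta(), in
                          * seconds. */
};

/* Returns true when the modelled ukey would be treated as stale: it has
 * never been marked as used and its age exceeds the max idle time.
 * 'max_idle_ms' plays the role of ofproto_max_idle (assumed milliseconds). */
static bool
ukey_is_stale(const struct ukey_model *ukey, long long max_idle_ms)
{
    return !ukey->used_ms && ukey->delta_s * 1000 > max_idle_ms;
}
```

Under these assumptions, with a max idle time of 10000 ms, a never-used ukey that is 11 seconds old would be swept, while one that is exactly 10 seconds old, or any ukey with a nonzero used time, would fall through to the existing revalidate_missed_dp_flow path.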