It is observed in some environments that there are much more ukeys than
actual DP flows. For example:

$ ovs-appctl upcall/show
system@ovs-system:
flows : (current 7) (avg 6) (max 117) (limit 2125)
offloaded flows : 525
dump duration : 1063ms
ufid enabled : true

23: (keys 3612)
24: (keys 3625)
25: (keys 3485)

The revalidator threads are busy revalidating the stale ukeys leading to
high CPU and long dump duration.

There are some possible situations that may result in stale ukeys that
have no corresponding DP flows.

In revalidator, push_dp_ops() doesn't check error if the op type is not
DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload
case, where the old flow is deleted first and then the new one is
created. If the creation fails, the ukey will be stale (no corresponding
DP flow). This patch adds a warning in such case.

Another possible scenario is in handle_upcalls() if a PUT operation did
not succeed and op->error attribute was not set correctly it can lead to
stale ukey in operational state.

This patch adds checks in the sweep phase for such ukeys and move them
to DELETE so that they can be cleared eventually.

Co-authored-by: Han Zhou <hz...@ovn.org>
Signed-off-by: Han Zhou <hz...@ovn.org>
Signed-off-by: Roi Dayan <r...@nvidia.com>
---
 ofproto/ofproto-dpif-upcall.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 83609ec62b63..e9520ebdf910 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow);
 COVERAGE_DEFINE(dumped_new_flow);
 COVERAGE_DEFINE(handler_duplicate_upcall);
 COVERAGE_DEFINE(revalidate_missed_dp_flow);
+COVERAGE_DEFINE(revalidate_missed_dp_flow_del);
 COVERAGE_DEFINE(ukey_dp_change);
 COVERAGE_DEFINE(ukey_invalid_stat_reset);
 COVERAGE_DEFINE(ukey_replace_contention);
@@ -278,6 +279,7 @@ enum flow_del_reason {
     FDR_BAD_ODP_FIT,        /* Bad ODP flow fit. */
     FDR_FLOW_IDLE,          /* Flow idle timeout. */
     FDR_FLOW_LIMIT,         /* Kill all flows condition reached. */
+    FDR_FLOW_STALE,         /* Flow stale detected. */
     FDR_FLOW_WILDCARDED,    /* Flow needs a narrower wildcard mask. */
     FDR_NO_OFPROTO,         /* Bridge not found. */
     FDR_PURGE,              /* User requested flow deletion. */
@@ -2557,6 +2559,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, 
size_t n_ops)
 
         if (op->dop.type != DPIF_OP_FLOW_DEL) {
             /* Only deleted flows need their stats pushed. */
+            if (op->dop.error) {
+                VLOG_WARN_RL(&rl, "push_dp_ops: error %d in op type %d, ukey "
+                             "%p", op->dop.error, op->dop.type, op->ukey);
+            }
             continue;
         }
 
@@ -3027,6 +3033,15 @@ revalidator_sweep__(struct revalidator *revalidator, 
bool purge)
                     del_reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
                 } else if (!seq_mismatch) {
                     result = UKEY_KEEP;
+                } else if (!ukey->stats.used &&
+                           udpif_flow_time_delta(udpif, ukey) * 1000 >
+                           ofproto_max_idle) {
+                    COVERAGE_INC(revalidate_missed_dp_flow_del);
+                    VLOG_WARN_RL(&rl, "revalidator_sweep__: Remove stale ukey "
+                                 "%p delta %llus", ukey,
+                                 udpif_flow_time_delta(udpif, ukey));
+                    result = UKEY_DELETE;
+                    del_reason = FDR_FLOW_STALE;
                 } else {
                     struct dpif_flow_stats stats;
                     COVERAGE_INC(revalidate_missed_dp_flow);
-- 
2.21.0

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to