Our running ovs-vswitchd was cored with stack [1]. It shows a
concurrent race condition.

When the revalidator thread is performing rule revalidation/translation,
another thread has already deleted the rule tag and reduced the reference
count to 0. Using ofproto_rule_ref() at this point will trigger an assertion,
as it assumes the object is still alive (refcount > 0).

Replace the mandatory reference to the rule in xlate_table_action()
with a non-fatal try-ref: ofproto_rule_try_ref().
If the reference fails (the rule is being/has been released),
treat it as a table miss path, avoiding referencing "dead" rules
and causing assertion errors.

[1] call stack:
*0  raise () from /lib64/libc.so.6
*1  abort () from /lib64/libc.so.6
*2  ovs_abort_valist () at lib/util.c:499
*3  vlog_abort_valist () at lib/vlog.c:1249
*4  vlog_abort () at lib/vlog.c:1263
*5  ovs_assert_failure () at lib/util.c:86
*6  ovs_refcount_ref () at lib/ovs-atomic.h:547
*7  ovs_refcount_ref () at ofproto/ofproto.c:3010
*8  ofproto_rule_ref () at ofproto/ofproto.c:3012
*9  xlate_table_action () at ofproto/ofproto-dpif-xlate.c:4506
*10 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:7183
*11 xlate_recursively () at ofproto/ofproto-dpif-xlate.c:4382
*12 xlate_table_action () at ofproto/ofproto-dpif-xlate.c:4511
*13 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:7183
*14 clone_xlate_actions () at ofproto/ofproto-dpif-xlate.c:5809
*15 xlate_recursively () at ofproto/ofproto-dpif-xlate.c:4382
*16 xlate_table_action () at ofproto/ofproto-dpif-xlate.c:4511
*17 patch_port_output () at ofproto/ofproto-dpif-xlate.c:3889
*18 compose_output_action__ () at ofproto/ofproto-dpif-xlate.c:4204
*19 compose_output_action () at ofproto/ofproto-dpif-xlate.c:4359
*20 xlate_output_action () at ofproto/ofproto-dpif-xlate.c:5304
*21 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:7033
*22 xlate_actions () at ofproto/ofproto-dpif-xlate.c:8001
*23 xlate_key () at ofproto/ofproto-dpif-upcall.c:2321
*24 xlate_ukey () at ofproto/ofproto-dpif-upcall.c:2336
*25 revalidate_ukey__ () at ofproto/ofproto-dpif-upcall.c:2382
*26 revalidate_ukey () at ofproto/ofproto-dpif-upcall.c:2491
*27 revalidate () at ofproto/ofproto-dpif-upcall.c:2945
*28 udpif_revalidator () at ofproto/ofproto-dpif-upcall.c:1089
*29 ovsthread_wrapper () at lib/ovs-thread.c:422
*30 start_thread () from /lib64/libpthread.so.0
*31 clone () from /lib64/libc.so.6

Signed-off-by: LIU Yulong <[email protected]>
---
 ofproto/ofproto-dpif-xlate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 920d998e6..b0d42d29d 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4793,7 +4793,7 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t 
in_port, uint8_t table_id,
             tuple_swap(&ctx->xin->flow, ctx->wc);
         }
 
-        if (rule) {
+        if (rule && ofproto_rule_try_ref(&rule->up)) {
             /* Fill in the cache entry here instead of xlate_recursively
              * to make the reference counting more explicit.  We take a
              * reference in the lookups above if we are going to cache the
@@ -4804,6 +4804,8 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t 
in_port, uint8_t table_id,
                 entry = xlate_cache_add_entry(ctx->xin->xcache, XC_RULE);
                 entry->rule = rule;
                 ofproto_rule_ref(&rule->up);
+            } else {
+                ofproto_rule_unref(&rule->up);
             }
 
             struct ovs_list *old_trace = ctx->xin->trace;
-- 
2.50.1 (Apple Git-155)

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to