ovs-vswitchd could crash under these circumstances:
1. When one bridge is being destroyed, ofproto_destroy() is called and
connmgr pointer of its ofproto struct is nullified. This ofproto struct is
deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
2. Before RCU enters quiesce state to actually free this ofproto struct,
revalidator thread calls udpif_revalidator(), which could handle
a learn flow and calls ofproto_flow_mod_learn(), it later calls
ofmonitor_report() and ofproto struct's connmgr pointer is accessed.

The crash stack trace is shown below:

0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, 
event=event@entry=NXFME_ADDED,
    reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, 
old_actions=old_actions@entry=0x0)
    at ofproto/connmgr.c:2160
1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, 
ofm=<optimized out>, req=req@entry=0x0)
    at ofproto/ofproto.c:5221
2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, 
ofproto=0x55d9075d4ab0)
    at ofproto/ofproto.c:5823
3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, 
ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
    at ofproto/ofproto.c:8088
4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish 
(ofm=ofm@entry=0x7fa4980753f0,
    orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, 
keep_ref=keep_ref@entry=true,
    limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at 
ofproto/ofproto.c:5499
6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, 
stats=stats@entry=0x7fa4d2701a10,
    offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, 
stats=stats@entry=0x7fa4d2701a10,
    offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, 
ukey=ukey@entry=0x7fa4b0191660,
    stats=stats@entry=0x7fa4d2705118, 
odp_actions=odp_actions@entry=0x7fa4d2701b50,
    reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, 
offloaded=false)
    at ofproto/ofproto-dpif-upcall.c:2294
9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at 
ofproto/ofproto-dpif-upcall.c:2683
10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at 
ofproto/ofproto-dpif-upcall.c:936
11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at 
lib/ovs-thread.c:423
12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6

At the time of crash, the involved ofproto was already deallocated:

(gdb) print *ofproto
$1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
    one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...

This patch fixes it.

VMware-BZ: #2700626
Signed-off-by: Yifeng Sun <pkusunyif...@gmail.com>
---
v1->v2: Add check for ofmonitor_flush, thanks William.

 ofproto/connmgr.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index 9c5c633b4171..fa8f6cd0e83a 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -2140,7 +2140,7 @@ ofmonitor_report(struct connmgr *mgr, struct rule *rule,
                  const struct rule_actions *old_actions)
     OVS_REQUIRES(ofproto_mutex)
 {
-    if (rule_is_hidden(rule)) {
+    if (!mgr || rule_is_hidden(rule)) {
         return;
     }
 
@@ -2244,6 +2244,10 @@ ofmonitor_flush(struct connmgr *mgr)
 {
     struct ofconn *ofconn;
 
+    if (!mgr) {
+        return;
+    }
+
     LIST_FOR_EACH (ofconn, connmgr_node, &mgr->conns) {
         struct rconn_packet_counter *counter = ofconn->monitor_counter;
 
-- 
2.7.4

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to