Thank you Eelco.
Code search shows we have `recv_upcalls` and `upcall_cb` which will call `upcall_uninit`. And dp_netdev_upcall will call the dp->upcall_cb. So we have call stacks like this: i) handle_packet_upcall->dp_netdev_upcall->upcall_cb->upcall_uninit ii) dp_execute_userspace_action->dp_netdev_upcall->upcall_cb->upcall_uninit Cloud you confirm these calls? For your change, I'll run tests with recoreded packets to verify. Regards, LIU Yulong ------------------ Original ------------------ From: "Eelco Chaudron"<[email protected]>; Date: Fri, Oct 17, 2025 08:08 PM To: "LIU Yulong"<[email protected]>; Cc: "dev"<[email protected]>; Subject: Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Add ovsrcu_postpone for ukey_delete__. Hi Liu, I looked at the change; however, upcall_uninit() is only called for newly created (never inserted) ukeys, so the ovs_postpone() call is not needed. However, I did find an issue in upcall_receive(), where, in an error path, it could use an uninitialized upcall structure — causing a ukey to be freed that should not have been. Can you try out the diff below to see if it fixes your problem? Cheers, Eelco diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index b3b4b2d2f..53b906a16 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -1230,6 +1230,17 @@ upcall_receive(struct upcall *upcall, const struct dpif_backer *backer, { int error; + /* Initialize the minimal required fields in the upcall structure to ensure + * upcall_uninit() does not operate on invalid data. */ + upcall->have_recirc_ref = false; + upcall->xout_initialized = false; + upcall->ukey_persists = false; + upcall->ukey = NULL; + ofpbuf_use_stub(&upcall->odp_actions, upcall->odp_actions_stub, + sizeof upcall->odp_actions_stub); + ofpbuf_init(&upcall->put_actions, 0); + + upcall->type = classify_upcall(type, userdata, &upcall->cookie); if (upcall->type == BAD_UPCALL) { return EAGAIN; @@ -1258,19 +1269,11 @@ upcall_receive(struct upcall *upcall, const struct dpif_backer *backer, } upcall->recirc = NULL; - upcall->have_recirc_ref = false; upcall->flow = flow; upcall->packet = packet; upcall->ufid = ufid; upcall->pmd_id = pmd_id; - ofpbuf_use_stub(&upcall->odp_actions, upcall->odp_actions_stub, - sizeof upcall->odp_actions_stub); - ofpbuf_init(&upcall->put_actions, 0); - upcall->xout_initialized = false; - upcall->ukey_persists = false; - - upcall->ukey = NULL; upcall->key = NULL; upcall->key_len = 0; upcall->mru = mru; On 16 Oct 2025, at 3:12, LIU Yulong wrote: > We have such call stack of coredump: > *0 0x00007f7f197ae337 in raise () from /lib64/libc.so.6 > *1 0x00007f7f197afa28 in abort () from /lib64/libc.so.6 > *2 0x000055934ca4f4ee in ovs_abort_valist (err_no=<optimized out>, format=<optimized out>, args=args@entry=0x7f7f07530360) at lib/util.c:499 > *3 0x000055934ca4f584 in ovs_abort (err_no=err_no@entry=0, format=format@entry=0x55934ccedd18 "%s: %s() passed uninitialized ovs_mutex") at lib/util.c:491 > *4 0x000055934ca1a4a1 in ovs_mutex_trylock_at (l_=l_@entry=0x7f7ed4a43e58, where=where@entry=0x55934cccb318 "ofproto/ofproto-dpif-upcall.c:3014") at lib/ovs-thread.c:106 > *5 0x000055934c943181 in revalidator_sweep__ (revalidator=revalidator@entry=0x5593518c1720, purge=purge@entry=false) at ofproto/ofproto-dpif-upcall.c:3014 > *6 0x000055934c9471a6 in revalidator_sweep (revalidator=0x5593518c1720) at ofproto/ofproto-dpif-upcall.c:3072 > *7 udpif_revalidator (arg=0x5593518c1720) at ofproto/ofproto-dpif-upcall.c:1086 > *8 0x000055934ca1b05f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:422 > *9 0x00007f7f1b6ece65 in start_thread () from /lib64/libpthread.so.0 > *10 0x00007f7f1987688d in clone () from /lib64/libc.so.6 > > When calling ovs_mutex_trylock() on ukey->mutex, ovs_mutex_trylock_at > sees that the input is an "uninitialized ovs_mutex" (l->where is NULL), > and aborts. > > This state can only occur after the mutex has not been initialized or > has been destroyed. The mutex is definitely initialized in ukey_create__ > by ukey, so the "uninitialized" state is almost certainly "destroyed". > Destruction occurs in ukey_delete__, which calls ovs_mutex_destroy(&ukey->mutex) > and sets where to NULL. > > When revalidator_sweep__ is traversing cmap and trying to lock (&ukey->mutex), > it encounters a ukey that has been directly destroyed by ukey_delete__, > However, the ukey is still visible to the revalidator (either still in > the cmap or has not yet passed the RCU grace period), resulting in an > abort. That is to say, there is a path where ukey_delete__ is directly > called during concurrent traversal of ukey, bypassing the > cmap_remove + ovsrcu_postpone semantics of ukey_delete. > > Modify upcall_uninit() to change direct ukey_delete__ to RCU deferred > release to avoid concurrent traversal conflicts with revalidator. > This ensures that ukey_delete__ is not executed until after the > global grace period, and that the CMAP_FOR_EACH within > revalidator_sweep__ will not encounter a destroyed mutex before > the end of running cycle. > > Some earlier email discussions: > [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html > [2] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html > [3] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html > > Signed-off-by: LIU Yulong <[email protected]> > --- > ofproto/ofproto-dpif-upcall.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 9dfa52d82..b3b4b2d2f 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -1386,7 +1386,7 @@ upcall_uninit(struct upcall *upcall) > ofpbuf_uninit(&upcall->put_actions); > if (upcall->ukey) { > if (!upcall->ukey_persists) { > - ukey_delete__(upcall->ukey); > + ovsrcu_postpone(ukey_delete__, upcall->ukey); > } > } else if (upcall->have_recirc_ref) { > /* The reference was transferred to the ukey if one was created. */ > -- > 2.50.1 (Apple Git-155) > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
