On 20 Oct 2025, at 12:35, LIU Yulong wrote:
> Thank you Eelco. > > > Code search shows we have `recv_upcalls` and `upcall_cb` which will call > `upcall_uninit`. > And dp_netdev_upcall will call the dp->upcall_cb. > So we have call stacks like this: > i) handle_packet_upcall->dp_netdev_upcall->upcall_cb->upcall_uninit > ii) > dp_execute_userspace_action->dp_netdev_upcall->upcall_cb->upcall_uninit > > Cloud you confirm these calls? From the top of my head, this is correct. However, the new ukey structure is never inserted, so we do not need the RCU-delayed remove. > For your change, I'll run tests with recoreded packets to verify. Thanks, and let me know the results. //Eelco > > Regards, > > > LIU Yulong > > > ------------------ Original ------------------ > From: "Eelco Chaudron"<[email protected]>; > Date: Fri, Oct 17, 2025 08:08 PM > To: "LIU Yulong"<[email protected]>; > Cc: "dev"<[email protected]>; > Subject: Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Add ovsrcu_postpone > for ukey_delete__. > > > Hi Liu, > > I looked at the change; however, upcall_uninit() is only called for newly > created (never inserted) ukeys, so the ovs_postpone() call is not needed. > > However, I did find an issue in upcall_receive(), where, in an error path, it > could use an uninitialized upcall structure — causing a ukey to be freed that > should not have been. > > Can you try out the diff below to see if it fixes your problem? > > Cheers, > > Eelco > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index b3b4b2d2f..53b906a16 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -1230,6 +1230,17 @@ upcall_receive(struct upcall *upcall, const struct > dpif_backer *backer, > { > int error; > > + /* Initialize the minimal required fields in the upcall > structure to ensure > + * upcall_uninit() does not operate on invalid data. > */ > + upcall->have_recirc_ref = false; > + upcall->xout_initialized = false; > + upcall->ukey_persists = false; > + upcall->ukey = NULL; > + ofpbuf_use_stub(&upcall->odp_actions, > upcall->odp_actions_stub, > + > sizeof upcall->odp_actions_stub); > + ofpbuf_init(&upcall->put_actions, 0); > + > + > upcall->type = classify_upcall(type, userdata, &upcall->cookie); > if (upcall->type == BAD_UPCALL) { > return EAGAIN; > @@ -1258,19 +1269,11 @@ upcall_receive(struct upcall *upcall, const struct > dpif_backer *backer, > } > > upcall->recirc = NULL; > - upcall->have_recirc_ref = false; > upcall->flow = flow; > upcall->packet = packet; > upcall->ufid = ufid; > upcall->pmd_id = pmd_id; > - ofpbuf_use_stub(&upcall->odp_actions, > upcall->odp_actions_stub, > - > sizeof upcall->odp_actions_stub); > - ofpbuf_init(&upcall->put_actions, 0); > > - upcall->xout_initialized = false; > - upcall->ukey_persists = false; > - > - upcall->ukey = NULL; > upcall->key = NULL; > upcall->key_len = 0; > upcall->mru = mru; > > > On 16 Oct 2025, at 3:12, LIU Yulong wrote: > > > We have such call stack of coredump: > > *0 0x00007f7f197ae337 in raise () from /lib64/libc.so.6 > > *1 0x00007f7f197afa28 in abort () from /lib64/libc.so.6 > > *2 0x000055934ca4f4ee in ovs_abort_valist (err_no=<optimized > out>, format=<optimized out>, args=args@entry=0x7f7f07530360) at > lib/util.c:499 > > *3 0x000055934ca4f584 in ovs_abort (err_no=err_no@entry=0, > format=format@entry=0x55934ccedd18 "%s: %s() passed uninitialized ovs_mutex") > at lib/util.c:491 > > *4 0x000055934ca1a4a1 in ovs_mutex_trylock_at > (l_=l_@entry=0x7f7ed4a43e58, where=where@entry=0x55934cccb318 > "ofproto/ofproto-dpif-upcall.c:3014") at lib/ovs-thread.c:106 > > *5 0x000055934c943181 in revalidator_sweep__ > (revalidator=revalidator@entry=0x5593518c1720, purge=purge@entry=false) at > ofproto/ofproto-dpif-upcall.c:3014 > > *6 0x000055934c9471a6 in revalidator_sweep > (revalidator=0x5593518c1720) at ofproto/ofproto-dpif-upcall.c:3072 > > *7 udpif_revalidator (arg=0x5593518c1720) at > ofproto/ofproto-dpif-upcall.c:1086 > > *8 0x000055934ca1b05f in ovsthread_wrapper (aux_=<optimized > out>) at lib/ovs-thread.c:422 > > *9 0x00007f7f1b6ece65 in start_thread () from > /lib64/libpthread.so.0 > > *10 0x00007f7f1987688d in clone () from /lib64/libc.so.6 > > > > When calling ovs_mutex_trylock() on ukey->mutex, ovs_mutex_trylock_at > > sees that the input is an "uninitialized ovs_mutex" (l->where is > NULL), > > and aborts. > > > > This state can only occur after the mutex has not been initialized or > > has been destroyed. The mutex is definitely initialized in ukey_create__ > > by ukey, so the "uninitialized" state is almost certainly "destroyed". > > Destruction occurs in ukey_delete__, which calls > ovs_mutex_destroy(&ukey->mutex) > > and sets where to NULL. > > > > When revalidator_sweep__ is traversing cmap and trying to lock > (&ukey->mutex), > > it encounters a ukey that has been directly destroyed by ukey_delete__, > > However, the ukey is still visible to the revalidator (either still in > > the cmap or has not yet passed the RCU grace period), resulting in an > > abort. That is to say, there is a path where ukey_delete__ is directly > > called during concurrent traversal of ukey, bypassing the > > cmap_remove + ovsrcu_postpone semantics of ukey_delete. > > > > Modify upcall_uninit() to change direct ukey_delete__ to RCU deferred > > release to avoid concurrent traversal conflicts with revalidator. > > This ensures that ukey_delete__ is not executed until after the > > global grace period, and that the CMAP_FOR_EACH within > > revalidator_sweep__ will not encounter a destroyed mutex before > > the end of running cycle. > > > > Some earlier email discussions: > > [1] > https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html > > [2] > https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html > > [3] > https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html > > > > Signed-off-by: LIU Yulong <[email protected]> > > --- > > ofproto/ofproto-dpif-upcall.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/ofproto/ofproto-dpif-upcall.c > b/ofproto/ofproto-dpif-upcall.c > > index 9dfa52d82..b3b4b2d2f 100644 > > --- a/ofproto/ofproto-dpif-upcall.c > > +++ b/ofproto/ofproto-dpif-upcall.c > > @@ -1386,7 +1386,7 @@ upcall_uninit(struct upcall *upcall) > > > ofpbuf_uninit(&upcall->put_actions); > > if > (upcall->ukey) { > > > if (!upcall->ukey_persists) { > > > - > ukey_delete__(upcall->ukey); > > > + > ovsrcu_postpone(ukey_delete__, upcall->ukey); > > > } > > } else if > (upcall->have_recirc_ref) { > > > /* The reference was transferred to the ukey if one was created. */ > > -- > > 2.50.1 (Apple Git-155) > > > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
