Hi Liu,
I looked at the change; however, upcall_uninit() is only called for newly
created (never inserted) ukeys, so the ovs_postpone() call is not needed.
However, I did find an issue in upcall_receive(), where, in an error path, it
could use an uninitialized upcall structure — causing a ukey to be freed that
should not have been.
Can you try out the diff below to see if it fixes your problem?
Cheers,
Eelco
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index b3b4b2d2f..53b906a16 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -1230,6 +1230,17 @@ upcall_receive(struct upcall *upcall, const struct
dpif_backer *backer,
{
int error;
+ /* Initialize the minimal required fields in the upcall structure to ensure
+ * upcall_uninit() does not operate on invalid data. */
+ upcall->have_recirc_ref = false;
+ upcall->xout_initialized = false;
+ upcall->ukey_persists = false;
+ upcall->ukey = NULL;
+ ofpbuf_use_stub(&upcall->odp_actions, upcall->odp_actions_stub,
+ sizeof upcall->odp_actions_stub);
+ ofpbuf_init(&upcall->put_actions, 0);
+
+
upcall->type = classify_upcall(type, userdata, &upcall->cookie);
if (upcall->type == BAD_UPCALL) {
return EAGAIN;
@@ -1258,19 +1269,11 @@ upcall_receive(struct upcall *upcall, const struct
dpif_backer *backer,
}
upcall->recirc = NULL;
- upcall->have_recirc_ref = false;
upcall->flow = flow;
upcall->packet = packet;
upcall->ufid = ufid;
upcall->pmd_id = pmd_id;
- ofpbuf_use_stub(&upcall->odp_actions, upcall->odp_actions_stub,
- sizeof upcall->odp_actions_stub);
- ofpbuf_init(&upcall->put_actions, 0);
- upcall->xout_initialized = false;
- upcall->ukey_persists = false;
-
- upcall->ukey = NULL;
upcall->key = NULL;
upcall->key_len = 0;
upcall->mru = mru;
On 16 Oct 2025, at 3:12, LIU Yulong wrote:
> We have such call stack of coredump:
> *0 0x00007f7f197ae337 in raise () from /lib64/libc.so.6
> *1 0x00007f7f197afa28 in abort () from /lib64/libc.so.6
> *2 0x000055934ca4f4ee in ovs_abort_valist (err_no=<optimized out>,
> format=<optimized out>, args=args@entry=0x7f7f07530360) at lib/util.c:499
> *3 0x000055934ca4f584 in ovs_abort (err_no=err_no@entry=0,
> format=format@entry=0x55934ccedd18 "%s: %s() passed uninitialized ovs_mutex")
> at lib/util.c:491
> *4 0x000055934ca1a4a1 in ovs_mutex_trylock_at (l_=l_@entry=0x7f7ed4a43e58,
> where=where@entry=0x55934cccb318 "ofproto/ofproto-dpif-upcall.c:3014") at
> lib/ovs-thread.c:106
> *5 0x000055934c943181 in revalidator_sweep__
> (revalidator=revalidator@entry=0x5593518c1720, purge=purge@entry=false) at
> ofproto/ofproto-dpif-upcall.c:3014
> *6 0x000055934c9471a6 in revalidator_sweep (revalidator=0x5593518c1720) at
> ofproto/ofproto-dpif-upcall.c:3072
> *7 udpif_revalidator (arg=0x5593518c1720) at
> ofproto/ofproto-dpif-upcall.c:1086
> *8 0x000055934ca1b05f in ovsthread_wrapper (aux_=<optimized out>) at
> lib/ovs-thread.c:422
> *9 0x00007f7f1b6ece65 in start_thread () from /lib64/libpthread.so.0
> *10 0x00007f7f1987688d in clone () from /lib64/libc.so.6
>
> When calling ovs_mutex_trylock() on ukey->mutex, ovs_mutex_trylock_at
> sees that the input is an "uninitialized ovs_mutex" (l->where is NULL),
> and aborts.
>
> This state can only occur after the mutex has not been initialized or
> has been destroyed. The mutex is definitely initialized in ukey_create__
> by ukey, so the "uninitialized" state is almost certainly "destroyed".
> Destruction occurs in ukey_delete__, which calls
> ovs_mutex_destroy(&ukey->mutex)
> and sets where to NULL.
>
> When revalidator_sweep__ is traversing cmap and trying to lock (&ukey->mutex),
> it encounters a ukey that has been directly destroyed by ukey_delete__,
> However, the ukey is still visible to the revalidator (either still in
> the cmap or has not yet passed the RCU grace period), resulting in an
> abort. That is to say, there is a path where ukey_delete__ is directly
> called during concurrent traversal of ukey, bypassing the
> cmap_remove + ovsrcu_postpone semantics of ukey_delete.
>
> Modify upcall_uninit() to change direct ukey_delete__ to RCU deferred
> release to avoid concurrent traversal conflicts with revalidator.
> This ensures that ukey_delete__ is not executed until after the
> global grace period, and that the CMAP_FOR_EACH within
> revalidator_sweep__ will not encounter a destroyed mutex before
> the end of running cycle.
>
> Some earlier email discussions:
> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
> [2]
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html
> [3] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html
>
> Signed-off-by: LIU Yulong <[email protected]>
> ---
> ofproto/ofproto-dpif-upcall.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 9dfa52d82..b3b4b2d2f 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -1386,7 +1386,7 @@ upcall_uninit(struct upcall *upcall)
> ofpbuf_uninit(&upcall->put_actions);
> if (upcall->ukey) {
> if (!upcall->ukey_persists) {
> - ukey_delete__(upcall->ukey);
> + ovsrcu_postpone(ukey_delete__, upcall->ukey);
> }
> } else if (upcall->have_recirc_ref) {
> /* The reference was transferred to the ukey if one was created.
> */
> --
> 2.50.1 (Apple Git-155)
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev