Hi Liu,

I looked at the change; however, upcall_uninit() is only called for newly 
created (never inserted) ukeys, so the ovs_postpone() call is not needed.

However, I did find an issue in upcall_receive(), where, in an error path, it 
could use an uninitialized upcall structure — causing a ukey to be freed that 
should not have been.

Can you try out the diff below to see if it fixes your problem?

Cheers,

Eelco

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index b3b4b2d2f..53b906a16 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -1230,6 +1230,17 @@ upcall_receive(struct upcall *upcall, const struct 
dpif_backer *backer,
 {
     int error;

+    /* Initialize the minimal required fields in the upcall structure to ensure
+     * upcall_uninit() does not operate on invalid data. */
+    upcall->have_recirc_ref = false;
+    upcall->xout_initialized = false;
+    upcall->ukey_persists = false;
+    upcall->ukey = NULL;
+    ofpbuf_use_stub(&upcall->odp_actions, upcall->odp_actions_stub,
+                    sizeof upcall->odp_actions_stub);
+    ofpbuf_init(&upcall->put_actions, 0);
+
+
     upcall->type = classify_upcall(type, userdata, &upcall->cookie);
     if (upcall->type == BAD_UPCALL) {
         return EAGAIN;
@@ -1258,19 +1269,11 @@ upcall_receive(struct upcall *upcall, const struct 
dpif_backer *backer,
     }

     upcall->recirc = NULL;
-    upcall->have_recirc_ref = false;
     upcall->flow = flow;
     upcall->packet = packet;
     upcall->ufid = ufid;
     upcall->pmd_id = pmd_id;
-    ofpbuf_use_stub(&upcall->odp_actions, upcall->odp_actions_stub,
-                    sizeof upcall->odp_actions_stub);
-    ofpbuf_init(&upcall->put_actions, 0);

-    upcall->xout_initialized = false;
-    upcall->ukey_persists = false;
-
-    upcall->ukey = NULL;
     upcall->key = NULL;
     upcall->key_len = 0;
     upcall->mru = mru;


On 16 Oct 2025, at 3:12, LIU Yulong wrote:

> We have such call stack of coredump:
> *0  0x00007f7f197ae337 in raise () from /lib64/libc.so.6
> *1  0x00007f7f197afa28 in abort () from /lib64/libc.so.6
> *2  0x000055934ca4f4ee in ovs_abort_valist (err_no=<optimized out>, 
> format=<optimized out>, args=args@entry=0x7f7f07530360) at lib/util.c:499
> *3  0x000055934ca4f584 in ovs_abort (err_no=err_no@entry=0, 
> format=format@entry=0x55934ccedd18 "%s: %s() passed uninitialized ovs_mutex") 
> at lib/util.c:491
> *4  0x000055934ca1a4a1 in ovs_mutex_trylock_at (l_=l_@entry=0x7f7ed4a43e58, 
> where=where@entry=0x55934cccb318 "ofproto/ofproto-dpif-upcall.c:3014") at 
> lib/ovs-thread.c:106
> *5  0x000055934c943181 in revalidator_sweep__ 
> (revalidator=revalidator@entry=0x5593518c1720, purge=purge@entry=false) at 
> ofproto/ofproto-dpif-upcall.c:3014
> *6  0x000055934c9471a6 in revalidator_sweep (revalidator=0x5593518c1720) at 
> ofproto/ofproto-dpif-upcall.c:3072
> *7  udpif_revalidator (arg=0x5593518c1720) at 
> ofproto/ofproto-dpif-upcall.c:1086
> *8  0x000055934ca1b05f in ovsthread_wrapper (aux_=<optimized out>) at 
> lib/ovs-thread.c:422
> *9  0x00007f7f1b6ece65 in start_thread () from /lib64/libpthread.so.0
> *10 0x00007f7f1987688d in clone () from /lib64/libc.so.6
>
> When calling ovs_mutex_trylock() on ukey->mutex, ovs_mutex_trylock_at
> sees that the input is an "uninitialized ovs_mutex" (l->where is NULL),
> and aborts.
>
> This state can only occur after the mutex has not been initialized or
> has been destroyed. The mutex is definitely initialized in ukey_create__
> by ukey, so the "uninitialized" state is almost certainly "destroyed".
> Destruction occurs in ukey_delete__, which calls 
> ovs_mutex_destroy(&ukey->mutex)
> and sets where to NULL.
>
> When revalidator_sweep__ is traversing cmap and trying to lock (&ukey->mutex),
> it encounters a ukey that has been directly destroyed by ukey_delete__,
> However, the ukey is still visible to the revalidator (either still in
> the cmap or has not yet passed the RCU grace period), resulting in an
> abort. That is to say, there is a path where ukey_delete__ is directly
> called during concurrent traversal of ukey, bypassing the
> cmap_remove + ovsrcu_postpone semantics of ukey_delete.
>
> Modify upcall_uninit() to change direct ukey_delete__ to RCU deferred
> release to avoid concurrent traversal conflicts with revalidator.
> This ensures that ukey_delete__ is not executed until after the
> global grace period, and that the CMAP_FOR_EACH within
> revalidator_sweep__ will not encounter a destroyed mutex before
> the end of running cycle.
>
> Some earlier email discussions:
> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
> [2] 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html
> [3] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html
>
> Signed-off-by: LIU Yulong <[email protected]>
> ---
>  ofproto/ofproto-dpif-upcall.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 9dfa52d82..b3b4b2d2f 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -1386,7 +1386,7 @@ upcall_uninit(struct upcall *upcall)
>          ofpbuf_uninit(&upcall->put_actions);
>          if (upcall->ukey) {
>              if (!upcall->ukey_persists) {
> -                ukey_delete__(upcall->ukey);
> +                ovsrcu_postpone(ukey_delete__, upcall->ukey);
>              }
>          } else if (upcall->have_recirc_ref) {
>              /* The reference was transferred to the ukey if one was created. 
> */
> -- 
> 2.50.1 (Apple Git-155)
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to