We have such call stack of coredump: *0 0x00007f7f197ae337 in raise () from /lib64/libc.so.6 *1 0x00007f7f197afa28 in abort () from /lib64/libc.so.6 *2 0x000055934ca4f4ee in ovs_abort_valist (err_no=<optimized out>, format=<optimized out>, args=args@entry=0x7f7f07530360) at lib/util.c:499 *3 0x000055934ca4f584 in ovs_abort (err_no=err_no@entry=0, format=format@entry=0x55934ccedd18 "%s: %s() passed uninitialized ovs_mutex") at lib/util.c:491 *4 0x000055934ca1a4a1 in ovs_mutex_trylock_at (l_=l_@entry=0x7f7ed4a43e58, where=where@entry=0x55934cccb318 "ofproto/ofproto-dpif-upcall.c:3014") at lib/ovs-thread.c:106 *5 0x000055934c943181 in revalidator_sweep__ (revalidator=revalidator@entry=0x5593518c1720, purge=purge@entry=false) at ofproto/ofproto-dpif-upcall.c:3014 *6 0x000055934c9471a6 in revalidator_sweep (revalidator=0x5593518c1720) at ofproto/ofproto-dpif-upcall.c:3072 *7 udpif_revalidator (arg=0x5593518c1720) at ofproto/ofproto-dpif-upcall.c:1086 *8 0x000055934ca1b05f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:422 *9 0x00007f7f1b6ece65 in start_thread () from /lib64/libpthread.so.0 *10 0x00007f7f1987688d in clone () from /lib64/libc.so.6
When calling ovs_mutex_trylock() on ukey->mutex, ovs_mutex_trylock_at sees that the input is an "uninitialized ovs_mutex" (l->where is NULL), and aborts. This state can only occur after the mutex has not been initialized or has been destroyed. The mutex is definitely initialized in ukey_create__ by ukey, so the "uninitialized" state is almost certainly "destroyed". Destruction occurs in ukey_delete__, which calls ovs_mutex_destroy(&ukey->mutex) and sets where to NULL. When revalidator_sweep__ is traversing cmap and trying to lock (&ukey->mutex), it encounters a ukey that has been directly destroyed by ukey_delete__, However, the ukey is still visible to the revalidator (either still in the cmap or has not yet passed the RCU grace period), resulting in an abort. That is to say, there is a path where ukey_delete__ is directly called during concurrent traversal of ukey, bypassing the cmap_remove + ovsrcu_postpone semantics of ukey_delete. Modify upcall_uninit() to change direct ukey_delete__ to RCU deferred release to avoid concurrent traversal conflicts with revalidator. This ensures that ukey_delete__ is not executed until after the global grace period, and that the CMAP_FOR_EACH within revalidator_sweep__ will not encounter a destroyed mutex before the end of running cycle. Some earlier email discussions: [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html [2] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html [3] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html Signed-off-by: LIU Yulong <[email protected]> --- ofproto/ofproto-dpif-upcall.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 9dfa52d82..b3b4b2d2f 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -1386,7 +1386,7 @@ upcall_uninit(struct upcall *upcall) ofpbuf_uninit(&upcall->put_actions); if (upcall->ukey) { if (!upcall->ukey_persists) { - ukey_delete__(upcall->ukey); + ovsrcu_postpone(ukey_delete__, upcall->ukey); } } else if (upcall->have_recirc_ref) { /* The reference was transferred to the ukey if one was created. */ -- 2.50.1 (Apple Git-155) _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
