On 20 Oct 2025, at 12:35, LIU Yulong wrote:

> Thank you Eelco.
>
>
> Code search shows we have `recv_upcalls` and `upcall_cb` which will call 
> `upcall_uninit`.
> And dp_netdev_upcall will call the dp->upcall_cb.
> So we have call stacks like this:
> i) handle_packet_upcall->dp_netdev_upcall->upcall_cb->upcall_uninit
> ii) 
> dp_execute_userspace_action->dp_netdev_upcall->upcall_cb->upcall_uninit
>
> Cloud you confirm these calls?

From the top of my head, this is correct. However, the new ukey structure is 
never inserted, so we do not need the RCU-delayed remove.

> For your change, I'll run tests with recoreded packets to verify.

Thanks, and let me know the results.

//Eelco

>
> Regards,
>
>
> LIU Yulong
>  
>  
> ------------------ Original ------------------
> From: &nbsp;"Eelco&nbsp;Chaudron"<[email protected]&gt;;
> Date: &nbsp;Fri, Oct 17, 2025 08:08 PM
> To: &nbsp;"LIU Yulong"<[email protected]&gt;;
> Cc: &nbsp;"dev"<[email protected]&gt;;
> Subject: &nbsp;Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Add ovsrcu_postpone 
> for ukey_delete__.
>
> &nbsp;
> Hi Liu,
>
> I looked at the change; however, upcall_uninit() is only called for newly 
> created (never inserted) ukeys, so the ovs_postpone() call is not needed.
>
> However, I did find an issue in upcall_receive(), where, in an error path, it 
> could use an uninitialized upcall structure — causing a ukey to be freed that 
> should not have been.
>
> Can you try out the diff below to see if it fixes your problem?
>
> Cheers,
>
> Eelco
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index b3b4b2d2f..53b906a16 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -1230,6 +1230,17 @@ upcall_receive(struct upcall *upcall, const struct 
> dpif_backer *backer,
> {
>  int error;
>
> +&nbsp;&nbsp;&nbsp; /* Initialize the minimal required fields in the upcall 
> structure to ensure
> +&nbsp;&nbsp;&nbsp;&nbsp; * upcall_uninit() does not operate on invalid data. 
> */
> +&nbsp;&nbsp;&nbsp; upcall-&gt;have_recirc_ref = false;
> +&nbsp;&nbsp;&nbsp; upcall-&gt;xout_initialized = false;
> +&nbsp;&nbsp;&nbsp; upcall-&gt;ukey_persists = false;
> +&nbsp;&nbsp;&nbsp; upcall-&gt;ukey = NULL;
> +&nbsp;&nbsp;&nbsp; ofpbuf_use_stub(&amp;upcall-&gt;odp_actions, 
> upcall-&gt;odp_actions_stub,
> +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  sizeof upcall-&gt;odp_actions_stub);
> +&nbsp;&nbsp;&nbsp; ofpbuf_init(&amp;upcall-&gt;put_actions, 0);
> +
> +
>  upcall-&gt;type = classify_upcall(type, userdata, &amp;upcall-&gt;cookie);
>  if (upcall-&gt;type == BAD_UPCALL) {
>  return EAGAIN;
> @@ -1258,19 +1269,11 @@ upcall_receive(struct upcall *upcall, const struct 
> dpif_backer *backer,
>  }
>
>  upcall-&gt;recirc = NULL;
> -&nbsp;&nbsp;&nbsp; upcall-&gt;have_recirc_ref = false;
>  upcall-&gt;flow = flow;
>  upcall-&gt;packet = packet;
>  upcall-&gt;ufid = ufid;
>  upcall-&gt;pmd_id = pmd_id;
> -&nbsp;&nbsp;&nbsp; ofpbuf_use_stub(&amp;upcall-&gt;odp_actions, 
> upcall-&gt;odp_actions_stub,
> -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  sizeof upcall-&gt;odp_actions_stub);
> -&nbsp;&nbsp;&nbsp; ofpbuf_init(&amp;upcall-&gt;put_actions, 0);
>
> -&nbsp;&nbsp;&nbsp; upcall-&gt;xout_initialized = false;
> -&nbsp;&nbsp;&nbsp; upcall-&gt;ukey_persists = false;
> -
> -&nbsp;&nbsp;&nbsp; upcall-&gt;ukey = NULL;
>  upcall-&gt;key = NULL;
>  upcall-&gt;key_len = 0;
>  upcall-&gt;mru = mru;
>
>
> On 16 Oct 2025, at 3:12, LIU Yulong wrote:
>
> &gt; We have such call stack of coredump:
> &gt; *0&nbsp; 0x00007f7f197ae337 in raise () from /lib64/libc.so.6
> &gt; *1&nbsp; 0x00007f7f197afa28 in abort () from /lib64/libc.so.6
> &gt; *2&nbsp; 0x000055934ca4f4ee in ovs_abort_valist (err_no=<optimized 
> out&gt;, format=<optimized out&gt;, args=args@entry=0x7f7f07530360) at 
> lib/util.c:499
> &gt; *3&nbsp; 0x000055934ca4f584 in ovs_abort (err_no=err_no@entry=0, 
> format=format@entry=0x55934ccedd18 "%s: %s() passed uninitialized ovs_mutex") 
> at lib/util.c:491
> &gt; *4&nbsp; 0x000055934ca1a4a1 in ovs_mutex_trylock_at 
> (l_=l_@entry=0x7f7ed4a43e58, where=where@entry=0x55934cccb318 
> "ofproto/ofproto-dpif-upcall.c:3014") at lib/ovs-thread.c:106
> &gt; *5&nbsp; 0x000055934c943181 in revalidator_sweep__ 
> (revalidator=revalidator@entry=0x5593518c1720, purge=purge@entry=false) at 
> ofproto/ofproto-dpif-upcall.c:3014
> &gt; *6&nbsp; 0x000055934c9471a6 in revalidator_sweep 
> (revalidator=0x5593518c1720) at ofproto/ofproto-dpif-upcall.c:3072
> &gt; *7&nbsp; udpif_revalidator (arg=0x5593518c1720) at 
> ofproto/ofproto-dpif-upcall.c:1086
> &gt; *8&nbsp; 0x000055934ca1b05f in ovsthread_wrapper (aux_=<optimized 
> out&gt;) at lib/ovs-thread.c:422
> &gt; *9&nbsp; 0x00007f7f1b6ece65 in start_thread () from 
> /lib64/libpthread.so.0
> &gt; *10 0x00007f7f1987688d in clone () from /lib64/libc.so.6
> &gt;
> &gt; When calling ovs_mutex_trylock() on ukey-&gt;mutex, ovs_mutex_trylock_at
> &gt; sees that the input is an "uninitialized ovs_mutex" (l-&gt;where is 
> NULL),
> &gt; and aborts.
> &gt;
> &gt; This state can only occur after the mutex has not been initialized or
> &gt; has been destroyed. The mutex is definitely initialized in ukey_create__
> &gt; by ukey, so the "uninitialized" state is almost certainly "destroyed".
> &gt; Destruction occurs in ukey_delete__, which calls 
> ovs_mutex_destroy(&amp;ukey-&gt;mutex)
> &gt; and sets where to NULL.
> &gt;
> &gt; When revalidator_sweep__ is traversing cmap and trying to lock 
> (&amp;ukey-&gt;mutex),
> &gt; it encounters a ukey that has been directly destroyed by ukey_delete__,
> &gt; However, the ukey is still visible to the revalidator (either still in
> &gt; the cmap or has not yet passed the RCU grace period), resulting in an
> &gt; abort. That is to say, there is a path where ukey_delete__ is directly
> &gt; called during concurrent traversal of ukey, bypassing the
> &gt; cmap_remove + ovsrcu_postpone semantics of ukey_delete.
> &gt;
> &gt; Modify upcall_uninit() to change direct ukey_delete__ to RCU deferred
> &gt; release to avoid concurrent traversal conflicts with revalidator.
> &gt; This ensures that ukey_delete__ is not executed until after the
> &gt; global grace period, and that the CMAP_FOR_EACH within
> &gt; revalidator_sweep__ will not encounter a destroyed mutex before
> &gt; the end of running cycle.
> &gt;
> &gt; Some earlier email discussions:
> &gt; [1] 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
> &gt; [2] 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html
> &gt; [3] 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html
> &gt;
> &gt; Signed-off-by: LIU Yulong <[email protected]&gt;
> &gt; ---
> &gt;&nbsp; ofproto/ofproto-dpif-upcall.c | 2 +-
> &gt;&nbsp; 1 file changed, 1 insertion(+), 1 deletion(-)
> &gt;
> &gt; diff --git a/ofproto/ofproto-dpif-upcall.c 
> b/ofproto/ofproto-dpif-upcall.c
> &gt; index 9dfa52d82..b3b4b2d2f 100644
> &gt; --- a/ofproto/ofproto-dpif-upcall.c
> &gt; +++ b/ofproto/ofproto-dpif-upcall.c
> &gt; @@ -1386,7 +1386,7 @@ upcall_uninit(struct upcall *upcall)
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
> ofpbuf_uninit(&amp;upcall-&gt;put_actions);
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if 
> (upcall-&gt;ukey) {
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  if (!upcall-&gt;ukey_persists) {
> &gt; 
> -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  ukey_delete__(upcall-&gt;ukey);
> &gt; 
> +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  ovsrcu_postpone(ukey_delete__, upcall-&gt;ukey);
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  }
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; } else if 
> (upcall-&gt;have_recirc_ref) {
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  /* The reference was transferred to the ukey if one was created. */
> &gt; --
> &gt; 2.50.1 (Apple Git-155)
> &gt;
> &gt; _______________________________________________
> &gt; dev mailing list
> &gt; [email protected]
> &gt; https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to