On Thu, May 14, 2026 at 12:03:48PM -0400, Steven Rostedt wrote:
> On Thu, 14 May 2026 14:13:35 +0000
> Dmitry Ilvokhin <[email protected]> wrote:
> 
> > > > +void __lockfunc queued_spin_release_traced(struct qspinlock *lock)
> > > > +{
> > > > +        if (queued_spin_is_contended(lock))
> > > > +                trace_call__contended_release(lock);
> > > > +        queued_spin_release(lock);
> > > And then remove the duplicate call of "queued_spin_release()" here.
> > 
> > This is the scenario the comment above the static branch describes.
> > Here's what it looks like in practice on x86_64 (defconfig, compiled
> > with GCC 11).
> > 
> > Current design (trace + unlock combined, with return):
> > 
> >   endbr64
> >   xchg   %ax,%ax                     ; NOP (static branch)
> >   movb   $0x0,(%rdi)                 ; unlock
> >   decl   %gs:__preempt_count
> >   je     preempt
> >   jmp    __x86_return_thunk
> >   call   queued_spin_release_traced  ; cold
> >   jmp    preempt_handling            ; cold
> >   call   __SCT__preempt_schedule
> >   jmp    __x86_return_thunk
> > 
> > With the trace-only function (no return, unlock after the call):
> > 
> >   endbr64
> >   push   %rbx                        ; saves callee-saved rbx (!)
> >   mov    %rdi,%rbx                   ; preserve lock across call (!)
> >   xchg   %ax,%ax                     ; NOP (static branch)
> >   movb   $0x0,(%rbx)                 ; unlock
> >   decl   %gs:__preempt_count
> >   je     preempt
> >   pop    %rbx                        ; callee-saved restore (!)
> >   jmp    __x86_return_thunk
> >   call   queued_spin_release_traced  ; cold
> >   jmp    unlock                      ; cold
> >   call   __SCT__preempt_schedule
> >   pop    %rbx
> >   jmp    __x86_return_thunk
> > 
> > Three extra instructions marked by "!" on the hot path (push, mov, pop),
> > all wasted when the tracepoint is off. That's the main reason for
> > combining trace and unlock in the same out-of-line function.
> 
> Ah, because the return makes it into two tail calls.
> 
> I still don't like the duplication, perhaps add some more comments about
> needing to update the other location if anything changes here? And perhaps
> comment that this duplicate code helps the assembly.
My idea was that queued_spin_release() serves the same role that the old
queued_spin_unlock() had: a pure lock-release primitive without tracing.
That was the primary motivation for extracting queued_spin_release() in
the first place (it is just one line of code), so the common release
logic between the traced and non-traced paths is shared explicitly
rather than duplicated semantically.

I agree that this could be explained better. I'll add more comments
there to clarify the rationale. Thanks for the suggestion, Steve.

> 
> -- Steve
