On Wed, May 13, 2026 at 11:41:02AM -0400, Steven Rostedt wrote:
> On Tue, 5 May 2026 17:09:34 +0000
> Dmitry Ilvokhin <[email protected]> wrote:
>
> > Use the arch-overridable queued_spin_release(), introduced in the
> > previous commit, to ensure the tracepoint works correctly across all
>
> Remove the ", introduced in the previous commit," That's useless in git
> change logs.

Thanks for the suggestion, will do here and in the other places.
[...]
> > /**
> > * queued_spin_unlock - unlock a queued spinlock
> > * @lock : Pointer to queued spinlock structure
> > + *
> > + * Generic tracing wrapper around the arch-overridable
> > + * queued_spin_release().
> > */
> > static __always_inline void queued_spin_unlock(struct qspinlock *lock)
> > {
> > + /*
> > + * Trace and release are combined in queued_spin_release_traced() so
> > + * the compiler does not need to preserve the lock pointer across the
> > + * function call, avoiding callee-saved register save/restore on the
> > + * hot path.
> > + */
> > + if (tracepoint_enabled(contended_release)) {
> > + queued_spin_release_traced(lock);
> > + return;
>
> Get rid of the "return;". What does it save you? It just makes it that you
> need to duplicate the code. Even though it's a one liner, it can cause bugs
> in the future if this changes. You could call the function:
>
> do_trace_queued_spin_release_traced(lock);
>
>
> > + }
> > queued_spin_release(lock);
> > }
> >
> > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > index af8d122bb649..649fdca69288 100644
> > --- a/kernel/locking/qspinlock.c
> > +++ b/kernel/locking/qspinlock.c
> > @@ -104,6 +104,14 @@ static __always_inline u32
> > __pv_wait_head_or_lock(struct qspinlock *lock,
> > #define queued_spin_lock_slowpath native_queued_spin_lock_slowpath
> > #endif
> >
> > +void __lockfunc queued_spin_release_traced(struct qspinlock *lock)
> > +{
> > + if (queued_spin_is_contended(lock))
> > + trace_call__contended_release(lock);
> > + queued_spin_release(lock);
>
> And then remove the duplicate call of "queued_spin_release()" here.

This is exactly the scenario described in the comment above the static
branch. Here's what it looks like in practice on x86_64 (defconfig,
compiled with GCC 11).

Current design (trace + unlock combined, with the early return):

endbr64
xchg %ax,%ax ; NOP (static branch)
movb $0x0,(%rdi) ; unlock
decl %gs:__preempt_count
je preempt
jmp __x86_return_thunk
call queued_spin_release_traced ; cold
jmp preempt_handling ; cold
call __SCT__preempt_schedule
jmp __x86_return_thunk

With the trace-only function (no early return, unlock after the call):

endbr64
push %rbx ; saves callee-saved rbx (!)
mov %rdi,%rbx ; preserve lock across call (!)
xchg %ax,%ax ; NOP (static branch)
movb $0x0,(%rbx) ; unlock
decl %gs:__preempt_count
je preempt
pop %rbx ; callee-saved restore (!)
jmp __x86_return_thunk
call queued_spin_release_traced ; cold
jmp unlock ; cold
call __SCT__preempt_schedule
pop %rbx
jmp __x86_return_thunk

Three extra instructions, marked with "!", land on the hot path (push,
mov, pop), and they are pure overhead whenever the tracepoint is off.
That's the main reason for combining trace and unlock in the same
out-of-line function.