On Wed, May 13, 2026 at 11:41:02AM -0400, Steven Rostedt wrote:
> On Tue,  5 May 2026 17:09:34 +0000
> Dmitry Ilvokhin <[email protected]> wrote:
> 
> > Use the arch-overridable queued_spin_release(), introduced in the
> > previous commit, to ensure the tracepoint works correctly across all
> 
> Remove the ", introduced in the previous commit," That's useless in git
> change logs.

Thanks for the suggestion, will do here and in other places.

[...]

> >  /**
> >   * queued_spin_unlock - unlock a queued spinlock
> >   * @lock : Pointer to queued spinlock structure
> > + *
> > + * Generic tracing wrapper around the arch-overridable
> > + * queued_spin_release().
> >   */
> >  static __always_inline void queued_spin_unlock(struct qspinlock *lock)
> >  {
> > +   /*
> > +    * Trace and release are combined in queued_spin_release_traced() so
> > +    * the compiler does not need to preserve the lock pointer across the
> > +    * function call, avoiding callee-saved register save/restore on the
> > +    * hot path.
> > +    */
> > +   if (tracepoint_enabled(contended_release)) {
> > +           queued_spin_release_traced(lock);
> > +           return;
> 
> Get rid of the "return;". What does it save you? It just makes it that you
> need to duplicate the code. Even though it's a one liner, it can cause bugs
> in the future if this changes. You could call the function:
> 
>   do_trace_queued_spin_release_traced(lock);
> 
> 
> > +   }
> >     queued_spin_release(lock);
> >  }
> >  
> > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > index af8d122bb649..649fdca69288 100644
> > --- a/kernel/locking/qspinlock.c
> > +++ b/kernel/locking/qspinlock.c
> > @@ -104,6 +104,14 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
> >  #define queued_spin_lock_slowpath  native_queued_spin_lock_slowpath
> >  #endif
> >  
> > +void __lockfunc queued_spin_release_traced(struct qspinlock *lock)
> > +{
> > +   if (queued_spin_is_contended(lock))
> > +           trace_call__contended_release(lock);
> > +   queued_spin_release(lock);
> 
> And then remove the duplicate call of "queued_spin_release()" here.

This is the scenario the comment above the static branch describes.
Here's what it looks like in practice on x86_64 (defconfig, compiled
with GCC 11).

Current design (trace + unlock combined, with return):
  
    endbr64
    xchg %ax,%ax                     ; NOP (static branch)
    movb $0x0,(%rdi)                 ; unlock
    decl %gs:__preempt_count
    je   preempt
    jmp  __x86_return_thunk
    call queued_spin_release_traced  ; cold
    jmp  preempt_handling            ; cold
    call __SCT__preempt_schedule
    jmp  __x86_return_thunk

With the trace-only function (no return, unlock after the call):
  
    endbr64
    push %rbx                        ; saves callee-saved rbx (!)
    mov  %rdi,%rbx                   ; preserve lock across call (!)
    xchg %ax,%ax                     ; NOP (static branch)
    movb $0x0,(%rbx)                 ; unlock
    decl %gs:__preempt_count
    je   preempt
    pop  %rbx                        ; callee-saved restore (!)
    jmp  __x86_return_thunk
    call queued_spin_release_traced  ; cold
    jmp  unlock                      ; cold
    call __SCT__preempt_schedule
    pop  %rbx
    jmp  __x86_return_thunk

Three extra instructions, marked with "!", land on the hot path (push,
mov, pop) and are pure overhead whenever the tracepoint is off. That's
the main reason for combining the trace and the unlock in the same
out-of-line function.
