On Thu, May 14, 2026 at 12:03:48PM -0400, Steven Rostedt wrote:
> On Thu, 14 May 2026 14:13:35 +0000
> Dmitry Ilvokhin <[email protected]> wrote:
> 
> > > > +void __lockfunc queued_spin_release_traced(struct qspinlock *lock)
> > > > +{
> > > > +       if (queued_spin_is_contended(lock))
> > > > +               trace_call__contended_release(lock);
> > > > +       queued_spin_release(lock);  
> > > 
> > > And then remove the duplicate call of "queued_spin_release()" here.  
> > 
> > This is the scenario the comment above the static branch describes.
> > Here's what it looks like in practice on x86_64 (defconfig, compiled
> > with GCC 11).
> > 
> > Current design (trace + unlock combined, with return):
> >   
> >     endbr64
> >     xchg %ax,%ax                     ; NOP (static branch)
> >     movb $0x0,(%rdi)                 ; unlock
> >     decl %gs:__preempt_count
> >     je   preempt
> >     jmp  __x86_return_thunk
> >     call queued_spin_release_traced  ; cold
> >     jmp  preempt_handling            ; cold
> >     call __SCT__preempt_schedule
> >     jmp  __x86_return_thunk
> > 
> > With the trace-only function (no return, unlock after the call):
> >   
> >     endbr64
> >     push %rbx                        ; saves callee-saved rbx (!)
> >     mov  %rdi,%rbx                   ; preserve lock across call (!)
> >     xchg %ax,%ax                     ; NOP (static branch)
> >     movb $0x0,(%rbx)                 ; unlock
> >     decl %gs:__preempt_count
> >     je   preempt
> >     pop  %rbx                        ; callee-saved restore (!)
> >     jmp  __x86_return_thunk
> >     call queued_spin_release_traced  ; cold
> >     jmp  unlock                      ; cold
> >     call __SCT__preempt_schedule
> >     pop  %rbx
> >     jmp  __x86_return_thunk
> > 
> > Three extra instructions marked by "!" on the hot path (push, mov, pop),
> > all wasted when the tracepoint is off. That's the main reason for
> > combining trace and unlock in the same out-of-line function.
> 
> Ah, because the return makes it into two tail calls.
> 
> I still don't like the duplication, perhaps add some more comments about
> needing to update the other location if anything changes here? And perhaps
> comment that this duplicate code helps the assembly.

My idea was that queued_spin_release() serves the same role the old
queued_spin_unlock() did: a pure lock-release primitive with no tracing.
That was the primary motivation for extracting queued_spin_release() in
the first place (it is just one line of code): the release logic common
to the traced and non-traced paths is shared explicitly rather than
duplicated.

I agree that this could be explained better. I'll add more comments
there to clarify the rationale. Thanks for the suggestion, Steve.

> 
> -- Steve
> 
