From: Peter Zijlstra
> Sent: 30 October 2020 23:02
> 
> On Fri, Oct 30, 2020 at 04:22:48PM -0400, Steven Rostedt wrote:
> > As this is something that ftrace recursion also does, perhaps we should
> > move this into interrupt.h so that anyone that needs a counter can get
> > it quickly, and not keep re-implementing it.
> 
> Works for me, however:
> 
> > /*
> >  * Quickly find what context you are in.
> >  * 0 - normal
> >  * 1 - softirq
> >  * 2 - hard interrupt
> >  * 3 - NMI
> >  */
> > static inline int irq_context()
> > {
> >     unsigned int pc = preempt_count();
> >     int rctx = 0;
> 
> unsigned
> 
> >
> >     if (pc & (NMI_MASK))
> >             rctx++;
> >     if (pc & (NMI_MASK | HARDIRQ_MASK))
> >             rctx++;
> >     if (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))
> >             rctx++;
> >
> >     return rctx;
> > }
> 
> otherwise you'll get an extra instruction to sign extend it, which is
> daft (yes, i've been staring at GCC output far too much).
> 
> Also, gcc-9 does worse (like 1 byte iirc) with:
> 
>       rctx += !!(pc & (NMI_MASK));
>       rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
>       rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
> 
> but gcc-10 doesn't seem to care.

You've made me look at some gcc output (it's raining).

The gcc 7.5.0 I have handy probably generates the best code for:

unsigned char q_2(unsigned int pc)
{
        unsigned char rctx = 0;

        rctx += !!(pc & (NMI_MASK));
        rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
        rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));

        return rctx;
}

0000000000000000 <q_2>:
   0:   f7 c7 00 00 f0 00       test   $0xf00000,%edi     # clock 0
   6:   0f 95 c0                setne  %al                # clock 1
   9:   f7 c7 00 00 ff 00       test   $0xff0000,%edi     # clock 0
   f:   0f 95 c2                setne  %dl                # clock 1
  12:   01 c2                   add    %eax,%edx          # clock 2
  14:   81 e7 00 01 ff 00       and    $0xff0100,%edi
  1a:   0f 95 c0                setne  %al
  1d:   01 d0                   add    %edx,%eax          # clock 3
  1f:   c3                      retq

I doubt that is beatable.

I've annotated the register dependency chain.
Likely to be 3 (or maybe 4) clocks.
The other versions are a lot worse (7 or 8 clocks), and that is before
allowing for 'sbb' taking 2 clocks on a lot of Intel cpus.
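
For anyone wanting to check the arithmetic outside the kernel, here is a
minimal user-space sketch. The mask values are an assumption: they are read
back from the immediates in the disassembly above (0xf00000, 0xff0000,
0xff0100) rather than taken from the kernel headers. q_if() is only a
hypothetical name for the if-based variant quoted earlier, with the counter
made unsigned.

```c
/*
 * Assumed mask values, inferred from the immediates in the q_2()
 * disassembly; the real definitions live in the kernel headers and
 * may differ between versions.
 */
#define NMI_MASK        0x00f00000u
#define HARDIRQ_MASK    0x000f0000u
#define SOFTIRQ_OFFSET  0x00000100u

/* Branch-free form: each !! yields 0 or 1, so the sum is 0..3. */
unsigned char q_2(unsigned int pc)
{
        unsigned char rctx = 0;

        rctx += !!(pc & (NMI_MASK));
        rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
        rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));

        return rctx;
}

/* if-based form of the same computation (hypothetical name). */
unsigned int q_if(unsigned int pc)
{
        unsigned int rctx = 0;

        if (pc & (NMI_MASK))
                rctx++;
        if (pc & (NMI_MASK | HARDIRQ_MASK))
                rctx++;
        if (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))
                rctx++;

        return rctx;
}
```

With those assumed masks both functions give 0 for a plain task context,
1 with SOFTIRQ_OFFSET set, 2 with any HARDIRQ_MASK bit set and 3 with any
NMI_MASK bit set, because each wider mask includes the narrower ones.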

        David
