* Peter Zijlstra <pet...@infradead.org> wrote:

> On Wed, Apr 04, 2018 at 05:05:25PM -0700, Linus Torvalds wrote:
> > for some reason the test_bit() case looks like
> > this:
> > 
> >   #define test_bit(nr, addr)                      \
> >         (__builtin_constant_p((nr))             \
> >          ? constant_test_bit((nr), (addr))      \
> >          : variable_test_bit((nr), (addr)))
> > 
> > which is much more straightforward anyway. I'm not quite sure why we
> > did it that odd way anyway, but I bet it's just "hysterical raisins"
> > along with the test_bit() not needing inline asm at all for the
> > constant case.
> 
> I always assumed BT was a more expensive instruction than AND with
> immediate.
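For reference, here is a minimal user-space sketch of the dispatch Linus describes. It assumes GCC or Clang (for __builtin_constant_p()). BITS_PER_LONG and the helper bodies are illustrative stand-ins, not the kernel's actual definitions. The constant path builds a compile-time mask, which the compiler can typically turn into a TEST/AND with an immediate operand. The variable path loads the word and uses BT on a register (the 'BT r,r' form) on x86-64, and falls back to plain C elsewhere.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Illustrative stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG ((long)(sizeof(unsigned long) * CHAR_BIT))

/* Constant nr: the mask folds to an immediate, so the compiler can
 * typically emit TEST/AND with an immediate operand here. */
static inline bool constant_test_bit(long nr, const unsigned long *addr)
{
	assert(nr >= 0); /* the / and % arithmetic assumes a non-negative index */
	return (addr[nr / BITS_PER_LONG] & (1UL << (nr % BITS_PER_LONG))) != 0;
}

/* Variable nr: BT on a register on x86-64, plain C everywhere else. */
static inline bool variable_test_bit(long nr, const unsigned long *addr)
{
	assert(nr >= 0);
	unsigned long word = addr[nr / BITS_PER_LONG];
#if defined(__GNUC__) && defined(__x86_64__)
	unsigned char bit;
	/* BT r,r copies bit (nr mod 64) of 'word' into CF; SETC stores CF. */
	__asm__("btq %2, %1\n\tsetc %0"
		: "=q"(bit)
		: "r"(word), "r"((unsigned long)nr)
		: "cc");
	return bit;
#else
	return (word >> (nr % BITS_PER_LONG)) & 1UL;
#endif
}

/* Same shape as the macro quoted above. */
#define test_bit(nr, addr)                       \
	(__builtin_constant_p((nr))              \
	 ? constant_test_bit((nr), (addr))       \
	 : variable_test_bit((nr), (addr)))
```

With a literal bit number such as test_bit(3, bits), __builtin_constant_p() is true and the mask path is taken. With a runtime variable it usually is not, although an optimizing compiler may still prove the value constant. Both helpers return the same result, so the choice only affects code generation.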

According to:

   http://www.agner.org/optimize/instruction_tables.pdf

The Skylake costs for the 'BT', 'AND' and 'TEST' variants are:

   Instruction  Operands  uops fused  uops unfused  uops per port   latency  recip. throughput
   BT           r,r/i              1             1  p06                   1                0.5
   BT           m,r               10            10                                           5
   BT           m,i                2             2  p06 p23                                0.5
   BTR BTS BTC  r,r/i              1             1  p06                   1                0.5
   BTR BTS BTC  m,r               10            11                                           5
   BTR BTS BTC  m,i                3             4  p06 p4 p23                               1
   AND OR XOR   r,r/i              1             1  p0156                 1               0.25
   AND OR XOR   r,m                1             2  p0156 p23                              0.5
   AND OR XOR   m,r/i              2             4  2p0156 2p237 p4       5                  1
   TEST         r,r/i              1             1  p0156                 1               0.25
   TEST         m,r/i              1             2  p0156 p23             1                0.5


So if I'm reading it right, the relevant comparison would be:

   BT           m,i                2             2  p06 p23                                0.5
   AND OR XOR   m,r/i              2             4  2p0156 2p237 p4       5                  1

?
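To keep the two rows straight, here is a small sketch that records them as data. The struct insn_cost type and the find_cost() helper are hypothetical. The numbers are copied from the Skylake table above (Agner Fog's instruction tables), and -1 stands in for a latency that the table leaves blank.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One Skylake row (hypothetical type, not from any library). */
struct insn_cost {
	const char *insn;   /* mnemonic plus operand form, e.g. "BT m,i" */
	int uops_fused;     /* fused-domain uops */
	int uops_unfused;   /* unfused-domain uops */
	int latency;        /* cycles; -1 where the table leaves it blank */
	double recip_tput;  /* reciprocal throughput in cycles; lower is better */
};

/* The two rows being compared, copied from the table above. */
static const struct insn_cost skylake_rows[] = {
	{ "BT m,i",    2, 2, -1, 0.5 },
	{ "AND m,r/i", 2, 4,  5, 1.0 },
};

/* Look up a row by its exact name; returns NULL if it is not recorded. */
static const struct insn_cost *find_cost(const char *insn)
{
	assert(insn != NULL);
	for (size_t i = 0; i < sizeof(skylake_rows) / sizeof(skylake_rows[0]); i++)
		if (strcmp(skylake_rows[i].insn, insn) == 0)
			return &skylake_rows[i];
	return NULL;
}
```

Read this way, BT m,i and AND m,r/i both cost 2 fused-domain uops. AND m,r/i issues 2 more unfused-domain uops and has double the reciprocal throughput (1 vs 0.5 cycles).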

Thanks,

        Ingo
