* Peter Zijlstra <pet...@infradead.org> wrote:

> On Wed, Apr 04, 2018 at 05:05:25PM -0700, Linus Torvalds wrote:
> > for some reason the test_bit() case looks like
> > this:
> >
> >   #define test_bit(nr, addr)                    \
> >         (__builtin_constant_p((nr))             \
> >          ? constant_test_bit((nr), (addr))      \
> >          : variable_test_bit((nr), (addr)))
> >
> > which is much more straightforward anyway. I'm not quite sure why we
> > did it that odd way anyway, but I bet it's just "hysterical raisins"
> > along with the test_bit() not needing inline asm at all for the
> > constant case.
>
> I always assumed BT was a more expensive instruction than AND with
> immediate.
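
For readers following along, here is a minimal user-space sketch of the dispatch the quoted macro performs. It is not the kernel's code: the helper bodies, BITS_PER_LONG as defined here, and test_bit_demo() are assumptions made for illustration, and the variable-index helper uses a portable shift where the x86 kernel version is understood to use BT inline asm. __builtin_constant_p() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed definition for this sketch; the kernel defines its own. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Constant-index path: plain C. With a compile-time 'nr' the compiler
 * can fold the mask and emit an AND/TEST with an immediate.
 */
static inline bool constant_test_bit(unsigned long nr,
				     const volatile unsigned long *addr)
{
	return ((1UL << (nr % BITS_PER_LONG)) &
		addr[nr / BITS_PER_LONG]) != 0;
}

/*
 * Variable-index path: a portable stand-in (shift and mask) for what
 * would be BT inline asm in the x86 kernel implementation.
 */
static inline bool variable_test_bit(unsigned long nr,
				     const volatile unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Same shape as the macro quoted above. */
#define test_bit(nr, addr)				\
	(__builtin_constant_p((nr))			\
	 ? constant_test_bit((nr), (addr))		\
	 : variable_test_bit((nr), (addr)))

/* Hypothetical self-check exercising both paths. */
void test_bit_demo(void)
{
	unsigned long map[2] = { 1UL << 5, 1UL };
	volatile unsigned long idx = 5;	/* volatile: not a compile-time constant */

	assert(test_bit(5, map));	/* literal index */
	assert(!test_bit(4, map));
	assert(test_bit(idx, map));	/* run-time index */
}
```

Both helpers return the same answer for any index; the only question in this thread is which instruction each path compiles down to and what that costs.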

According to:

   http://www.agner.org/optimize/instruction_tables.pdf

The SkyLake costs for 'BT', 'AND' and 'TEST' variants are:

   Instruction   Operands   uops    uops      port                latency   throughput
                            fused   unfused
   BT            r,r/i      1       1         p06                 1         0.5
   BT            m,r        10      10                                      5
   BT            m,i        2       2         p06 p23                       0.5
   BTR BTS BTC   r,r/i      1       1         p06                 1         0.5
   BTR BTS BTC   m,r        10      11                                      5
   BTR BTS BTC   m,i        3       4         p06 p4 p23                    1
   AND OR XOR    r,r/i      1       1         p0156               1         0.25
   AND OR XOR    r,m        1       2         p0156 p23                     0.5
   AND OR XOR    m,r/i      2       4         2p0156 2p237 p4     5         1
   TEST          r,r/i      1       1         p0156               1         0.25
   TEST          m,r/i      1       2         p0156 p23           1         0.5

So if I'm reading it right, the relevant comparison would be:

   BT            m,i        2       2         p06 p23                       0.5
   AND OR XOR    m,r/i      2       4         2p0156 2p237 p4     5         1

?

Thanks,

	Ingo