On 08/18/2016 07:24 PM, Linus Torvalds wrote:
> That said, your numbers really aren't very convincing. If popf really
> is just 10 cycles on modern Intel hardware, it's already fast enough
> that I really don't think it matters.

It's 20 cycles. I was wrong in my email, I forgot that the insn count
also counts "push %ebx" insns.

Since I already made a mistake, let me double-check.

200 million iterations of this loop execute in under 17 seconds:

  400100:       b8 00 c2 eb 0b          mov    $0xbebc200,%eax # 200*1000*1000
  400105:       9c                      pushfq
  400106:       5b                      pop    %rbx
  400107:       90                      nop
....
0000000000400140 <loop>:
  400140:       53                      push   %rbx
  400141:       9d                      popfq
  400142:       53                      push   %rbx
  400143:       9d                      popfq
  400144:       53                      push   %rbx
  400145:       9d                      popfq
  400146:       53                      push   %rbx
  400147:       9d                      popfq
  400148:       53                      push   %rbx
  400149:       9d                      popfq
  40014a:       53                      push   %rbx
  40014b:       9d                      popfq
  40014c:       53                      push   %rbx
  40014d:       9d                      popfq
  40014e:       53                      push   %rbx
  40014f:       9d                      popfq
  400150:       53                      push   %rbx
  400151:       9d                      popfq
  400152:       53                      push   %rbx
  400153:       9d                      popfq
  400154:       53                      push   %rbx
  400155:       9d                      popfq
  400156:       53                      push   %rbx
  400157:       9d                      popfq
  400158:       53                      push   %rbx
  400159:       9d                      popfq
  40015a:       53                      push   %rbx
  40015b:       9d                      popfq
  40015c:       ff c8                   dec    %eax
  40015e:       75 e0                   jne    400140 <loop>

The loop is exactly 32 bytes, aligned.
There are 14 POPFs. Other insns are very fast.
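
The figures read off the listing can be cross-checked with a few lines of
Python. This is only a sanity check of the disassembly as quoted, not part of
the test program; the constant names are made up for illustration.

```python
# Sanity checks on the objdump listing quoted above.
# All names here are illustrative; the values are copied from the listing.
LOOP_START = 0x400140   # address of the first "push %rbx"
JNE_ADDR = 0x40015E     # address of the closing "jne"
JNE_LEN = 2             # encoded as "75 e0"
PUSH_POPF_PAIRS = 14    # each pair is two 1-byte insns (53, 9d)
DEC_LEN = 2             # "ff c8"

# Loop size from the addresses, and again from the insn lengths.
loop_bytes = JNE_ADDR + JNE_LEN - LOOP_START
loop_bytes_from_insns = PUSH_POPF_PAIRS * 2 + DEC_LEN + JNE_LEN

# Iteration count loaded into %eax by the mov immediate.
iterations = 0x0BEBC200

# The jne displacement 0xe0 is a signed 8-bit value, relative to the
# address of the next instruction.
rel8 = 0xE0 - 0x100
jne_target = JNE_ADDR + JNE_LEN + rel8

print(loop_bytes, loop_bytes_from_insns)   # loop size in bytes, two ways
print(LOOP_START % 32)                     # 0 means 32-byte aligned
print(iterations)                          # loop count
print(hex(jne_target))                     # branch target
```

Both size computations agree on 32 bytes, the start address is a multiple of
32, and the immediate decodes to 200 million iterations.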

No perf, just "time taskset 1 ./test".
My CPU frequency hovers around 3500 MHz when loaded.

17 seconds is 17*3500 million cycles.
17*3500 million cycles / (200*14 million POPFs) = 21.25 cycles per POPF

Thus, one POPF in CPL3 is ~20 cycles on Skylake.
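
For anyone who wants to redo the arithmetic with their own numbers, here is
the same estimate as a small Python sketch. The function name and parameters
are invented for this example. It charges every elapsed cycle to POPF, so the
push/dec/jne overhead is folded in and the result is an upper bound, and it
assumes the clock really stays at the stated frequency for the whole run.

```python
def cycles_per_popf(seconds, freq_mhz, iterations, popfs_per_iter):
    """Estimate cycles per POPF from wall-clock time (upper bound).

    Hypothetical helper: all elapsed cycles are attributed to POPF.
    """
    total_cycles = seconds * freq_mhz * 1_000_000
    total_popfs = iterations * popfs_per_iter
    return total_cycles / total_popfs


# Numbers from the run above: 17 s, ~3500 MHz, 200 million iterations,
# 14 POPFs per iteration.
estimate = cycles_per_popf(17, 3500, 200_000_000, 14)
print(estimate)
```

Since the run took slightly less than 17 seconds and the other insns are not
free, the true per-POPF cost is a bit below this figure, which is why I round
it to ~20.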
