Of course, somebody really should do timings on modern CPUs (in CPL0,
comparing native_fl() that enables interrupts with a popf)

I haven't done CPL0 tests yet, but I realized that CLI/STI can be tested
in userspace if we call iopl(3) first.

Surprisingly, STI is slower than CLI. A loop with 27 CLIs and one STI
converges to about 0.5 insn/cycle:

# compile with: gcc -nostartfiles -nostdlib
_start:         .globl  _start
                mov     $172, %eax #iopl
                mov     $3, %edi
                syscall
                mov     $200*1000*1000, %eax
                .balign 64
loop:
                cli;cli;cli;cli
                cli;cli;cli;cli
                cli;cli;cli;cli
                cli;cli;cli;cli

                cli;cli;cli;cli
                cli;cli;cli;cli
                cli;cli;cli;sti
                dec     %eax
                jnz     loop

                mov     $231, %eax #exit_group
                syscall

perf stat:
     6,015,787,968      instructions              #    0.52  insn per cycle
       3.355474199 seconds time elapsed

With all CLIs replaced by STIs, it's ~0.25 insn/cycle:

     6,030,530,328      instructions              #    0.27  insn per cycle
       6.547200322 seconds time elapsed
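
As a rough cross-check of the two runs above, the perf figures can be
turned into cycles per loop iteration. This is only a sketch, not part of
the measurement itself. It assumes exactly 200 million iterations (the
loop counter above), takes cycles as instructions divided by the printed
insn-per-cycle value (which perf rounds to two digits), and charges the
whole iteration to the 28 CLI/STI instructions, treating dec/jnz as free:

```python
# Figures copied from the two perf stat runs above.
ITERATIONS = 200 * 1000 * 1000   # loop counter loaded into %eax
FLAG_INSNS = 28                  # 27 CLI + 1 STI (or 28 STI) per iteration


def cycles_per_iteration(instructions, ipc):
    """Estimate cycles per loop iteration from perf's totals."""
    return instructions / ipc / ITERATIONS


mixed = cycles_per_iteration(6_015_787_968, 0.52)    # 27 CLI + 1 STI
all_sti = cycles_per_iteration(6_030_530_328, 0.27)  # 28 STI

print(f"27 CLI + 1 STI: {mixed:.1f} cycles/iteration")
print(f"28 STI:         {all_sti:.1f} cycles/iteration, "
      f"{all_sti / FLAG_INSNS:.2f} cycles per STI")
```

Under those assumptions the mixed loop comes out at roughly 58 cycles per
iteration, which is what 27 two-cycle CLIs plus one four-cycle STI would
predict, and the all-STI loop at roughly 4 cycles per STI.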


A POPF that needs to enable interrupts is not measurably faster than
one that does not change EFLAGS.IF:

Loop with:
  400158:       fa                      cli
  400159:       53                      push   %rbx  # saved EFLAGS with IF=1
  40015a:       9d                      popfq
shows:
     8,908,857,324      instructions              #    0.11  insn per cycle  ( +-  0.00% )

Loop with:
  400140:       fb                      sti
  400141:       53                      push   %rbx
  400142:       9d                      popfq
shows:
     8,920,243,701      instructions              #    0.10  insn per cycle  ( +-  0.01% )

Even a loop with neither CLI nor STI, only POPF:
  400140:       53                      push   %rbx
  400141:       9d                      popfq
shows:
     6,079,936,714      instructions              #    0.10  insn per cycle  ( +-  0.00% )

This is on a Skylake CPU.
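
For the POPF loops only a fragment of each loop body is shown, so the
sketch below does no more than invert the printed insn-per-cycle values.
Note the assumption: the result is the mean cost over every instruction in
the loop (PUSH and any loop-control instructions included), not a
measurement of POPF in isolation.

```python
# insn-per-cycle values as printed by perf stat for the three POPF loops
# (rounded by perf to two digits).
ipc_by_loop = {
    "cli; push; popfq": 0.11,
    "sti; push; popfq": 0.10,
    "push; popfq": 0.10,
}

# Mean cycles per instruction is simply the reciprocal of insn per cycle.
mean_cycles = {body: 1.0 / ipc for body, ipc in ipc_by_loop.items()}

for body, cycles in mean_cycles.items():
    print(f"{body:18s} {cycles:5.1f} cycles per instruction (mean)")
```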


The gist of it:
CLI is 2 cycles,
STI is 4 cycles,
POPF is 10 cycles
seemingly regardless of the prior value of EFLAGS.IF.
