On Fri, Oct 26, 2018 at 01:47:19PM -0500, David Wright wrote:
> On Fri 26 Oct 2018 at 11:04:48 (-0400), Michael Stone wrote:
>> FWIW, even the kernel doesn't use naive busy loops anymore on newer
>> hardware. (TSC or MWAIT is used, depending on what the processor
>> supports.)
>
> I've programmed a "busy loop" in the past and found that linux
> bogomips tracked the loop speed quite closely on a variety of machines
> from 486DX to 650MHz Pentium III (Coppermine). Nothing multiprocessor.
> When I say busy loop, I mean a loop like
>
>     FOR J=1 TO T
>     X=X+1
>     NEXT J
>
> where X is floating point and the language is an HP Basic clone on MSDOS.

That's basically all bogomips is:
    "       test %0,%0  \n"
    "       jz 3f       \n"
    "       jmp 1f      \n"
    ".align 16          \n"
    "1:     jmp 2f      \n"
    ".align 16          \n"
    "2:     dec %0      \n"
    "       jnz 2b      \n"
    "3:     dec %0      \n"
in a loop. (From
https://github.com/torvalds/linux/blob/master/arch/x86/lib/delay.c)
(Note that other architectures are completely different and don't use
the x86 assembly--one more way the bogomips numbers are meaningless. :) )
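
For reference, the figure that gets printed is plain integer arithmetic on
the calibrated loops-per-jiffy count. A minimal sketch in C, assuming the
formula I remember from init/calibrate.c (lpj/(500000/HZ) for the integer
part); the struct and function names here are made up for illustration,
and HZ is passed in as a parameter rather than taken from the kernel
config:

```c
#include <assert.h>

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct bogomips {
    unsigned long whole;   /* integer part of the BogoMIPS figure */
    unsigned long frac;    /* two decimal digits after the point  */
};

/*
 * Convert a loops-per-jiffy count into the BogoMIPS figure.
 * hz stands in for the kernel's HZ (timer ticks per second); this
 * sketch assumes 0 < hz <= 5000 so the divisors below are non-zero.
 */
struct bogomips bogomips_from_lpj(unsigned long lpj, unsigned long hz)
{
    struct bogomips b;

    assert(hz > 0 && hz <= 5000);
    b.whole = lpj / (500000 / hz);
    b.frac  = (lpj / (5000 / hz)) % 100;
    return b;
}
```

With lpj=2394231 and hz=1000 this gives 4788.46. Nothing in there knows
anything about the CPU beyond how many times the loop above ran per tick.
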
For a given CPU family there's a de-facto multiplier (essentially, how
many instructions can be issued per cycle) and then within that family
bogomips is directly proportional to clock speed. None of that works
particularly well in the face of CPUs that change speed, and it's not
particularly efficient given the current desire to minimize power
consumption. So, current CPUs use the TSC (which, again for current
CPUs, increments at a constant rate regardless of reduced clock speeds
in power saving modes) possibly in concert with MWAIT (which lets the
CPU idle for a bit to save power). In those cases the loop frequency
calibration just determines how the TSC relates to wall-clock
time, and has nothing at all to do with CPU performance--not even the
minimal "how fast can the CPU run these 5 instructions" 'benchmark' of
classic bogomips.
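
To make that last point concrete: with a constant-rate TSC, calibration
reduces to a unit conversion. A rough sketch in C, not the kernel's actual
code; the function names are invented, the TSC readings are passed in as
plain numbers instead of being read with RDTSC, and the reference interval
is assumed to come from some other timer:

```c
#include <assert.h>
#include <stdint.h>

/*
 * TSC frequency in kHz (ticks per millisecond), from two TSC readings
 * taken elapsed_ms milliseconds apart according to a reference clock.
 */
uint64_t tsc_khz_from_samples(uint64_t tsc_start, uint64_t tsc_end,
                              uint64_t elapsed_ms)
{
    assert(elapsed_ms > 0 && tsc_end >= tsc_start);
    return (tsc_end - tsc_start) / elapsed_ms;
}

/* Number of TSC ticks that correspond to a delay of usecs microseconds. */
uint64_t tsc_ticks_for_usecs(uint64_t tsc_khz, uint64_t usecs)
{
    return tsc_khz * usecs / 1000;
}
```

A delay is then "wait (spinning, or with MWAIT) until the TSC has advanced
by that many ticks", which comes out the same no matter how fast the core
is executing instructions at the time.
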