On 26.09.2011 17:43, Robert Jacques wrote:
>>> Second, timing generally relies on the CPU's Time Stamp Counter, which is
>>> not multi-thread safe; a core switch invalidates all previous TSC
>>> values, and hence, the time measurement itself. Furthermore, the TSC is
>>> not even guaranteed to have a fixed frequency on some CPUs. Now there
>>> are ways around the problems of the TSC, but even so:
>>>
>>> (From the Wikipedia)
>>> "Under Windows platforms, Microsoft strongly discourages using the TSC
>>> for high-resolution timing for exactly these reasons, providing instead
>>> the Windows APIs QueryPerformanceCounter and
>>> QueryPerformanceFrequency.[2] Even when using these functions, Microsoft
>>> recommends the code to be locked to a single CPU."
>>
>> std.benchmark uses QueryPerformanceCounter on Windows and
>> clock_gettime/gettimeofday on Unix.
>
> Great, but MS still recommends benchmarking be done on a single core.
> And if MS thinks that is how benchmarking should be done, I think that's
> how we should do it.

I think that's quite misleading. Microsoft has made some assumptions in there about what you are measuring. QueryPerformanceCounter gives you the wall clock time. But with modern CPUs (esp. Core i7 Sandy Bridge, but also as far back as Pentium M) the CPU frequency isn't constant. So the wall clock time depends both on the number of clock cycles required to execute the code AND on the temperature of the CPU! If you run the same benchmark code enough times, it'll eventually get slower as the CPU heats up!

Personally I always use the TSC, and then discard any results where the TSC is inconsistent. Specifically, I use the hardware performance counters, since they give you real information: code #1 executes N more instructions than code #2, it has B fewer branches, it has D more level 1 cache misses, etc. I've managed to get rock-solid data that way, but it takes a lot of work (e.g., you have to make sure that your stack is aligned to 16 bytes). BUT this sort of thing is only relevant to really small sections of code (< 1 time slice).
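
To illustrate the "measure many times, discard the inconsistent results" idea, here is a rough Python sketch. It is not the setup described above: the Python standard library cannot read the TSC or the hardware performance counters, so time.perf_counter_ns stands in as the clock. The function names and the 5% tolerance are assumptions made up for this example.

```python
import time

def time_repeatedly(fn, runs=20):
    """Call fn `runs` times and return the elapsed nanoseconds of each call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()  # stand-in clock, NOT the TSC
        fn()
        samples.append(time.perf_counter_ns() - start)
    return samples

def consistent_samples(samples, tolerance=0.05):
    """Keep only samples within `tolerance` (a fraction) of the fastest one.

    Slower samples are assumed to have been disturbed (preemption, core
    switch, frequency change) and are dropped. The 5% default is arbitrary.
    """
    best = min(samples)
    limit = best * (1.0 + tolerance)
    return [s for s in samples if s <= limit]

if __name__ == "__main__":
    raw = time_repeatedly(lambda: sum(i * i for i in range(1000)))
    kept = consistent_samples(raw)
    print(f"kept {len(kept)} of {len(raw)} samples, fastest = {min(kept)} ns")
```

How many samples survive the filter is itself useful: if most are thrown away, the measurement environment is too noisy to trust.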

If you're timing something which involves hardware other than the CPU (e.g., database access, network access) then you really want the wall clock time. And if the section of code is long enough, then maybe you do care how much it heats up the CPU! But OTOH once you're in that regime, I don't think you care about processor affinity. In real life, you WILL get core transitions.
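
A small Python sketch of why the choice of clock matters once something other than the CPU is involved. It times the same call with a wall clock (time.perf_counter) and with the process CPU clock (time.process_time). The helper names are invented for this example, and time.sleep is only a stand-in for waiting on a database or the network. Note that neither clock counts cycles, so this separates waiting from computing but does nothing about frequency changes.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) for a single call of fn."""
    wall_start = time.perf_counter()  # monotonic wall clock
    cpu_start = time.process_time()   # CPU time charged to this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def busy():
    """CPU-bound work: wall time and CPU time should be close."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def idle():
    """Stand-in for I/O: the process waits without using the CPU."""
    time.sleep(0.05)

if __name__ == "__main__":
    for name, fn in (("busy", busy), ("idle", idle)):
        wall, cpu = measure(fn)
        print(f"{name}: wall={wall:.4f}s cpu={cpu:.4f}s")
```

For the waiting case the CPU clock reports almost nothing, which is exactly why it is the wrong tool there and the wall clock is the right one.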

So I don't think it's anywhere near as simple as Microsoft makes out. It's very important to be clear about what kind of stuff you intend to measure.
