Re: std.benchmark is in reviewable state

Don Thu, 29 Sep 2011 05:51:24 -0700

On 26.09.2011 17:43, Robert Jacques wrote:

Second, timing generally relies on the CPU's Time Stamp Counter, which is
not multi-thread safe; a core switch invalidates all previous TSC
values, and hence, the time measurement itself. Furthermore, the TSC is
not even guaranteed to have a fixed frequency on some CPUs. Now there
are ways around the problems of the TSC, but even so:


(From the Wikipedia)
"Under Windows platforms, Microsoft strongly discourages using the TSC
for high-resolution timing for exactly these reasons, providing instead
the Windows APIs QueryPerformanceCounter and
QueryPerformanceFrequency.[2] Even when using these functions, Microsoft
recommends the code to be locked to a single CPU."


std.benchmark uses QueryPerformanceCounter on Windows and
clock_gettime/gettimeofday on Unix.


Great, but MS still recommends benchmarking be done on a single core.
And if MS thinks that is how benchmarking should be done, I think that's
how we should do it.

I think that's quite misleading. Microsoft has made some assumptions in
there about what you are measuring. QueryPerformanceCounter gives you
the wall clock time. But, with modern CPUs (esp. Core i7 Sandy Bridge,
but also as far back as Pentium M) the CPU frequency isn't constant.
So, the wall clock time depends both on the number of clock cycles
required to execute the code, AND on the temperature of the CPU!
If you run the same benchmark code enough times, it'll eventually get
slower as the CPU heats up!

Personally I always use the TSC, and then discard any results where the
TSC is inconsistent. Specifically I use the hardware performance
counters since they give you real information: code #1 executes N more
instructions than code #2, it has B fewer branches, it has D more level
1 cache misses, etc. I've managed to get rock-solid data that way, but
it takes a lot of work (e.g., you have to make sure that your stack is
aligned to 16 bytes).
BUT this sort of thing is only relevant to really small sections of code
(< 1 time slice).
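
To make the "discard inconsistent results" idea concrete, here is a
minimal sketch. It is in Python rather than D, and it does not read the
TSC or the hardware performance counters: time.perf_counter_ns() stands
in as the clock, and "inconsistent" is assumed to mean "more than a
chosen fraction slower than the fastest run". The names
filter_consistent and measure and the 10% tolerance are hypothetical,
picked only for illustration.

```python
import time


def filter_consistent(samples, tolerance=0.10):
    """Keep only samples within `tolerance` (a fraction) of the fastest one.

    Assumption: slow outliers were disturbed by a context switch, core
    migration or similar, so they are dropped rather than averaged in.
    """
    if not samples:
        return []
    best = min(samples)
    limit = best * (1.0 + tolerance)
    return [s for s in samples if s <= limit]


def measure(func, runs=50):
    """Time `func` `runs` times; returns elapsed nanoseconds per run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()  # stand-in clock, not the TSC
        func()
        samples.append(time.perf_counter_ns() - start)
    return samples


if __name__ == "__main__":
    raw = measure(lambda: sum(range(1000)))
    kept = filter_consistent(raw)
    print(f"kept {len(kept)} of {len(raw)} samples, fastest {min(raw)} ns")
```

Filtering against the minimum is only one possible rule; it suits short,
CPU-bound sections where any disturbance can only make a run slower.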

If you're timing something which involves hardware other than the CPU
(e.g., database access, network access) then you really want the wall
clock time. And if the section of code is long enough, then maybe you do
care how much it heats up the CPU!
But OTOH once you're in that regime, I don't think you care about
processor affinity. In real life, you WILL get core transitions.
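
A small sketch of why the choice of clock matters once something other
than the CPU is involved. It is in Python, not D, and uses only the
standard time module: perf_counter() for wall clock time and
process_time() for CPU time charged to the process. The helper names
timed and fake_io are hypothetical, and time.sleep() is just a stand-in
for a database or network wait.

```python
import time


def timed(func):
    """Run func() once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall clock, includes waiting
    cpu_start = time.process_time()    # CPU time of this process only
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def fake_io():
    """Stand-in for waiting on a database or the network."""
    time.sleep(0.2)


if __name__ == "__main__":
    wall, cpu = timed(fake_io)
    print(f"wall clock: {wall:.3f} s, CPU time: {cpu:.3f} s")
```

For the sleeping call the CPU time stays close to zero while the wall
clock shows the whole wait, which is the number you care about in that
regime.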

So I don't think it's anywhere near as simple as what Microsoft makes
out. It's very important to be clear about what kind of stuff you intend
to measure.
