On 4/9/12 9:25 AM, Manfred Nowak wrote:
> Andrei Alexandrescu wrote:
>> all noise is additive (there's no noise that may make a benchmark
>> appear to run faster)

> This is in doubt, because you yourself wrote "the machine itself has
> complex interactions". These complex interactions might lower the time
> needed for an operation of the benchmarked program.

> Examples that come to mind:
> a) the needed data is already in a (faster) cache because it belongs to a
> memory block from which some program outside the benchmarked set needs
> data, and that block hasn't been replaced yet.

Which is great, unless the program wants to measure the cache memory itself, in which case it would flush the cache with special assembler instructions or large memset()s. (We do such things at Facebook.)

> b) the needed data is stored on an HDD whose I/O scheduler uses the elevator
> algorithm and, by pure chance, serves the request instantly because the
> position of the needed data lies between two positions accessed by
> programs outside the benchmarked set.

> An HDD in particular, if used, will be responsible for a lot of the noise you
> define as "quantization noise (uniform distribution)", even if the head
> stays on the same cylinder. Not recognizing this noise would only mean
> that the data is cached, and interpreting the only true read from the
> HDD as a jerky outlier seems quite wrong.

If the goal is to measure the seek time of the HDD, the benchmark itself should make sure the HDD cache is cleared. (As I recall, on Linux this is done by unmounting and remounting the drive.) Otherwise, it adds a useless component to the timing.
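A lighter-weight alternative to unmounting and remounting, sketched in Python: `os.posix_fadvise` with `POSIX_FADV_DONTNEED` asks the kernel to drop one file's pages from the OS page cache, without root. This is a different technique from the one mentioned above; it is advisory (the kernel may ignore it), it touches only the OS page cache and not the drive's own on-board cache, and it is unavailable on some platforms (e.g. macOS, Windows). The helper names are hypothetical.

```python
import os
import time


def evict_file_from_page_cache(path: str) -> bool:
    """Ask the kernel to drop cached pages of one file (best effort).

    Returns False where posix_fadvise is unavailable or the call fails.
    Affects only the OS page cache, not the drive's on-board cache.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # dirty pages cannot be dropped, so flush them first
        # offset=0, length=0 means "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    except OSError:
        return False  # advisory only: treat failure as "cache not dropped"
    finally:
        os.close(fd)
    return True


def time_cold_read(path: str) -> float:
    """Time one full read of path after trying to evict it from the cache."""
    evict_file_from_page_cache(path)
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(1 << 20):  # read in 1 MiB chunks until EOF
            pass
    return time.perf_counter() - start
```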

>>> 1) The "noise during normal use" has to be measured in order to
>>> detect the sensitivity of the benchmarked program to that noise.
>> How do you measure it, and what conclusions do you draw other than
>> that there's more or less other stuff going on on the machine, and
>> the machine itself has complex interactions?

>> Far as I can tell a time measurement result is:
>>
>> T = A + Q + N

> For example, by running more than one instance of the benchmarked
> program in parallel and use the thereby gathered statistical routines
> to spread T into the additive components A, Q and N.

I disagree with running two benchmarks in parallel because that exposes them to even more noise (scheduling, CPU count, current machine load, etc.). I don't understand the part of the sentence starting with "...use the thereby..."; I'd be grateful if you elaborated.
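A minimal Python sketch of what the additive model T = A + Q + N implies in practice, assuming, as the "all noise is additive" claim does, that Q and N are never negative: every timed run then overestimates A, so the minimum over repeated runs is the tightest of those estimates. `estimate_actual_time` is a hypothetical helper, and the run count of 30 is an arbitrary assumption.

```python
import statistics
import time
from typing import Callable


def estimate_actual_time(fn: Callable[[], object], runs: int = 30) -> dict[str, float]:
    """Time fn() `runs` times and summarize the samples.

    Under T = A + Q + N with Q >= 0 and N >= 0, each sample is an upper
    bound on A, so the minimum is the tightest of these bounds.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
    }
```

Reporting the median and mean alongside the minimum is an illustrative choice: a large gap between them is a quick sign that the noise is far from small.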


Andrei
