On Thu, Aug 20, 2015 at 6:58 AM, Pekka Paalanen ppaala...@gmail.com wrote:
A thing that explains a great deal of these anomalies, but not all of it,
has something to do with function addresses. There are hypotheses that it
might have to do with the branch predictor and its cache. We made a test
targeting exactly that idea: pick a fast path function that seems to be
most susceptible to unexpected changes, pad it with x nops before the
function start and N-x nops after the function end. We never execute those
nops, but changing x changes the function start address while keeping
everything else in the whole binary in the same place.
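
To make the padding arithmetic concrete, here is a minimal sketch in C. It is
not the actual test harness; the helper name pad_layout, the struct, and all
parameters (region start, function size, nop size) are hypothetical. It only
shows that moving x nops from the back to the front shifts the function start
while the end of the padded region, and so everything after it, stays put.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one padded function; not from the real test. */
struct layout {
    uintptr_t func_start;  /* address of the first real instruction */
    uintptr_t region_end;  /* first address after the trailing nops */
};

/*
 * Place x nops before the function and (n_total - x) nops after it.
 * nop_size is a parameter because it depends on the instruction set.
 * The caller must ensure x <= n_total.
 */
struct layout pad_layout(uintptr_t region_start, size_t func_size,
                         size_t nop_size, unsigned n_total, unsigned x)
{
    struct layout l;

    l.func_start = region_start + (uintptr_t)x * nop_size;
    l.region_end = l.func_start + func_size
                 + (uintptr_t)(n_total - x) * nop_size;
    return l;
}
```

Calling pad_layout() for x = 0..n_total with the other arguments fixed gives
a different func_start each time but the same region_end, which is the
property the test relies on.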
The results were mind-boggling: depending on the function starting
address, the src__ L1 test of lowlevel-blt-bench went either 355 Mpx/s or
470 Mpx/s. There does not seem to be any predictable pattern on which
addresses are fast and which are slow. Obviously this will screw up our
benchmarks, because a change in an unrelated function may cause another
function's address to shift, and therefore change its performance. See [1]
for the plot.
[1] The plot of alignment vs. performance
https://git.collabora.com/cgit/user/pq/pixman-benchmarking.git/plain/octave/figures/fig-src---L1.pdf
Could this depend on whether some bad instruction ends up next to, or
split by, a cache line boundary? That would produce a random-looking plot,
though it would really be a plot of where the bad instructions sit within
the measured function.
If this really is the problem, then the ideal fix is for the compiler to
insert NOP instructions in order to move the bad instructions away from the
locations that make them bad. Yikes.
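
As a rough sketch of what "split by a cache line boundary" means in terms of
addresses, something like the following C could be used. The function names
are made up for illustration, the line size is passed in rather than assumed
(64 bytes is common, but that is an assumption about the target), and this
says nothing about whether straddling is actually what makes an instruction
slow here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the len bytes starting at addr span two or more cache lines. */
bool straddles_line(uintptr_t addr, size_t len, size_t line_size)
{
    return len > 0 &&
           (addr / line_size) != ((addr + len - 1) / line_size);
}

/*
 * Number of padding bytes to insert before addr so that the instruction
 * starts at the next cache line boundary. Returns 0 if it does not
 * straddle. An instruction longer than line_size still straddles after
 * padding, so that case cannot be fixed this way.
 */
size_t padding_to_avoid_straddle(uintptr_t addr, size_t len,
                                 size_t line_size)
{
    if (!straddles_line(addr, len, line_size))
        return 0;
    return line_size - (size_t)(addr % line_size);
}
```

For example, with a 64-byte line, a 4-byte instruction at 0x103e covers
0x103e..0x1041 and crosses the boundary at 0x1040; two bytes of padding
would move it to start exactly on that boundary.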