On Thu, Aug 20, 2015 at 6:58 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:
> A thing that explains a great deal of these anomalies, but not all of it,
> has something to do with function addresses. There are hypotheses that it
> might have to do with the branch predictor and its cache. We made a test
> targeting exactly that idea: pick a fast path function that seems to be most
> susceptible to unexpected changes, pad it with x nops before the function
> start and N-x nops after the function end. We never execute those nops, but
> changing x changes the function start address while keeping everything else
> in the whole binary in the same place.
>
> The results were mind-boggling: depending on the function starting address,
> the src_8888_8888 L1 test of lowlevel-blt-bench went either 355 Mpx/s or 470
> Mpx/s. There does not seem to be any predictable pattern on which addresses
> are "fast" and which are "slow". Obviously this will screw up our benchmarks,
> because a change in an unrelated function may cause another function's
> address to shift, and therefore change its performance. See [1] for the plot.
>
> [1] The plot of alignment vs. performance
> https://git.collabora.com/cgit/user/pq/pixman-benchmarking.git/plain/octave/figures/fig-src-8888-8888-L1.pdf

Could this depend on whether some "bad" instruction ends up next to or split
by a cache line boundary? That would produce a random-looking plot, though it
would really be a plot of the locations of the bad instructions in the
measured function.

If this really is a problem, then the ideal fix is for the compiler to insert
NOP instructions to move the bad instructions away from the locations that
make them bad. Yikes.
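
To make the cache line idea concrete, here is a rough sketch of how one could predict which padding values x would push a given instruction across a line boundary, so the prediction can be compared against the measured fast and slow addresses. Everything in it is an assumption for illustration: the 64-byte line size, the 1-byte nop, and the names straddles_line and bad_paddings are hypothetical, and the base address, offset, and length in the usage note are made-up numbers, not values taken from pixman.

```python
CACHE_LINE = 64  # assumed line size in bytes; check the actual target CPU


def straddles_line(addr, length, line=CACHE_LINE):
    """Return True if the bytes [addr, addr + length) touch two cache lines."""
    first_line = addr // line
    last_line = (addr + length - 1) // line
    return first_line != last_line


def bad_paddings(func_base, insn_offset, insn_len, max_pad,
                 nop_size=1, line=CACHE_LINE):
    """List the nop counts x in 0..max_pad for which the instruction at
    func_base + x * nop_size + insn_offset would straddle a line boundary.

    func_base   -- function start address with zero padding (hypothetical)
    insn_offset -- offset of the suspect instruction inside the function
    insn_len    -- encoded length of that instruction in bytes
    nop_size    -- size of one nop in bytes (assumed 1 here)
    """
    result = []
    for x in range(max_pad + 1):
        addr = func_base + x * nop_size + insn_offset
        if straddles_line(addr, insn_len, line):
            result.append(x)
    return result
```

For example, bad_paddings(0x1000, 50, 7, 16) reports the x values at which a 7-byte instruction located 50 bytes into a line-aligned function would be split. If the slow points in the plot lined up with such a list for some instruction in the fast path, that would support the hypothesis; if not, it would point back toward the branch predictor explanation.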
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman