Hi,
On 13.6.2025 4.36, Finn Thain wrote:
> And therein lies the rub -- to identify those workloads which should be
> measured and to afford each one a suitable weight in your decision making.
It's not just the workload that affects the results; compiler version,
optimization options [1], workload & kernel config options, and sometimes
even unrelated code changes [2] can affect how a given instruction
sequence settles into the cache.
> That's why this was always political.
I'd rather keep things technical and fact-based.
Whatever testing is done, the one wider conclusion that *can* be drawn
from it is that if there's a noticeable performance difference, such
differences are also possible in other workloads.
(A very large difference could also indicate functional issues, e.g. a
code generation bug in a given compiler build. That's why it's important
to have good tooling for pinpointing what exactly is causing the difference.)
- Eero
[1] One example is -Os vs. -O2 having 2x perf impact on Geert's
experimental Atari drm fb code. That would completely hide any impact
from alignment.
[2] With more complex cache hierarchies than on m68k, adding or removing
code elsewhere can change the cache line alignment of other parts of the
resulting binary. Not a concern for m68k though.