https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468
--- Comment #27 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to PeteVine from comment #25)
> The original issue never mentioned -Ofast or -ffast-math and I see no
> difference at -Ofast, indeed:
>
> http://openbenchmarking.org/result/1702153-RI-CRAYFAST424
>
> @jgreenhalgh Can you confirm there's no regression @ -O3 as well? Thanks.

I'm getting confused here by the number of bugs trying to multiplex on the same
report. Aldy's reproducer instructions are with -ffast-math, and that's what
I've used in my analysis. It seems you are more interested in a run without
-ffast-math, which is presumably a separate issue.

There is a useful lesson here for bug reporting. Upload a preprocessed file,
with an explicit set of flags, and the explicit configuration of the compiler
you are using. Having a solid description of the input you are concerned about
is more useful to me than annotated output - I can generate that myself! Don't
assume that because you have easy access to a file on the internet I also have
easy access to it; the best way to get code in front of me is with a
preprocessed file.

I'll take a look at what causes the performance difference between
-mcpu=cortex-a53 and -mcpu=thunderx.

I'm now using the flags:

-O3 -mcpu=cortex-a53 -lpthread -lm -fomit-frame-pointer -fipa-pta -march=armv8-a+crc

and

-O3 -mcpu=thunderx -lpthread -lm -fomit-frame-pointer -fipa-pta -march=armv8-a+crc