https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114
--- Comment #8 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Steve Ellcey from comment #6)
> (In reply to Wilco from comment #5)
> > (In reply to Steve Ellcey from comment #4)
> > > While teaching the reassociation pass about fma's seems like the right
> > > answer would it be reasonable (and simpler) to do the fma pass
> > > (pass_optimize_widening_mul) before the reassociation pass
> > > (pass_reassoc) to get the most fma's?
> > >
> > > That fixes my small test case but I haven't done a bigger performance
> > > check to see what the overall impact would be.
> >
> > I don't know what else that would affect since the reassociation phase runs
> > very early - and it's late at this stage. My patch seems much safer. Even
> > easier might be to return 1 for FLOAT_MODE PLUS_EXPR in
> > aarch64_reassociation_width. Then we can fix the reassociation phase in
> > GCC9.
>
> Moving the fma phase did not have a good performance impact (it was worse).

So it looks like it's best to teach the reassociation phase about FMA for GCC9.

> Your patch of setting the reassociation width to 1 did help performance on
> ThunderX2.

Can you let me know if my workaround helped? If useful I could backport it to
GCC7 as well.