https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89584
            Bug ID: 89584
           Summary: CPU2000 degradations with r268448 (172.mgrid -22%, 252.eon -8%)
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at linux dot ibm.com
                CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org, rguenth at gcc dot gnu.org, segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64-unknown-linux-gnu
            Target: powerpc64-unknown-linux-gnu
             Build: powerpc64-unknown-linux-gnu

Revision 268448 introduced the noted degradations. Compile flags are -m64 -O3 -mcpu=power7 -fpeel-loops -funroll-loops -ffast-math -mpopcntd -mrecip=all.

I dug further into the mgrid degradation to get more detail. The main difference appears to be that the last call to RESID() in the main function is now inlined. RESID() is actually cloned, and this call is to the clone, resid_.constprop.0. I'm not sure whether this is another instance of losing RESTRICT on the parameters, as seen in prior PRs (54497/55334 and 84737), or simply that inlining that specific call into an inner loop now creates too much register pressure, so we spill too much (I suspect the latter).

Following is a simple static instruction count comparison of the vectorized loop from resid_.constprop.0() and the same loop after inlining; note the obvious increase in load/store insns.

Old = constprop.s
New = constprop_inline.s

  INSTR          Old    New   Change
  -----------   ----   ----   ------
  addi             1     29       28
  bc               1      1        0
  cmpl             1      1        0
  ld               0     17       17
  lxvd2x          19     33       14
  ori              0      5        5
  stxvd2x          1     15       14
  xvadddp         17     17        0
  xvnmsubadp       1      1        0
  xvnmsubmdp       3      3        0
  xxlor            3      2       -1
  -----------   ----   ----   ------
  load            19     50       31
  store            1     15       14
  total           47    124       77
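Counts like the ones above can be tallied with a small script instead of by hand. The following is a hypothetical Python sketch, not the reporter's actual method: it assumes the vectorized loop body has already been cut out of constprop.s and constprop_inline.s into two separate files, and the script name insn_diff.py and the LOADS/STORES sets are illustrative assumptions that only cover the load/store forms seen in this loop.

```python
import sys
from collections import Counter

# Assumed mnemonic sets: only the load/store forms seen in this loop are
# listed; a real tool would need the full Power ISA load/store set.
LOADS = {"ld", "lxvd2x"}
STORES = {"std", "stxvd2x"}


def count_mnemonics(asm_text):
    """Count instruction mnemonics in a fragment of GCC PowerPC assembly."""
    counts = Counter()
    for raw in asm_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        # Skip blank lines, labels (".L5:") and directives (".p2align 4").
        if not line or line.endswith(":") or line.startswith("."):
            continue
        counts[line.split()[0]] += 1
    return counts


def compare(old, new):
    """Per-mnemonic rows of (insn, old, new, new - old), sorted by name."""
    return [(insn, old[insn], new[insn], new[insn] - old[insn])
            for insn in sorted(set(old) | set(new))]


def summary(counts):
    """Return (loads, stores, total) for one loop body."""
    loads = sum(n for insn, n in counts.items() if insn in LOADS)
    stores = sum(n for insn, n in counts.items() if insn in STORES)
    return loads, stores, sum(counts.values())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python insn_diff.py old_loop.s new_loop.s
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        old = count_mnemonics(f_old.read())
        new = count_mnemonics(f_new.read())
    print("%-12s %5s %5s %6s" % ("INSTR", "Old", "New", "Change"))
    for row in compare(old, new):
        print("%-12s %5d %5d %6d" % row)
    for name, o, n in zip(("load", "store", "total"), summary(old), summary(new)):
        print("%-12s %5d %5d %6d" % (name, o, n, n - o))
```

Feeding the per-instruction counts from the table into summary() reproduces the load/store/total rows (19/1/47 for the clone versus 50/15/124 after inlining).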