https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059
Michael Meissner <meissner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |meissner at gcc dot gnu.org

--- Comment #19 from Michael Meissner <meissner at gcc dot gnu.org> ---
The main power8 fusion that GCC does is combining:

        addis rtmp,r0,symbol@hi(r2)
        ld/lbz/lwz rx,symbol@lo(rtmp)

into:

        addis rx,symbol@hi(r2)
        ld/lbz/lwz rx,symbol@lo(rx)

This fusion is listed as one of the fusion types in the power10 documents; the fusion type is wideimmediate. Note that when you are compiling for -mcpu=power10, this fusion case is rarely used because we use PC-relative loads, but the machine does support it.

In addition, it combines a load into a traditional floating point register followed by a move to a traditional Altivec register. Similarly, it combines a move from a traditional Altivec register to a traditional floating point register followed by a store:

        lfd fy,32(rx)              xxlor fy,vsrx,vsrx
        xxlor vsrz,fy,fy           stfd fy,32(rz)

into:

        li rtmp,32                 li rtmp,32
        lxsdx vsrz,rx,rtmp         stxsdx vsrx,rz,rtmp

On power9 and power10, this sequence is not generated because we have the lxsd and stxsd instructions (and plxsd/pstxsd on power10). So I suspect we may want to move the p8 load fusion support to fusion.md and do it for power10 as well. Aaron Sawdey may have other thoughts, since he has been working on the power10 fusion support and knows more about what is actually implemented in current hardware.

Then, for inlining, we may want to exclude p8_fusion and p10_fusion from the comparison in rs6000_can_inline_p, since these are tuning optimizations that don't change which instructions may be generated.

Note that there was so-called power9 fusion code that was originally in the power9 spec but was not implemented in the hardware. I removed support for it in November 2018.
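As a rough illustration of the rs6000_can_inline_p suggestion, here is a standalone C sketch; it is not GCC's code. The flag names and bit values (MASK_VSX, MASK_P8_FUSION, MASK_TUNING_ONLY, etc.) and both helper functions are hypothetical; the real hook works on GCC's generated OPTION_MASK_* values and does additional checks. It only models the common rule that a callee may be inlined when its ISA flags are a subset of the caller's, and shows how masking out the fusion bits changes the answer.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Hypothetical flag bits for illustration only; GCC's real
           OPTION_MASK_* values are generated from rs6000.opt and differ. */
        #define MASK_VSX         (UINT64_C(1) << 0)
        #define MASK_P8_VECTOR   (UINT64_C(1) << 1)
        #define MASK_P9_VECTOR   (UINT64_C(1) << 2)
        #define MASK_P8_FUSION   (UINT64_C(1) << 3)
        #define MASK_P10_FUSION  (UINT64_C(1) << 4)

        /* Bits that only affect tuning (fusion), not which insns are legal. */
        #define MASK_TUNING_ONLY (MASK_P8_FUSION | MASK_P10_FUSION)

        /* Naive check: every flag the callee has must also be set in the caller. */
        static bool naive_can_inline_p(uint64_t caller, uint64_t callee)
        {
            return (caller & callee) == callee;
        }

        /* Same subset check, but ignoring the tuning-only fusion bits. */
        static bool can_inline_p(uint64_t caller, uint64_t callee)
        {
            uint64_t caller_isa = caller & ~MASK_TUNING_ONLY;
            uint64_t callee_isa = callee & ~MASK_TUNING_ONLY;
            return (caller_isa & callee_isa) == callee_isa;
        }

        int main(void)
        {
            /* Hypothetical flag sets for a power8 and a power10 function. */
            uint64_t p8  = MASK_VSX | MASK_P8_VECTOR | MASK_P8_FUSION;
            uint64_t p10 = MASK_VSX | MASK_P8_VECTOR | MASK_P9_VECTOR
                           | MASK_P10_FUSION;

            printf("p8 callee -> p10 caller (naive):  %s\n",
                   naive_can_inline_p(p10, p8) ? "yes" : "no");
            printf("p8 callee -> p10 caller (masked): %s\n",
                   can_inline_p(p10, p8) ? "yes" : "no");
            printf("p10 callee -> p8 caller (masked): %s\n",
                   can_inline_p(p8, p10) ? "yes" : "no");
            return 0;
        }

In this model the naive comparison rejects inlining a power8 callee into a power10 caller only because the callee has p8_fusion set and the caller does not; with the fusion bits masked out it is accepted. A power10 callee is still rejected in a power8 caller, since it relies on an ISA feature (MASK_P9_VECTOR here) that the caller lacks.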