https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104698
Bug ID: 104698
Summary: Inefficient code for DI to TI sign extend on power10
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: meissner at gcc dot gnu.org
Target Milestone: ---

On power10, signed conversion from DImode to TImode is inefficient for GCC 11 and the current GCC 12. GCC 10 does not do this optimization.

On power10, GCC tries to generate the 'vextsd2q' instruction. However, to generate this instruction, it would typically generate a 'mtvsrsdd' instruction to get the TImode value into an Altivec register in the bottom 64 bits, then it does the 'vextsd2q' instruction, and finally it generates 'mfvsrd' and 'mfvsrld' instructions to get the value back into the GPR registers.

For power9, it generates a move instruction and then an arithmetic shift right by 63 bits to fill the upper doubleword with copies of the sign bit.

GCC should generate the following code sequences:

1) For GPR register to GPR register: Move the register, and use 'sradi' to create the sign bits in the upper doubleword.

2) For GPR register to VSX register to Altivec register: Splat the value to fill the bottom 64 bits, and then do 'vextsd2q'.

3) For memory to GPR register: Load the value into the low register, and fill the high register with the sign bit.

4) For memory to Altivec register: Load the value with load VSX vector rightmost doubleword, and then do 'vextsd2q'.
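
A minimal C sketch of the conversion discussed above may help when inspecting the generated code. It is not taken from the original report: the function names are hypothetical, and it assumes a GCC-compatible compiler with __int128 support and GCC's arithmetic right shift on signed types. sext_by_hand spells out the GPR-to-GPR sequence (copy the low doubleword, shift right by 63 for the high doubleword), and sext_from_mem covers the memory-source cases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical test function: the implicit conversion below is the
   DImode to TImode sign extension described in this report.  */
__int128 sext_di_to_ti (int64_t x)
{
  return x;
}

/* Hypothetical helper that writes out the GPR-to-GPR sequence by hand.
   The low doubleword is a copy of the input; the high doubleword is the
   sign bit replicated, which is what an arithmetic shift right by 63
   ('sradi') produces.  Assumes GCC's arithmetic shift for signed types.  */
__int128 sext_by_hand (int64_t x)
{
  uint64_t lo = (uint64_t) x;
  uint64_t hi = (uint64_t) (x >> 63);
  return (__int128) (((unsigned __int128) hi << 64) | lo);
}

/* Hypothetical test function for the memory-source cases.  */
__int128 sext_from_mem (const int64_t *p)
{
  return *p;
}

/* Host-side sanity check that the hand-written form matches the
   built-in conversion.  */
void self_check (void)
{
  int64_t v = INT64_MIN;

  assert (sext_by_hand (-1) == sext_di_to_ti (-1));
  assert (sext_by_hand (42) == sext_di_to_ti (42));
  assert (sext_by_hand (INT64_MIN) == sext_di_to_ti (INT64_MIN));
  assert (sext_from_mem (&v) == sext_by_hand (v));
}
```

Compiling this file with something like -O2 -mcpu=power10 -S and comparing against -mcpu=power9 should show which sequence the compiler picks for each function; the exact output depends on the GCC version.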