https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104698

            Bug ID: 104698
           Summary: Inefficient code for DI to TI sign extend on power10
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

On power10, signed conversion from DImode to TImode is inefficient for GCC 11
and the current GCC 12.  GCC 10 does not do this optimization.
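
The report does not include a test case.  As a rough sketch, source along the
following lines should exercise the conversion, assuming a 64-bit target where
'long long' is DImode and '__int128' is TImode.  The function names are
hypothetical and are not taken from the report or from the GCC testsuite.

```c
/* Hypothetical reproducer, not from the original report.
   Compile with something like: gcc -O2 -mcpu=power10 -S
   and inspect the generated assembly.  */

/* Source value arrives in a register.  */
__int128 extend_reg(long long x)
{
    return x;   /* implicit signed DImode -> TImode conversion */
}

/* Source value is loaded from memory.  */
__int128 extend_mem(const long long *p)
{
    return *p;  /* load followed by sign extension */
}
```

Which register class the result ends up in depends on how the value is used
afterwards, so these functions may not trigger every case described here.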

On power10, GCC tries to generate the 'vextsd2q' instruction.  However, to
generate this instruction, it typically generates a 'mtvsrsdd' instruction
to get the value into the bottom 64 bits of an Altivec register, then it
does the 'vextsd2q' instruction, and finally it generates 'mfvsrd' and 'mfvsrld'
instructions to get the value back into the GPR registers.

For power9, it generates a move instruction and then an arithmetic shift right
by 63 bits to fill the upper doubleword with copies of the sign bit.
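
The move-plus-shift sequence can be modelled in portable C as below.  This is
only an illustration of the arithmetic, not GCC's implementation; the struct
and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a TImode value held in two 64-bit GPRs.  */
typedef struct {
    uint64_t lo;   /* low doubleword  */
    uint64_t hi;   /* high doubleword */
} ti_halves;

static ti_halves sign_extend_di_to_ti(int64_t x)
{
    ti_halves r;
    r.lo = (uint64_t)x;          /* the register move */
    /* Arithmetic shift right by 63 (what 'sradi' computes): yields 0 for
       non-negative x and all ones for negative x.  Right-shifting a
       negative signed value is implementation-defined in ISO C; GCC
       documents it as an arithmetic shift.  */
    r.hi = (uint64_t)(x >> 63);
    return r;
}
```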

GCC should generate the following code sequences:

1) For GPR register to GPR register: Move register, and 'sradi' to create the
sign bits in the upper word.

2) For GPR register to VSX register to Altivec register: Splat the value to
fill the bottom 64 bits, and then do 'vextsd2q'.

3) For memory to GPR register: Load the value into the low register, and fill
the high register with the sign bit.

4) For memory to Altivec register: Load the value with load VSX vector
rightmost doubleword, and then do 'vextsd2q'.
