On 14/10/2021 上午 8:12, Segher Boessenkool wrote:
On Wed, Sep 29, 2021 at 04:32:19PM +0800, HAO CHEN GUI wrote:
   The patch punishes reload of alternative pair of "d, Z" for
movsi_internal1. The reload occurs if 'Z' doesn't match and generates an
additional insn. So the memory reload should be punished.
As David says, why only for loads?  But also, why not for lxsiwzx (and
stxsiwx) as well?

But, what for all other uses of lfiwzx?  And lfiwax?

We need to find out why the register allocator considers it a good idea
to use FP regs here, and fix *that*?

Let me explain why it uses FP regs.

In ira pass,  the cost of general, float and altivec registers are all zero. So 
it prefers 'GEN_OR_VSX_REGS' class and is finally assigned register vs32. Not 
sure if the altivec registers are preferable for 'GEN_OR_VSX_REGS'.

    r120: preferred GEN_OR_VSX_REGS, alternative NO_REGS, allocno 
GEN_OR_VSX_REGS

  a0(r120,l0) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0 
ALTIVEC_REGS:0,0 VSX_REGS:4000,4000 GEN_OR_FLOAT_REGS:6000,6000 
GEN_OR_VSX_REGS:6000,6000 LINK_REGS:12000,12000 CTR_REGS:12000,12000 
LINK_OR_CTR_REGS:12000,12000 SPEC_OR_GEN_REGS:12000,12000 ALL_REGS:36000,36000 
MEM:8000,8000

      Allocno a0r120 of GEN_OR_VSX_REGS(93) has 93 avail. regs  0 3-12 14-95, 
node:  0 3-12 14-95 (confl regs =  1-2 13 96-110)
      Forming thread from colorable bucket:
      Pushing a0(r120,l0)(cost 0)
      Popping a0(r120,l0)  --         assign reg 32
Disposition:
    0:r120 l0    32

In reload pass, the third alternative pair is "r,m" and the forth pair is "d,Z".  As the 'r' doesn't match the class of reg vs32, it needs a output 
reload and got a "reject++" as well as a "losers". For pair "d,Z", the second operands 'Z' doesn't match. It needs a input reload and got a 
"losers". But the memory reload is not punished if there is no addition modifies with the alternative. So it only got a "addr_losers". Overall cost 
of "r,m" is great than "d,Z". So it picked up FP register and 'lfiwzx' instruction.

            0 Non input pseudo reload: reject++
          alt=2,overall=7,losers=1,rld_nregs=1
          alt=3,overall=6,losers=1,rld_nregs=0

Gui Haochen


The extra insn you talk about is because this insn only allows indexed
addressing ([reg+reg] or [reg] addressing).  That is true for very many
insns.  Reload (well, LRA in the modern world) should know about such
extra costs.  Does it not?

gcc/
         * gcc/config/rs6000/rs6000.md (movsi_internal1): disparages
         slightly the alternative 'Z' of "lfiwzx" when reload is
needed.
"Disparage", no "s".  Changelog entries are written in the imperative.


Segher

Reply via email to