On 6/13/24 00:34, Surya Kumari Jangala wrote:
Hi Vladimir,
With my patch for PR111673 (scale the spill/restore cost of callee-save
register with the frequency of the entry bb in the routine assign_hard_reg() :
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html), the
following Linaro aarch64 test failed due to an extra 'mov' instruction:

__SVBool_t __attribute__((noipa))
callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3,
              __SVBool_t mem0, __SVBool_t mem1)
{
   p0 = svbrkpa_z (p0, p1, p2);
   p0 = svbrkpb_z (p0, p3, mem0);
   return svbrka_z (p0, mem1);
}

With trunk:
         addvl   sp, sp, #-1
         str     p14, [sp]
         str     p15, [sp, #1, mul vl]
         ldr     p14, [x0]
         ldr     p15, [x1]
         brkpa   p0.b, p0/z, p1.b, p2.b
         brkpb   p0.b, p0/z, p3.b, p14.b
         brka    p0.b, p0/z, p15.b
         ldr     p14, [sp]
         ldr     p15, [sp, #1, mul vl]
         addvl   sp, sp, #1
         ret

With patch:
           addvl   sp, sp, #-1
         str     p14, [sp]
         str     p15, [sp, #1, mul vl]
         mov     p14.b, p3.b              // extra mov insn
         ldr     p15, [x0]
         ldr     p3, [x1]
         brkpa   p0.b, p0/z, p1.b, p2.b
         brkpb   p0.b, p0/z, p14.b, p15.b
         brka    p0.b, p0/z, p3.b
         ldr     p14, [sp]
         ldr     p15, [sp, #1, mul vl]
         addvl   sp, sp, #1
         ret

p0-p15 are predicate registers on aarch64 where p0-p3 are caller-save while
p4-p15 are callee-save.

The input RTL for ira pass:

1:   set r112, r68    #p0
2:   set r113, r69    #p1
3:   set r114, r70    #p2
4:   set r115, r71    #p3
5:   set r116, x0     #mem0, the 5th parameter
6:   set r108, mem(r116)
7:   set r117, x1     #mem1, the 6th parameter
8:   set r110, mem(r117)
9:   set r100, unspec_brkpa(r112, r113, r114)
10:  set r101, unspec_brkpb(r100, r115, r108)
11:  set r68,  unspec_brka(r101, r110)
12:  ret r68

Here, r68-r71 represent predicate hard regs p0-p3.
With my patch, r113 and r114 are being assigned memory by ira but with trunk 
they are
assigned registers. This in turn leads to a difference in decisions taken by LRA
ultimately leading to the extra mov instruction.

Register assignment w/ patch:

       Popping a5(r112,l0)  --         assign reg p0
       Popping a2(r100,l0)  --         assign reg p0
       Popping a0(r101,l0)  --         assign reg p0
       Popping a1(r110,l0)  --         assign reg p3
       Popping a3(r115,l0)  --         assign reg p2
       Popping a4(r108,l0)  --         assign reg p1
       Popping a6(r113,l0)  -- (memory is more profitable 8000 vs 9000) spill!
       Popping a7(r114,l0)  -- (memory is more profitable 8000 vs 9000) spill!
       Popping a8(r117,l0)  --         assign reg 1
       Popping a9(r116,l0)  --         assign reg 0


With patch, cost of memory is 8000 and it is lesser than the cost of callee-save
register (9000) and hence memory is assigned to r113 and r114. It is interesting
to see that all the callee-save registers are free but none is chosen.

The two instructions in which r113 is referenced are:
2:  set r113, r69 #p1
9:  set r100, unspec_brkpa(r112, r113, r114)

IRA computes the memory cost of an allocno in find_costs_and_classes(). In this 
routine
IRA scans each insn and computes memory cost and cost of register classes for 
each
operand in the insn.

So for insn 2, memory cost of r113 is set to 4000 because this is the cost of 
storing
r69 to memory if r113 is assigned to memory. The possible register classes of 
r113
are ALL_REGS, PR_REGS, PR_HI_REGS and PR_LO_REGS. The cost of moving r69
to r113 if r113 is assigned a register from each of the possible register 
classes is
computed. If r113 is assigned a reg in ALL_REGS, then the cost of the
move is 18000, while if r113 is assigned a register from any of the predicate 
register
classes, then the cost of the move is 2000. This cost is obtained from the array
“ira_register_move_cost”. After scanning insn 9, memory cost of r113
is increased to 8000 because if r113 is assigned memory, we need a load to read 
the
value before using it in the unspec_brkpa. But the register class cost is 
unchanged.

Later in setup_allocno_class_and_costs(), the ALLOCNO_CLASS of r113 is set to 
PR_REGS.
The ALLOCNO_MEMORY_COST of r113 is set to 8000.
The ALLOCNO_HARD_REG_COSTS of each register in PR_REGS is set to 2000.

During coloring, when r113 has to be assigned a register, the cost of 
callee-save
registers in PR_REGS is increased by the spill/restore cost. So the cost
of callee-save registers increases from 2000 to 9000. All the caller-save 
registers
have been assigned to other allocnos, so for r113 memory is assigned
as memory is cheaper than callee-save registers.

However, for r108, the cost is 0 for register classes PR_REGS, PR_HI_REGS and 
PR_LO_REGS.

References of r108:
6:   set r108, mem(r116)
10:  set r101, unspec_brkpb(r100, r115, r108)

It was surprising that while for r113, the cost of the predicate register 
classes
was 2000, for r108 the predicate register classes had a cost of 0.

After processing insn 6, the costs for r108 are:
op 0(r=108) MEM:4000(+4000) ALL_REGS:8000(+8000) PR_REGS:0(+0) PR_HI_REGS:0(+0) 
PR_LO_REGS:0(+0)

After processing insn 10:
op 3(r=108) MEM:8000(+4000) ALL_REGS:20000(+12000) PR_REGS:0(+0) 
PR_HI_REGS:0(+0) PR_LO_REGS:0(+0)


For insn 6, record_reg_classes() goes through each constraint for each operand.
For each alternative constraint, this routine computes costs of register classes
and memory cost for each pseudo register present in the insn. IRA computes 
register
class cost as the cost required to move value from/to a location of type 
specified
in the constraint to/from a register in the register class. IRA also calculates
the cost of using an alternative. After computing all the register class costs
for all the alternatives, IRA takes the minimum cost for each register class,
adds the alternative cost and assigns this to the pseudo r108.

For insn 6, the constraints are:
operand 1: Upa,m,Upa,
operand 2: Upa,Upa,m,

Here, Upa stands for predicate register and ‘m’ denotes memory.

For insn 6, the best alternative is Upa for operand 1 and m for operand 2. For 
this
alternative, the alternative cost is 0.
The costs for the register classes PR_REGS, PR_HI_REGS, PR_LO_REGS is 0 for
operand 1 (r108).
IRA uses the array “ira_may_move_out_cost” to compute the costs of the register
classes for r108 and this array has value 0 because the first index is a 
superset
of the second index. Here, first index is PR_REGS since the constraint is ‘Upa’
while second index is the predicate register class.

Why are different arrays being used? As we can see above, for r113 the cost of
the predicate register classes is 2000 because we are getting the value from
“ira_register_move_cost”. While for r108, the cost is obtained from
“ira_may_move_out_cost”. Why should r113 have a higher cost for predicate 
register
classes as compared to r108?

An interesting question.  The code originally was taken from old RA (particular regclass.c).  You can find it in any version of gcc before 4.4 version.  Since then it changed a lot for different reasons (mostly for solving different PRs).

I can not find where p113 can get cost from ira_register_move_cost.  The only possibility I see is

                     ...

                     else if (ira_reg_class_intersect
                               [pref_class][op_class] == NO_REGS)
                        alt_cost
                          += ira_register_move_cost[mode][pref_class][op_class];

It is when preferred class from the 1st iteration does not intersect with the operand class which looks ok for me.  May be I missed something.  Could you point me the place where p113 gets cost from ira_register_move_class.  Even better if you could me provide ira-dump (-fira-verbose=16 for the test case).

Thanks



Reply via email to