Hi Vladimir,

On 14/06/24 10:56 pm, Vladimir Makarov wrote:
> 
> On 6/13/24 00:34, Surya Kumari Jangala wrote:
>> Hi Vladimir,
>> With my patch for PR111673 (scale the spill/restore cost of callee-save
>> register with the frequency of the entry bb in the routine assign_hard_reg() 
>> :
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html), the
>> following Linaro aarch64 test failed due to an extra 'mov' instruction:
>>
>> __SVBool_t __attribute__((noipa))
>> callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3,
>>               __SVBool_t mem0, __SVBool_t mem1)
>> {
>>    p0 = svbrkpa_z (p0, p1, p2);
>>    p0 = svbrkpb_z (p0, p3, mem0);
>>    return svbrka_z (p0, mem1);
>> }
>>
>> With trunk:
>>          addvl   sp, sp, #-1
>>          str     p14, [sp]
>>          str     p15, [sp, #1, mul vl]
>>          ldr     p14, [x0]
>>          ldr     p15, [x1]
>>          brkpa   p0.b, p0/z, p1.b, p2.b
>>          brkpb   p0.b, p0/z, p3.b, p14.b
>>          brka    p0.b, p0/z, p15.b
>>          ldr     p14, [sp]
>>          ldr     p15, [sp, #1, mul vl]
>>          addvl   sp, sp, #1
>>          ret
>>
>> With patch:
>>        addvl   sp, sp, #-1
>>          str     p14, [sp]
>>          str     p15, [sp, #1, mul vl]
>>          mov     p14.b, p3.b              // extra mov insn
>>          ldr     p15, [x0]
>>          ldr     p3, [x1]
>>          brkpa   p0.b, p0/z, p1.b, p2.b
>>          brkpb   p0.b, p0/z, p14.b, p15.b
>>          brka    p0.b, p0/z, p3.b
>>          ldr     p14, [sp]
>>          ldr     p15, [sp, #1, mul vl]
>>          addvl   sp, sp, #1
>>          ret
>>
>> p0-p15 are predicate registers on aarch64 where p0-p3 are caller-save while
>> p4-p15 are callee-save.
>>
>> The input RTL for ira pass:
>>
>> 1:   set r112, r68    #p0
>> 2:   set r113, r69    #p1
>> 3:   set r114, r70    #p2
>> 4:   set r115, r71    #p3
>> 5:   set r116, x0     #mem0, the 5th parameter
>> 6:   set r108, mem(r116)
>> 7:   set r117, x1     #mem1, the 6th parameter
>> 8:   set r110, mem(r117)
>> 9:   set r100, unspec_brkpa(r112, r113, r114)
>> 10:  set r101, unspec_brkpb(r100, r115, r108)
>> 11:  set r68,  unspec_brka(r101, r110)
>> 12:  ret r68
>>
>> Here, r68-r71 represent predicate hard regs p0-p3.
>> With my patch, r113 and r114 are being assigned memory by ira but with trunk 
>> they are
>> assigned registers. This in turn leads to a difference in decisions taken by 
>> LRA
>> ultimately leading to the extra mov instruction.
>>
>> Register assignment w/ patch:
>>
>>        Popping a5(r112,l0)  --         assign reg p0
>>        Popping a2(r100,l0)  --         assign reg p0
>>        Popping a0(r101,l0)  --         assign reg p0
>>        Popping a1(r110,l0)  --         assign reg p3
>>        Popping a3(r115,l0)  --         assign reg p2
>>        Popping a4(r108,l0)  --         assign reg p1
>>        Popping a6(r113,l0)  -- (memory is more profitable 8000 vs 9000) 
>> spill!
>>        Popping a7(r114,l0)  -- (memory is more profitable 8000 vs 9000) 
>> spill!
>>        Popping a8(r117,l0)  --         assign reg 1
>>        Popping a9(r116,l0)  --         assign reg 0
>>
>>
>> With patch, cost of memory is 8000 and it is lesser than the cost of 
>> callee-save
>> register (9000) and hence memory is assigned to r113 and r114. It is 
>> interesting
>> to see that all the callee-save registers are free but none is chosen.
>>
>> The two instructions in which r113 is referenced are:
>> 2:  set r113, r69 #p1
>> 9:  set r100, unspec_brkpa(r112, r113, r114)
>>
>> IRA computes the memory cost of an allocno in find_costs_and_classes(). In 
>> this routine
>> IRA scans each insn and computes memory cost and cost of register classes 
>> for each
>> operand in the insn.
>>
>> So for insn 2, memory cost of r113 is set to 4000 because this is the cost 
>> of storing
>> r69 to memory if r113 is assigned to memory. The possible register classes 
>> of r113
>> are ALL_REGS, PR_REGS, PR_HI_REGS and PR_LO_REGS. The cost of moving r69
>> to r113 if r113 is assigned a register from each of the possible register 
>> classes is
>> computed. If r113 is assigned a reg in ALL_REGS, then the cost of the
>> move is 18000, while if r113 is assigned a register from any of the 
>> predicate register
>> classes, then the cost of the move is 2000. This cost is obtained from the 
>> array
>> “ira_register_move_cost”. After scanning insn 9, memory cost of r113
>> is increased to 8000 because if r113 is assigned memory, we need a load to 
>> read the
>> value before using it in the unspec_brkpa. But the register class cost is 
>> unchanged.
>>
>> Later in setup_allocno_class_and_costs(), the ALLOCNO_CLASS of r113 is set 
>> to PR_REGS.
>> The ALLOCNO_MEMORY_COST of r113 is set to 8000.
>> The ALLOCNO_HARD_REG_COSTS of each register in PR_REGS is set to 2000.
>>
>> During coloring, when r113 has to be assigned a register, the cost of 
>> callee-save
>> registers in PR_REGS is increased by the spill/restore cost. So the cost
>> of callee-save registers increases from 2000 to 9000. All the caller-save 
>> registers
>> have been assigned to other allocnos, so for r113 memory is assigned
>> as memory is cheaper than callee-save registers.
>>
>> However, for r108, the cost is 0 for register classes PR_REGS, PR_HI_REGS 
>> and PR_LO_REGS.
>>
>> References of r108:
>> 6:   set r108, mem(r116)
>> 10:  set r101, unspec_brkpb(r100, r115, r108)
>>
>> It was surprising that while for r113, the cost of the predicate register 
>> classes
>> was 2000, for r108 the predicate register classes had a cost of 0.
>>
>> After processing insn 6, the costs for r108 are:
>> op 0(r=108) MEM:4000(+4000) ALL_REGS:8000(+8000) PR_REGS:0(+0) 
>> PR_HI_REGS:0(+0) PR_LO_REGS:0(+0)
>>
>> After processing insn 10:
>> op 3(r=108) MEM:8000(+4000) ALL_REGS:20000(+12000) PR_REGS:0(+0) 
>> PR_HI_REGS:0(+0) PR_LO_REGS:0(+0)
>>
>>
>> For insn 6, record_reg_classes() goes through each constraint for each 
>> operand.
>> For each alternative constraint, this routine computes costs of register 
>> classes
>> and memory cost for each pseudo register present in the insn. IRA computes 
>> register
>> class cost as the cost required to move value from/to a location of type 
>> specified
>> in the constraint to/from a register in the register class. IRA also 
>> calculates
>> the cost of using an alternative. After computing all the register class 
>> costs
>> for all the alternatives, IRA takes the minimum cost for each register class,
>> adds the alternative cost and assigns this to the pseudo r108.
>>
>> For insn 6, the constraints are:
>> operand 1: Upa,m,Upa,
>> operand 2: Upa,Upa,m,
>>
>> Here, Upa stands for predicate register and ‘m’ denotes memory.
>>
>> For insn 6, the best alternative is Upa for operand 1 and m for operand 2. 
>> For this
>> alternative, the alternative cost is 0.
>> The costs for the register classes PR_REGS, PR_HI_REGS, PR_LO_REGS is 0 for
>> operand 1 (r108).
>> IRA uses the array “ira_may_move_out_cost” to compute the costs of the 
>> register
>> classes for r108 and this array has value 0 because the first index is a 
>> superset
>> of the second index. Here, first index is PR_REGS since the constraint is 
>> ‘Upa’
>> while second index is the predicate register class.
>>
>> Why are different arrays being used? As we can see above, for r113 the cost 
>> of
>> the predicate register classes is 2000 because we are getting the value from
>> “ira_register_move_cost”. While for r108, the cost is obtained from
>> “ira_may_move_out_cost”. Why should r113 have a higher cost for predicate 
>> register
>> classes as compared to r108?
> 


> An interesting question.  The code originally was taken from old RA 
> (particular regclass.c).  You can find it in any version of gcc before 4.4 
> version.  Since then it changed a lot for different reasons (mostly for 
> solving different PRs).
> 
> I can not find where p113 can get cost from ira_register_move_cost.  The only 
> possibility I see is
> 
>                      ...
> 
>                      else if (ira_reg_class_intersect
>                                [pref_class][op_class] == NO_REGS)
>                         alt_cost
>                           += 
> ira_register_move_cost[mode][pref_class][op_class];
> 
> It is when preferred class from the 1st iteration does not intersect with the 
> operand class which looks ok for me.  May be I missed something.  Could you 
> point me the place where p113 gets cost from ira_register_move_class.  Even 
> better if you could me provide ira-dump (-fira-verbose=16 for the test case).
> 

p113 gets the cost from 'ira_register_move_cost' in the routine 
record_operand_costs(). In this routine, if we have a 'set' insn where both 
destination and source are registers, and one of them is a hard register and 
the other is a pseudo register, then the costs of the register classes of the 
pseudo register is obtained from the array 'ira_register_move_cost'. 
The following insn meets the criteria (r69 is a hard register)
2:  set r113, r69 #p1

Here is the code snippet:

        move_costs = ira_register_move_cost[cost_mode];
        ...
        for (k = cost_classes_ptr->num - 1; k >= 0; k--)
            {
              rclass = cost_classes[k];
              cost = (i == 0
                      ? move_costs[hard_reg_class][rclass]
                      : move_costs[rclass][hard_reg_class]);
              cost *= cost_factor;
              op_costs[i]->cost[k] = cost * frequency;
              ...
            }


Regards,
Surya

Reply via email to