Hi Vladimir, On 14/06/24 10:56 pm, Vladimir Makarov wrote: > > On 6/13/24 00:34, Surya Kumari Jangala wrote: >> Hi Vladimir, >> With my patch for PR111673 (scale the spill/restore cost of callee-save >> register with the frequency of the entry bb in the routine assign_hard_reg() >> : >> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html), the >> following Linaro aarch64 test failed due to an extra 'mov' instruction: >> >> __SVBool_t __attribute__((noipa)) >> callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3, >> __SVBool_t mem0, __SVBool_t mem1) >> { >> p0 = svbrkpa_z (p0, p1, p2); >> p0 = svbrkpb_z (p0, p3, mem0); >> return svbrka_z (p0, mem1); >> } >> >> With trunk: >> addvl sp, sp, #-1 >> str p14, [sp] >> str p15, [sp, #1, mul vl] >> ldr p14, [x0] >> ldr p15, [x1] >> brkpa p0.b, p0/z, p1.b, p2.b >> brkpb p0.b, p0/z, p3.b, p14.b >> brka p0.b, p0/z, p15.b >> ldr p14, [sp] >> ldr p15, [sp, #1, mul vl] >> addvl sp, sp, #1 >> ret >> >> With patch: >> addvl sp, sp, #-1 >> str p14, [sp] >> str p15, [sp, #1, mul vl] >> mov p14.b, p3.b // extra mov insn >> ldr p15, [x0] >> ldr p3, [x1] >> brkpa p0.b, p0/z, p1.b, p2.b >> brkpb p0.b, p0/z, p14.b, p15.b >> brka p0.b, p0/z, p3.b >> ldr p14, [sp] >> ldr p15, [sp, #1, mul vl] >> addvl sp, sp, #1 >> ret >> >> p0-p15 are predicate registers on aarch64 where p0-p3 are caller-save while >> p4-p15 are callee-save. >> >> The input RTL for ira pass: >> >> 1: set r112, r68 #p0 >> 2: set r113, r69 #p1 >> 3: set r114, r70 #p2 >> 4: set r115, r71 #p3 >> 5: set r116, x0 #mem0, the 5th parameter >> 6: set r108, mem(r116) >> 7: set r117, x1 #mem1, the 6th parameter >> 8: set r110, mem(r117) >> 9: set r100, unspec_brkpa(r112, r113, r114) >> 10: set r101, unspec_brkpb(r100, r115, r108) >> 11: set r68, unspec_brka(r101, r110) >> 12: ret r68 >> >> Here, r68-r71 represent predicate hard regs p0-p3. >> With my patch, r113 and r114 are being assigned memory by ira but with trunk >> they are >> assigned registers. This in turn leads to a difference in decisions taken by >> LRA >> ultimately leading to the extra mov instruction. >> >> Register assignment w/ patch: >> >> Popping a5(r112,l0) -- assign reg p0 >> Popping a2(r100,l0) -- assign reg p0 >> Popping a0(r101,l0) -- assign reg p0 >> Popping a1(r110,l0) -- assign reg p3 >> Popping a3(r115,l0) -- assign reg p2 >> Popping a4(r108,l0) -- assign reg p1 >> Popping a6(r113,l0) -- (memory is more profitable 8000 vs 9000) >> spill! >> Popping a7(r114,l0) -- (memory is more profitable 8000 vs 9000) >> spill! >> Popping a8(r117,l0) -- assign reg 1 >> Popping a9(r116,l0) -- assign reg 0 >> >> >> With patch, cost of memory is 8000 and it is lesser than the cost of >> callee-save >> register (9000) and hence memory is assigned to r113 and r114. It is >> interesting >> to see that all the callee-save registers are free but none is chosen. >> >> The two instructions in which r113 is referenced are: >> 2: set r113, r69 #p1 >> 9: set r100, unspec_brkpa(r112, r113, r114) >> >> IRA computes the memory cost of an allocno in find_costs_and_classes(). In >> this routine >> IRA scans each insn and computes memory cost and cost of register classes >> for each >> operand in the insn. >> >> So for insn 2, memory cost of r113 is set to 4000 because this is the cost >> of storing >> r69 to memory if r113 is assigned to memory. The possible register classes >> of r113 >> are ALL_REGS, PR_REGS, PR_HI_REGS and PR_LO_REGS. The cost of moving r69 >> to r113 if r113 is assigned a register from each of the possible register >> classes is >> computed. If r113 is assigned a reg in ALL_REGS, then the cost of the >> move is 18000, while if r113 is assigned a register from any of the >> predicate register >> classes, then the cost of the move is 2000. This cost is obtained from the >> array >> “ira_register_move_cost”. After scanning insn 9, memory cost of r113 >> is increased to 8000 because if r113 is assigned memory, we need a load to >> read the >> value before using it in the unspec_brkpa. But the register class cost is >> unchanged. >> >> Later in setup_allocno_class_and_costs(), the ALLOCNO_CLASS of r113 is set >> to PR_REGS. >> The ALLOCNO_MEMORY_COST of r113 is set to 8000. >> The ALLOCNO_HARD_REG_COSTS of each register in PR_REGS is set to 2000. >> >> During coloring, when r113 has to be assigned a register, the cost of >> callee-save >> registers in PR_REGS is increased by the spill/restore cost. So the cost >> of callee-save registers increases from 2000 to 9000. All the caller-save >> registers >> have been assigned to other allocnos, so for r113 memory is assigned >> as memory is cheaper than callee-save registers. >> >> However, for r108, the cost is 0 for register classes PR_REGS, PR_HI_REGS >> and PR_LO_REGS. >> >> References of r108: >> 6: set r108, mem(r116) >> 10: set r101, unspec_brkpb(r100, r115, r108) >> >> It was surprising that while for r113, the cost of the predicate register >> classes >> was 2000, for r108 the predicate register classes had a cost of 0. >> >> After processing insn 6, the costs for r108 are: >> op 0(r=108) MEM:4000(+4000) ALL_REGS:8000(+8000) PR_REGS:0(+0) >> PR_HI_REGS:0(+0) PR_LO_REGS:0(+0) >> >> After processing insn 10: >> op 3(r=108) MEM:8000(+4000) ALL_REGS:20000(+12000) PR_REGS:0(+0) >> PR_HI_REGS:0(+0) PR_LO_REGS:0(+0) >> >> >> For insn 6, record_reg_classes() goes through each constraint for each >> operand. >> For each alternative constraint, this routine computes costs of register >> classes >> and memory cost for each pseudo register present in the insn. IRA computes >> register >> class cost as the cost required to move value from/to a location of type >> specified >> in the constraint to/from a register in the register class. IRA also >> calculates >> the cost of using an alternative. After computing all the register class >> costs >> for all the alternatives, IRA takes the minimum cost for each register class, >> adds the alternative cost and assigns this to the pseudo r108. >> >> For insn 6, the constraints are: >> operand 1: Upa,m,Upa, >> operand 2: Upa,Upa,m, >> >> Here, Upa stands for predicate register and ‘m’ denotes memory. >> >> For insn 6, the best alternative is Upa for operand 1 and m for operand 2. >> For this >> alternative, the alternative cost is 0. >> The costs for the register classes PR_REGS, PR_HI_REGS, PR_LO_REGS is 0 for >> operand 1 (r108). >> IRA uses the array “ira_may_move_out_cost” to compute the costs of the >> register >> classes for r108 and this array has value 0 because the first index is a >> superset >> of the second index. Here, first index is PR_REGS since the constraint is >> ‘Upa’ >> while second index is the predicate register class. >> >> Why are different arrays being used? As we can see above, for r113 the cost >> of >> the predicate register classes is 2000 because we are getting the value from >> “ira_register_move_cost”. While for r108, the cost is obtained from >> “ira_may_move_out_cost”. Why should r113 have a higher cost for predicate >> register >> classes as compared to r108? >
> An interesting question. The code originally was taken from old RA > (particular regclass.c). You can find it in any version of gcc before 4.4 > version. Since then it changed a lot for different reasons (mostly for > solving different PRs). > > I can not find where p113 can get cost from ira_register_move_cost. The only > possibility I see is > > ... > > else if (ira_reg_class_intersect > [pref_class][op_class] == NO_REGS) > alt_cost > += > ira_register_move_cost[mode][pref_class][op_class]; > > It is when preferred class from the 1st iteration does not intersect with the > operand class which looks ok for me. May be I missed something. Could you > point me the place where p113 gets cost from ira_register_move_class. Even > better if you could me provide ira-dump (-fira-verbose=16 for the test case). > p113 gets the cost from 'ira_register_move_cost' in the routine record_operand_costs(). In this routine, if we have a 'set' insn where both destination and source are registers, and one of them is a hard register and the other is a pseudo register, then the costs of the register classes of the pseudo register is obtained from the array 'ira_register_move_cost'. The following insn meets the criteria (r69 is a hard register) 2: set r113, r69 #p1 Here is the code snippet: move_costs = ira_register_move_cost[cost_mode]; ... for (k = cost_classes_ptr->num - 1; k >= 0; k--) { rclass = cost_classes[k]; cost = (i == 0 ? move_costs[hard_reg_class][rclass] : move_costs[rclass][hard_reg_class]); cost *= cost_factor; op_costs[i]->cost[k] = cost * frequency; ... } Regards, Surya