Re: “ira_may_move_out_cost” vs “ira_register_move_cost”
On 6/18/24 03:38, Surya Kumari Jangala wrote:
> Hi Vladimir,
>
> On 14/06/24 10:56 pm, Vladimir Makarov wrote:
>> On 6/13/24 00:34, Surya Kumari Jangala wrote:
>>> Hi Vladimir,
>>> [...]
Re: “ira_may_move_out_cost” vs “ira_register_move_cost”
Hi Vladimir,

On 14/06/24 10:56 pm, Vladimir Makarov wrote:
> On 6/13/24 00:34, Surya Kumari Jangala wrote:
>> Hi Vladimir,
>> [...]
Re: “ira_may_move_out_cost” vs “ira_register_move_cost”
On 6/13/24 00:34, Surya Kumari Jangala wrote:
> Hi Vladimir,
> [...]
“ira_may_move_out_cost” vs “ira_register_move_cost”
Hi Vladimir,

With my patch for PR111673 (scale the spill/restore cost of callee-save
register with the frequency of the entry bb in the routine assign_hard_reg():
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html), the
following Linaro aarch64 test failed due to an extra 'mov' instruction:

__SVBool_t __attribute__((noipa))
callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3,
             __SVBool_t mem0, __SVBool_t mem1)
{
  p0 = svbrkpa_z (p0, p1, p2);
  p0 = svbrkpb_z (p0, p3, mem0);
  return svbrka_z (p0, mem1);
}

With trunk:
        addvl   sp, sp, #-1
        str     p14, [sp]
        str     p15, [sp, #1, mul vl]
        ldr     p14, [x0]
        ldr     p15, [x1]
        brkpa   p0.b, p0/z, p1.b, p2.b
        brkpb   p0.b, p0/z, p3.b, p14.b
        brka    p0.b, p0/z, p15.b
        ldr     p14, [sp]
        ldr     p15, [sp, #1, mul vl]
        addvl   sp, sp, #1
        ret

With patch:
        addvl   sp, sp, #-1
        str     p14, [sp]
        str     p15, [sp, #1, mul vl]
        mov     p14.b, p3.b             // extra mov insn
        ldr     p15, [x0]
        ldr     p3, [x1]
        brkpa   p0.b, p0/z, p1.b, p2.b
        brkpb   p0.b, p0/z, p14.b, p15.b
        brka    p0.b, p0/z, p3.b
        ldr     p14, [sp]
        ldr     p15, [sp, #1, mul vl]
        addvl   sp, sp, #1
        ret

p0-p15 are predicate registers on aarch64 where p0-p3 are caller-save while
p4-p15 are callee-save.

The input RTL for ira pass:

 1: set r112, r68    #p0
 2: set r113, r69    #p1
 3: set r114, r70    #p2
 4: set r115, r71    #p3
 5: set r116, x0     #mem0, the 5th parameter
 6: set r108, mem(r116)
 7: set r117, x1     #mem1, the 6th parameter
 8: set r110, mem(r117)
 9: set r100, unspec_brkpa(r112, r113, r114)
10: set r101, unspec_brkpb(r100, r115, r108)
11: set r68, unspec_brka(r101, r110)
12: ret r68

Here, r68-r71 represent predicate hard regs p0-p3.
With my patch, r113 and r114 are being assigned memory by ira but with trunk
they are assigned registers. This in turn leads to a difference in decisions
taken by LRA ultimately leading to the extra mov instruction.

Register assignment w/ patch:

      Popping a5(r112,l0)  -- assign reg p0
      Popping a2(r100,l0)  -- assign reg p0
      Popping a0(r101,l0)  -- assign reg p0
      Popping a1(r110,l0)  -- assign reg p3
      Popping a3(r115,l0)  -- assign reg p2
      Popping a4(r108,l0)  -- assign reg p1
      Popping a6(r113,l0)  -- (memory is more profitable 8000 vs 9000) spill!
      Popping a7(r114,l0)  -- (memory is more profitable 8000 vs 9000) spill!
      Popping a8(r117,l0)  -- assign reg 1
      Popping a9(r116,l0)  -- assign reg 0

With the patch, the cost of memory is 8000, which is lower than the cost of a
callee-save register (9000), and hence memory is assigned to r113 and r114.
It is interesting to see that all the callee-save registers are free but none
is chosen.

The two instructions in which r113 is referenced are:
 2: set r113, r69    #p1
 9: set r100, unspec_brkpa(r112, r113, r114)

IRA computes the memory cost of an allocno in find_costs_and_classes(). In
this routine IRA scans each insn and computes the memory cost and the cost of
each register class for each operand in the insn.

So for insn 2, the memory cost of r113 is set to 4000 because this is the cost
of storing r69 to memory if r113 is assigned to memory. The possible register
classes of r113 are ALL_REGS, PR_REGS, PR_HI_REGS and PR_LO_REGS. The cost of
moving r69 to r113 is computed for each of these classes, assuming r113 is
assigned a register from that class. If r113 is assigned a reg in ALL_REGS,
then the cost of the move is 18000, while if r113 is assigned a register from
any of the predicate register classes, then the cost of the move is 2000.
This cost is obtained from the array “ira_register_move_cost”. After scanning
insn 9, the memory cost of r113 is increased to 8000 because if r113 is
assigned memory, we need a load to read the value before using it in the
unspec_brkpa. But the register class cost is unchanged.

Later in setup_allocno_class_and_costs(), the ALLOCNO_CLASS of r113 is set to
PR_REGS.
The ALLOCNO_MEMORY_COST of r113 is set to 8000.
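
The cost accounting for r113 described above can be replayed in a small
standalone sketch. This is not GCC code: the struct and function names are
hypothetical, and the per-unit costs (4 for a predicate load or store, 2 for a
predicate-to-predicate move) together with the frequency of 1000 are
assumptions inferred by dividing the figures in the dump. Only the totals
(4000, 8000, 2000, and the 9000 callee-save cost) come from the dump itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, inferred from the dump figures; not taken from GCC.  */
enum { FREQ = 1000 };
enum { STORE_COST = 4, LOAD_COST = 4, PRED_MOVE_COST = 2 };

/* Hypothetical per-pseudo cost record.  */
struct pseudo_costs {
  int memory;      /* cost if the pseudo lives in memory */
  int pred_class;  /* cost if it gets a predicate register */
};

/* Insn 2: set r113, r69.  If r113 is in memory this is a store of r69;
   if r113 is in a predicate register it is a reg-to-reg move.  */
void scan_copy_from_hard_reg(struct pseudo_costs *c, int freq)
{
  assert(freq > 0);
  c->memory += STORE_COST * freq;
  c->pred_class += PRED_MOVE_COST * freq;
}

/* Insn 9: r113 is an input of unspec_brkpa.  If r113 is in memory a load
   is needed first; the register class cost does not change.  */
void scan_use(struct pseudo_costs *c, int freq)
{
  assert(freq > 0);
  c->memory += LOAD_COST * freq;
}

/* Replay both references of r113.  */
struct pseudo_costs r113_costs(void)
{
  struct pseudo_costs c = {0, 0};
  scan_copy_from_hard_reg(&c, FREQ);  /* insn 2 */
  scan_use(&c, FREQ);                 /* insn 9 */
  return c;
}

/* Spill when memory is strictly cheaper than the best register on offer.  */
bool prefers_memory(int memory_cost, int best_reg_cost)
{
  return memory_cost < best_reg_cost;
}
```

With these assumptions r113_costs() yields a memory cost of 8000 and a
predicate class cost of 2000, and prefers_memory(8000, 9000) is true, which
matches the "memory is more profitable 8000 vs 9000" line in the dump.
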
The ALLOCNO_HARD_REG_COSTS of each register in PR_REGS is set to 2000.

During coloring, when r113 has to be assigned a register, the cost of
callee-save registers in PR_REGS is increased by the spill/restore cost. So
the cost of callee-save registers increases from 2000 to 9000. All the
caller-save registers have been assigned to other allocnos, so r113 is
assigned memory, as memory is cheaper than a callee-save register.

However, for r108, the cost is 0 for register classes PR_REGS, PR_HI_REGS and
PR_LO_REGS.

References of r108:
 6: set r108, mem(r116)
10: set r101, unspec_brkpb(r100, r115, r108)

It was surprising that while for r113, the cost of the predicate register classes was 2000, for