Re: LRA for avr: Maintain live range info for pseudos assigned to FP?
On Thu, 2023-10-05 at 15:33 -0400, Vladimir Makarov wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the > content is safe > > On 9/7/23 07:21, senthilkumar.selva...@microchip.com wrote: > > Hi, > > > >One more execution failure for the avr target, this time from > >gcc.c-torture/execute/bitfld-3.c. > > > >Steps to reproduce > > > >Enable LRA in avr.cc by removing TARGET_LRA_P hook, build with > > > > $ make all-host && make install-host > > > >and then > > > > $ avr-gcc gcc/testsuite/gcc.c-torture/execute/bitfld-3.c -S -Os -mmcu=avr51 > > -fdump-rtl-all > > > >When lra_update_fp2sp_elimination runs and pseudos assigned to the > >FP have to be spilled to stack slots, they sometimes end up in a > >slot that already has a reg with an overlapping live range. This is > >because lra_reg_info[regno].live_ranges is NULL for such spilled > >pseudos, and therefore when assign_stack_slot_num_and_sort_pseduos > >checks if lra_intersected_live_ranges_p, it always returns false. > > > >In the below reload dump, all the pseudos assigned to FP get > >allocated to slot 0. The live ranges for some of them (r1241 for > >e.g.) conflicts with r603 that was originally assigned to slot 0, > >but they still end up in the same slot, causing the execution failure. > > > Sorry for the delay with the answer, Senthil. Avr is unusual target and > needs some changes in LRA but the changes improves LRA portability. So > thank you for your work on porting LRA to AVR. > > The patch is ok for me. The only comment is that making calculation of > the set only once would be nice. Live range calculation in LRA can take > a lot of time, code of update_pseudo_point is hot and the worst the set > will be really used rarely but it is calculated every time. > > You can commit the current patch and I'll do it by myself or, if you > want, you can modify the patch by yourself and submit it for review and > I'll review as soon as possible. Either way works for me. Apologies for the extreme delay in responding - had to sort out some medical issues. Is it ok if I commit the patch now? I have one more patch in ira.cc, after which I'm hoping the regression results would be good enough to switch the avr target to LRA. Regards Senthil > > > diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc > > index f60e564da82..e4289f13979 100644 > > --- a/gcc/lra-lives.cc > > +++ b/gcc/lra-lives.cc > > @@ -250,7 +250,17 @@ update_pseudo_point (int regno, int point, enum > > point_type type) > > if (HARD_REGISTER_NUM_P (regno)) > > return; > > > > - if (complete_info_p || lra_get_regno_hard_regno (regno) < 0) > > + /* Pseudos assigned to the FP register could potentially get spilled > > + to stack slots when lra_update_fp2sp_elimination runs, so keep > > + their live range info up to date, even if they aren't in memory > > + right now. */ > > + int hard_regno = lra_get_regno_hard_regno (regno); > > + HARD_REG_SET set; > > + CLEAR_HARD_REG_SET(set); > > + add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM); > > + > > + if (complete_info_p || hard_regno < 0 > > + || overlaps_hard_reg_set_p (set, PSEUDO_REGNO_MODE (regno), > > hard_regno)) > > { > > if (type == DEF_POINT) > > { > > > >
LRA for avr: Maintain live range info for pseudos assigned to FP?
Hi, One more execution failure for the avr target, this time from gcc.c-torture/execute/bitfld-3.c. Steps to reproduce Enable LRA in avr.cc by removing TARGET_LRA_P hook, build with $ make all-host && make install-host and then $ avr-gcc gcc/testsuite/gcc.c-torture/execute/bitfld-3.c -S -Os -mmcu=avr51 -fdump-rtl-all When lra_update_fp2sp_elimination runs and pseudos assigned to the FP have to be spilled to stack slots, they sometimes end up in a slot that already has a reg with an overlapping live range. This is because lra_reg_info[regno].live_ranges is NULL for such spilled pseudos, and therefore when assign_stack_slot_num_and_sort_pseduos checks if lra_intersected_live_ranges_p, it always returns false. In the below reload dump, all the pseudos assigned to FP get allocated to slot 0. The live ranges for some of them (r1241 for e.g.) conflicts with r603 that was originally assigned to slot 0, but they still end up in the same slot, causing the execution failure. r472: [86..87] r473: [0..85] ... r603: [254..655] ... r1241: [282..283] r1242: [268..269] r1243: [254..255] r1244: [238..239] r1245: [222..223] r1314: [88..89] Ranges after the compression: r591: [0..1] r603: [0..1] r604: [0..1] r605: [0..1] r606: [0..1] r607: [0..1] r623: [0..1] r624: [0..1] r635: [0..1] r636: [0..1] r637: [0..1] r638: [0..1] r639: [0..1] r668: [0..1] r669: [0..1] r670: [0..1] r671: [0..1] r672: [0..1] Frame pointer can not be eliminated anymore Spilling non-eliminable hard regs: 28 29 Spilling r472(28) Spilling r473(28) Spilling r589(29) Spilling r590(28) Spilling r704(29) Spilling r1241(28) Spilling r1242(28) Spilling r1243(28) Spilling r1244(28) Spilling r1245(28) Spilling r1314(28) Slot 0 regnos (width = 0): 603 13141245124412431242 1241704 590 589 473 472 Live ranges for those pseudos is NULL because when lra_create_live_ranges_1 ran with all_p = false, they were not in memory (they were assigned to FP). Computing live range info for pseudos assigned to FP fixes the problem for avr, Is that the right fix for this problem? After applying the below patch, the reload dump looks like this Ranges after the compression: r472: [2..3] r473: [0..1] r589: [16..17] r590: [16..17] r591: [16..17] r603: [10..17] r604: [10..17] r605: [10..17] r606: [10..17] r607: [10..17] r623: [6..17] r624: [6..17] r635: [8..17] r636: [8..17] r637: [8..17] r638: [8..17] r639: [8..17] r668: [6..17] r669: [6..17] r670: [6..17] r671: [6..17] r672: [6..17] r704: [0..1] r1241: [14..15] r1242: [12..13] r1243: [10..11] r1244: [8..9] r1245: [6..7] r1314: [4..5] Frame pointer can not be eliminated anymore Spilling non-eliminable hard regs: 28 29 Spilling r472(28) Spilling r473(28) Spilling r589(29) Spilling r590(28) Spilling r704(29) Spilling r1241(28) Spilling r1242(28) Spilling r1243(28) Spilling r1244(28) Spilling r1245(28) Spilling r1314(28) Slot 0 regnos (width = 0): 603 131412451244473 472 Slot 1 regnos (width = 0): 604 704 ... Slot 17 regnos (width = 0):591 124312421241 Slot 18 regnos (width = 0):589 Slot 19 regnos (width = 0):590 Regards Senthil diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc index f60e564da82..e4289f13979 100644 --- a/gcc/lra-lives.cc +++ b/gcc/lra-lives.cc @@ -250,7 +250,17 @@ update_pseudo_point (int regno, int point, enum point_type type) if (HARD_REGISTER_NUM_P (regno)) return; - if (complete_info_p || lra_get_regno_hard_regno (regno) < 0) + /* Pseudos assigned to the FP register could potentially get spilled + to stack slots when lra_update_fp2sp_elimination runs, so keep + their live range info up to date, even if they aren't in memory + right now. */ + int hard_regno = lra_get_regno_hard_regno (regno); + HARD_REG_SET set; + CLEAR_HARD_REG_SET(set); + add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM); + + if (complete_info_p || hard_regno < 0 + || overlaps_hard_reg_set_p (set, PSEUDO_REGNO_MODE (regno), hard_regno)) { if (type == DEF_POINT) {
LRA for avr: Clear mem rtx when resizing stack slots?
Hi, The below (slightly) reduced test-case from gcc.c-torture/execute/simd-2.c produces an execution failure for avr. typedef short __attribute__((vector_size (16))) vecint; vecint i = { 150, 100, 150, 200, 0, 0, 0, 0 }; vecint j = { 10, 13, 20, 30, 1, 1, 1, 1 }; vecint k; union { vecint v; short i[8]; } res; void verify (int a1, int a2, int a3, int a4, int b1, int b2, int b3, int b4) { if (a1 != b1 || a2 != b2 || a3 != b3 || a4 != b4) abort (); } int main () { k = i + j; res.v = k; verify (res.i[0], res.i[1], res.i[2], res.i[3], 160, 113, 170, 230); k = i * j; res.v = k; verify (res.i[0], res.i[1], res.i[2], res.i[3], 1500, 1300, 3000, 6000); k = i / j; res.v = k; verify (res.i[0], res.i[1], res.i[2], res.i[3], 15, 7, 7, 6); k = i & j; res.v = k; verify (res.i[0], res.i[1], res.i[2], res.i[3], 2, 4, 20, 8); k = i | j; res.v = k; verify (res.i[0], res.i[1], res.i[2], res.i[3], 158, 109, 150, 222); k = i ^ j; res.v = k; verify (res.i[0], res.i[1], res.i[2], res.i[3], 156, 105, 130, 214); k = -i; res.v = k; verify (res.i[0], res.i[1], res.i[2], res.i[3], -150, -100, -150, -200); exit (0); } To reproduce, enable LRA in avr.c by removing TARGET_LRA_P hook, build with make all-host && make install-host, and compile $ avr-gcc -mmcu=avr51 ~/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/simd-2.c -Os -S -o out.s -fdump-rtl-all-raw The pass before dse2 has the below instructions to store the lowest two bytes of i (i.e 150) at stack offsets 23 and 21. (insn 2403 2402 2404 2 (parallel [ (set (reg:QI 24 r24 [orig:102 _131 ] [102]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("i") [flags 0x2] ) (const_int 1 [0x1]))) [1 i+1 S1 A8])) (clobber (reg:CC 36 cc)) ]) 89 {movqi_insn} (nil)) (insn 2404 2403 2405 2 (parallel [ (set (mem/c:QI (plus:HI (reg/f:HI 28 r28) (const_int 23 [0x17])) [4 %sfp+23 S1 A8]) (reg:QI 24 r24 [orig:102 _131 ] [102])) (clobber (reg:CC 36 cc)) ]) 89 {movqi_insn} (nil)) (insn 2405 2404 2406 2 (parallel [ (set (reg:QI 24 r24 [orig:99 _128 ] [99]) (mem/c:QI (symbol_ref:HI ("i") [flags 0x2] ) [1 i+0 S1 A128])) (clobber (reg:CC 36 cc)) ]) 89 {movqi_insn} (nil)) (insn 2406 2405 2407 2 (parallel [ (set (mem/c:QI (plus:HI (reg/f:HI 28 r28) (const_int 21 [0x15])) [4 %sfp+21 S1 A8]) (reg:QI 24 r24 [orig:99 _128 ] [99])) (clobber (reg:CC 36 cc)) ]) 89 {movqi_insn} (nil)) dse2 deletes these insns, as it sees the below clobber before the values in the stack offsets are used. (insn 448 2083 2084 15 (clobber (mem/c:TI (plus:HI (reg/f:HI 28 r28) (const_int 9 [0x9])) [4 %sfp+9 S16 A8])) -1 (nil)) DSE figures everything from stack offset 9 to 9 + 16 = 25 (16 is GET_MODE_SIZE (TImode)) is clobbered, so it deletes the dead stores. The reason for the dead stores is overlapping stack offsets for two spilled pseudos. IRA has (insn 34 32 36 2 (set (reg:QI 102 [ _131 ]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("i") [flags 0x2] ) (const_int 1 [0x1]))) [1 i+1 S1 A8])) 86 {movqi_insn_split} (nil)) (insn 36 34 38 2 (set (reg:QI 99 [ _128 ]) (mem/c:QI (symbol_ref:HI ("i") [flags 0x2] ) [1 i+0 S1 A128])) 86 {movqi_insn_split} (nil)) (insn 448 430 432 15 (clobber (reg:TI 147 [ _176 ])) -1 (nil)) and LRA spills 102 and 99 to distinct stack slots, like so Slot 10 regnos (width = 0): 99 Slot 12 regnos (width = 0): 102 However, when spilling 147, Breakpoint 1, add_pseudo_to_slot (regno=147, slot_num=4) at /home/i41766/code/personal/gcc/gcc/lra-spills.cc:331 331 lra_reg_info[regno].biggest_mode); (gdb) p slots[slot_num] $1 = {regno = 51, hard_regno = -1, align = 8, size = {> = {coeffs = {2}}, }, mem = 0x7fffea5eb1c8, live_ranges = 0x33b29c0} (gdb) p debug_rtx(slots[slot_num].mem) (mem/c:BLK (plus:HI (reg/f:HI 28 r28) (const_int 9 [0x9])) [0 A8]) it reuses slot 4, originally 2 bytes wide and with stack offset 9. Live ranges for r51 and r147 do not conflict, so this ok, and it resizes the stack slot to be as wide as TImode. However, it does not reset slots[slot_num].mem, and assign_mem_slot when called for r147 sets the same stack offset rtx as pseudo_slots[147].mem. This appears wrong to me, as the bigger slot may conflict with stack offsets assigned to other slots. The below patch resets slots[slot_num].mem if the resized slot is bigger than before. assign_mem_slot later gets a new stack offset from function.cc:assign_stack_local and sets it as the slot's MEM rtx. diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.c
Re: LRA for avr: help with FP and elimination
On Tue, 2023-07-18 at 11:04 -0400, Vladimir Makarov wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the > content is safe > > On 7/17/23 03:17, senthilkumar.selva...@microchip.com wrote: > > On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote: > > > If you send me the preprocessed test, I could start to work on it to fix > > > the problems. I think it is hard to fix them right for a person having > > > a little experience with LRA. > > > > > > > > Ok, this is a reduced test case that reproduces the failure. > > > > $ cat case.c > > typedef int HItype __attribute__ ((mode (HI))); > > HItype > > __mulvhi3 (HItype a, HItype b) > > { > >HItype w; > > > >if (__builtin_mul_overflow (a, b, &w)) > > __builtin_trap (); > > > >return w; > > } > > > > On latest master, this trivial patch turns on LRA for avr > > --- gcc/config/avr/avr.cc > > +++ gcc/config/avr/avr.cc > > @@ -15244,9 +15244,6 @@ avr_float_lib_compare_returns_bool (machine_mode > > mode, enum rtx_code) > > #undef TARGET_CONVERT_TO_TYPE > > #define TARGET_CONVERT_TO_TYPE avr_convert_to_type > > > > -#undef TARGET_LRA_P > > -#define TARGET_LRA_P hook_bool_void_false > > - > > #undef TARGET_ADDR_SPACE_SUBSET_P > > #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p > > > > Then configuring and building for avr without attempting to build libgcc > > > > $ configure --target=avr --prefix= --enable-languages=c && make > > all-host && make install-host > > > > And finally to reproduce the failure > > $ /bin/avr-gcc -mmcu=avr25 case.c -Os > > Thank you. I've reproduced the bug and started to work on it > yesterday. The problem is a bit tricky than I initially thought but I > believe I'll fix it on this week. > > Hi Vlad, I can confirm your commit (https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832) fixes the above problem, thank you. However, I see execution failures if a pseudo assigned to FP has to be spilled because of stack slot creation. To reproduce, build the compiler just like above, and then do $ avr-gcc -mmcu=avr51 /gcc/testsuite/gcc.c-torture/execute/20050224-1.c -O2 -S -fdump-rtl-all The execution failure occurs at this point movw r24,r2 sbiw r24,36 brne .L8 r2 is never set anywhere at all in the assembly. The relevant insns (in the IRA dump) are (insn 3 15 4 3 (set (reg/v:HI 51 [ j ]) (const_int 0 [0])) "gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":19:21 101 {*movhi_split} (expr_list:REG_EQUAL (const_int 0 [0]) (nil))) ... (insn 28 27 67 8 (parallel [ (set (reg/v:HI 51 [ j ]) (plus:HI (reg/v:HI 51 [ j ]) (const_int 1 [0x1]))) (clobber (scratch:QI)) ]) "/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":28:8 175 {addhi3_clobber} (nil)) ... (jump_insn 44 43 45 13 (parallel [ (set (pc) (if_then_else (ne (reg/v:HI 51 [ j ]) (const_int 36 [0x24])) (label_ref:HI 103) (pc))) (clobber (scratch:QI)) ]) "/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":11:16 discrim 1 713 {cbranchhi4_insn} (expr_list:REG_DEAD (reg/v:HI 51 [ j ]) (int_list:REG_BR_PROB 7 (nil))) -> 103) LRA deletes insns 3 and 28, and uses r2 in the jump_insn. In the reload dump, for pseudo r51, I'm seeing this subreg regs: Frame pointer can not be eliminated anymore Spilling non-eliminable hard regs: 28 29 Spilling r51(28) Slot 0 regnos (width = 0): 46 Slot 1 regnos (width = 0): 45 lra_update_fp2sp_elimination calls spill_pseudos with HARD_FRAME_POINTER_REGNUM, and that sets reg_renumber[51] to -1. Later down the line, process_bb_lives is called with dead_insn_p=true from lra_create_lives_ranges_1 on the relevant BB (#8), and df_get_live_out on that BB does not contain 51 (even though previous calls to the same BB did). Breakpoint 8, process_bb_lives (bb=0x7fffea570240, curr_point=@0x7fffd838: 25, dead_insn_p=true) at gcc/gcc/lra-lives.cc:664 664 function_abi last_call_abi = default_function_abi; (gdb) n 666 reg_live_out = df_get_live_out (bb); (gdb) 667 sparseset_clear (pseudos_live); (gdb) p debug_bitmap(reg_live_out) first = 0x321c128 current = 0x321c128 indx = 0 0x321c128 next = (nil) prev = (nil) indx = 0 bits = { 28 32 34 43 44 47 48 49 50 } process_bb_lives then considers the insn setting 51 (and the reload insns LRA created) as dead, and removes them. BB 8 Insn 67: point = 31, n_alt = -1 Insn 114: point = 31, n_alt = 3 Deleting dead insn 114 deleting insn with uid = 114. Insn 28: point = 31, n_alt = 1 Deleting dead insn 28 deleting insn with uid = 28. Insn 113: point = 31, n_al
LRA for avr: Arithmetic on stack pointer
Hi, After turning on FP -> SP elimination after Vlad fixed an elimination issue in https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832, I'm now running into reload failure if arithmetic is done on SP. For a call to a vararg functions, the avr target pushes args into the stack, calls the function, and then adjusts the SP back to where it was before the arg pushing occurred. So for code like extern int foo(int, ...); int bar(void) { long double l = 1.2345E6; foo(0, l); return 0; } and $ avr-gcc -mmcu=avr51 -Os ../20031208-1.c Reload sees (insn 5 2 6 2 (set (reg:QI 44) (const_int 65 [0x41])) "../20031208-1.c":4:3 86 {movqi_insn_split} (expr_list:REG_EQUIV (const_int 65 [0x41]) (nil))) (insn 6 5 7 2 (set (mem:QI (post_dec:HI (reg/f:HI 32 __SP_L__)) [0 S1 A8]) (reg:QI 44)) "../20031208-1.c":4:3 1 {pushqi1} (expr_list:REG_DEAD (reg:QI 44) (expr_list:REG_ARGS_SIZE (const_int 1 [0x1]) (nil ... ... (call_insn 19 18 21 2 (parallel [ (set (reg:HI 24 r24) (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41] ) [0 foo S2 A8]) (const_int 10 [0xa]))) (use (const_int 0 [0])) ]) "../20031208-1.c":4:3 776 {call_value_insn} (expr_list:REG_UNUSED (reg:HI 24 r24) (expr_list:REG_CALL_DECL (symbol_ref:HI ("foo") [flags 0x41] ) (nil))) (nil)) (insn 21 19 25 2 (set (reg/f:HI 32 __SP_L__) (plus:HI (reg/f:HI 32 __SP_L__) (const_int 10 [0xa]))) "../20031208-1.c":5:10 discrim 1 165 {*addhi3_split} (expr_list:REG_UNUSED (reg:QI 33 __SP_H__) (expr_list:REG_ARGS_SIZE (const_int 0 [0]) (nil LRA doesn't pick any of the 4 alternatives for insn 21 Considering alt=0 of insn 21: (0) =??r (1) %0 (2) r Staticly defined alt reject+=12 Considering alt=1 of insn 21: (0) d (1) 0 (2) s Considering alt=2 of insn 21: (0) !w (1) 0 (2) IJYIJ Staticly defined alt reject+=600 Considering alt=3 of insn 21: (0) d (1) 0 (2) nYnn Considering alt=0 of insn 21: (0) =??r (1) %0 (2) r Staticly defined alt reject+=12 Considering alt=1 of insn 21: (0) d (1) 0 (2) s Considering alt=2 of insn 21: (0) !w (1) 0 (2) IJYIJ Staticly defined alt reject+=600 Considering alt=3 of insn 21: (0) d (1) 0 (2) nYnn whereas classic reload does. Reloads for insn # 21 Reload 0: reload_in (HI) = (reg/f:HI 32 __SP_L__) reload_out (HI) = (reg/f:HI 32 __SP_L__) LD_REGS, RELOAD_OTHER (opnum = 0) reload_in_reg: (reg/f:HI 32 __SP_L__) reload_out_reg: (reg/f:HI 32 __SP_L__) reload_reg_rtx: (reg:HI 24 r24) Digging through the code, I found lra-constraints.c:2687, which forbids output reload of SP, with a comment that says /* Never do output reload of stack pointer. It makes impossible to do elimination when SP is changed in RTL. */ Just to check if that is indeed the problem, I commented out the check and goto fail; that follows that comment, and reload/code-gen worked fine. Considering alt=0 of insn 21: (0) =??r (1) %0 (2) r Staticly defined alt reject+=12 0 Non-pseudo reload: reject+=2 0 Small class reload: reject+=3 0 Non input pseudo reload: reject++ overall=24,losers=1 -- refuse Considering alt=1 of insn 21: (0) d (1) 0 (2) s 0 Non-pseudo reload: reject+=2 0 Small class reload: reject+=3 0 Non input pseudo reload: reject++ 1 Matching alt: reject+=2 1 Non-pseudo reload: reject+=2 1 Small class reload: reject+=3 1 Non input pseudo reload: reject++ overall=26,losers=2 -- refuse Considering alt=2 of insn 21: (0) !w (1) 0 (2) IJYIJ Staticly defined alt reject+=600 0 Non-pseudo reload: reject+=2 0 Non input pseudo reload: reject++ overall=609,losers=1 -- refuse Considering alt=3 of insn 21: (0) d (1) 0 (2) nYnn 0 Non-pseudo reload: reject+=2 0 Small class reload: reject+=3 0 Non input pseudo reload: reject++ 1 Matching alt: reject+=2 1 Non-pseudo reload: reject+=2 1 Small class reload: reject+=3 1 Non input pseudo reload: reject++ overall=26,losers=2 -- refuse Choosing alt 3 in insn 21: (0) d (1) 0 (2) nYnn {*addhi3_split} (sp_off=-10) Creating newreg=50 from oldreg=32, assigning class LD_REGS to r50 21: r50:HI=r50:HI+0xa REG_UNUSED __SP_H__:QI REG_ARGS_SIZE 0 Inserting insn reload before: 29: r50:HI=__SP_L__:HI Inserting insn reload after: 30: __SP_L__:HI=r50:HI eventually generating (
Re: LRA for avr: Handling hard regs set directly at expand
On Wed, 2023-08-02 at 12:54 -0400, Vladimir Makarov wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the > content is safe > > On 7/17/23 07:33, senthilkumar.selva...@microchip.com wrote: > > Hi, > > > >The avr target has a bunch of patterns that directly set hard regs at > > expand time, like so > > > > (define_expand "cpymemhi" > >[(parallel [(set (match_operand:BLK 0 "memory_operand" "") > > (match_operand:BLK 1 "memory_operand" "")) > >(use (match_operand:HI 2 "const_int_operand" "")) > >(use (match_operand:HI 3 "const_int_operand" ""))])] > >"" > >{ > > if (avr_emit_cpymemhi (operands)) > >DONE; > > > > FAIL; > >}) > > > > where avr_emit_cpymemhi generates > > > > (insn 14 13 15 4 (set (reg:HI 30 r30) > > (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1 > > (nil)) > > (insn 15 14 16 4 (set (reg:HI 26 r26) > > (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1 > > (nil)) > > (insn 16 15 17 4 (parallel [ > > (set (mem:BLK (reg:HI 26 r26) [0 A8]) > > (mem:BLK (reg:HI 30 r30) [0 A8])) > > (unspec [ > > (const_int 0 [0]) > > ] UNSPEC_CPYMEM) > > (use (reg:QI 52)) > > (clobber (reg:HI 26 r26)) > > (clobber (reg:HI 30 r30)) > > (clobber (reg:QI 0 r0)) > > (clobber (reg:QI 52)) > > ]) "pr53505.c":21:22 -1 > > (nil)) > > > > Classic reload knows about these - find_reg masks out bad_spill_regs, and > > bad_spill_regs > > when ORed with chain->live_throughout in order_regs_for_reload picks up r30. > > > > LRA, however, appears to not consider that, and proceeds to use such regs > > as reload regs. > > For the same source, it generates > > > > Choosing alt 0 in insn 15: (0) =r (1) r {*movhi_split} > >Creating newreg=70, assigning class GENERAL_REGS to r70 > > 15: r26:HI=r70:HI > >REG_EQUAL r28:HI+0x1 > > Inserting insn reload before: > > 58: r70:HI=r28:HI+0x1 > > > > Choosing alt 3 in insn 58: (0) d (1) 0 (2) nYnn {*addhi3_split} > >Creating newreg=71 from oldreg=70, assigning class LD_REGS to r71 > > 58: r71:HI=r71:HI+0x1 > > Inserting insn reload before: > > 59: r71:HI=r28:HI > > Inserting insn reload after: > > 60: r70:HI=r71:HI > > > > ** Assignment #1: ** > > > >Assigning to 71 (cl=LD_REGS, orig=70, freq=3000, tfirst=71, > > tfreq=3000)... > > Assign 30 to reload r71 (freq=3000) > > Hard reg 26 is preferable by r70 with profit 1000 > > Hard reg 30 is preferable by r70 with profit 1000 > >Assigning to 70 (cl=GENERAL_REGS, orig=70, freq=2000, tfirst=70, > > tfreq=2000)... > > Assign 30 to reload r70 (freq=2000) > > > > > > (insn 14 13 59 3 (set (reg:HI 30 r30) > > (reg:HI 18 r18 [orig:48 ivtmp.10 ] [48])) "pr53505.c":21:22 101 > > {*movhi_split} > > (nil)) > > (insn 59 14 58 3 (set (reg:HI 30 r30 [70]) > > (reg/f:HI 28 r28)) "pr53505.c":21:22 101 {*movhi_split} > > (nil)) > > (insn 58 59 15 3 (set (reg:HI 30 r30 [70]) > > (plus:HI (reg:HI 30 r30 [70]) > > (const_int 1 [0x1]))) "pr53505.c":21:22 165 {*addhi3_split} > > (nil)) > > (insn 15 58 16 3 (set (reg:HI 26 r26) > > (reg:HI 30 r30 [70])) "pr53505.c":21:22 101 {*movhi_split} > > (expr_list:REG_EQUAL (plus:HI (reg/f:HI 28 r28) > > (const_int 1 [0x1])) > > (nil))) > > (insn 16 15 17 3 (parallel [ > > (set (mem:BLK (reg:HI 26 r26) [0 A8]) > > (mem:BLK (reg:HI 30 r30) [0 A8])) > > (unspec [ > > (const_int 0 [0]) > > ] UNSPEC_CPYMEM) > > (use (reg:QI 22 r22 [52])) > > (clobber (reg:HI 26 r26)) > > (clobber (reg:HI 30 r30)) > > (clobber (reg:QI 0 r0)) > > (clobber (reg:QI 22 r22 [52])) > > ]) "pr53505.c":21:22 132 {cpymem_qi} > > (nil)) > > > > LRA generates insn 59 that clobbers r30 set in insn 14, causing an execution > > failure down the line. > > > > How should the avr backend deal with this? > > > Sorry for the big delay with the answer. I was on vacation. > > There are probably some ways to fix it by changing patterns as other > people suggested but I'd like to see the current patterns work for LRA > as well. > > Could you send me the test case on which I could reproduce the problem > and work on implementing such functionality. > > Thanks for taking your time to look at this. To reproduce the behavior, apply the below patch on master diff --git gcc/config/avr/avr.cc gcc/config/avr/avr.cc index 25f3f4c22e0..a9ab8259339 100644 --- gcc/config/avr/avr.cc +++ gcc/config/avr/avr.cc @@ -1574,6 +1574,9 @@ avr_allocate_stack_slots_for_args (void)
Re: LRA for avr: Handling hard regs set directly at expand
On Thu, 2023-07-27 at 15:11 +0200, Georg-Johann Lay wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the > content is safe > > Am 17.07.23 um 13:33 schrieb SenthilKumar.Selvaraj--- via Gcc: > > Hi, > > > >The avr target has a bunch of patterns that directly set hard regs at > > expand time, like so > > The correct approach would be to use usual predicates together with > constraints that describe the register instead of hard regs, e.g. > (match_operand:HI n "register_operand" "R18_2") for a 2-byte register > that starts at R18 instead of (reg:HI 18). I deprecated and removed > constraints starting with "R" long ago in order to get "R" free for that > purpose. > > Some years ago I tried such constraints (and hence also zoo of new > register classes that are required to accommodate them). The resulting > code quality was so bad that I quickly abandoned that approach, and IIRC > there were also spill fails. Appears that reload / ira was overwhelmed > by the multitude of new reg classes and took sub-optimal decisions. > > The way out was more of explicit hard regs in expand, together with > awkward functionalities like avr_fix_operands (PR63633) and the > functions that use it. That way we get correct code without performance > penalties in unrelated places. > > Most of such insns are explicitly modelling hand-written asm functions > in libgcc, because most of these functions have a footprint smaller than > the default ABI. And some functions have an interface not complying to > default ABI. > > For the case of cpymem etc from below, explicit hard registers were used > because register allocator did a bad job when using constraints like "e" > (X, Y, or Z). I guessed that much. Yes, using constraints works - I used "x" and "z" that directly correspond to REG_X and REG_Z (ignore the weird operand numbering). diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md index be0f8dcbe0e..6c6c4e4e212 100644 --- a/gcc/config/avr/avr.md +++ b/gcc/config/avr/avr.md @@ -1148,20 +1148,20 @@ ;; "cpymem_qi" ;; "cpymem_hi" (define_insn_and_split "cpymem_" - [(set (mem:BLK (reg:HI REG_X)) -(mem:BLK (reg:HI REG_Z))) + [(set (mem:BLK (match_operand:HI 3 "register_operand" "+x")) +(mem:BLK (match_operand:HI 4 "register_operand" "+z"))) (unspec [(match_operand:QI 0 "const_int_operand" "n")] UNSPEC_CPYMEM) (use (match_operand:QIHI 1 "register_operand" "")) - (clobber (reg:HI REG_X)) - (clobber (reg:HI REG_Z)) + (clobber (match_dup 3)) + (clobber (match_dup 4)) (clobber (reg:QI LPM_REGNO)) (clobber (match_operand:QIHI 2 "register_operand" "=1"))] "" "#" "&& reload_completed" - [(parallel [(set (mem:BLK (reg:HI REG_X)) - (mem:BLK (reg:HI REG_Z))) + [(parallel [(set (mem:BLK (match_dup 3)) + (mem:BLK (match_dup 4))) (unspec [(match_dup 0)] UNSPEC_CPYMEM) (use (match_dup 1)) I know you did these changes a long time ago, but do you happen to have any test cases lying around that I can use to see if LRA does a better job than classic reload? Vladimir, given that classic reload handled such hardcoded hard regs just fine, should LRA also be able to deal with them the same way? Or is this something that LRA is not going to support? Regards Senthil > > Johann > > > > (define_expand "cpymemhi" > >[(parallel [(set (match_operand:BLK 0 "memory_operand" "") > > (match_operand:BLK 1 "memory_operand" "")) > >(use (match_operand:HI 2 "const_int_operand" "")) > >(use (match_operand:HI 3 "const_int_operand" ""))])] > >"" > >{ > > if (avr_emit_cpymemhi (operands)) > >DONE; > > > > FAIL; > >}) > > > > where avr_emit_cpymemhi generates > > > > (insn 14 13 15 4 (set (reg:HI 30 r30) > > (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1 > > (nil)) > > (insn 15 14 16 4 (set (reg:HI 26 r26) > > (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1 > > (nil)) > > (insn 16 15 17 4 (parallel [ > > (set (mem:BLK (reg:HI 26 r26) [0 A8]) > > (mem:BLK (reg:HI 30 r30) [0 A8])) > > (unspec [ > > (const_int 0 [0]) > > ]
LRA for avr: Clobber with match_dup
Hi, The avr backend has this define_insn_and_split (define_insn_and_split "*tablejump_split" [(set (pc) (unspec:HI [(match_operand:HI 0 "register_operand" "!z,*r,z")] UNSPEC_INDEX_JMP)) (use (label_ref (match_operand 1 "" ""))) (clobber (match_dup 0)) (clobber (const_int 0))] "!AVR_HAVE_EIJMP_EICALL" "#" "&& reload_completed" [(parallel [(set (pc) (unspec:HI [(match_dup 0)] UNSPEC_INDEX_JMP)) (use (label_ref (match_dup 1))) (clobber (match_dup 0)) (clobber (const_int 0)) (clobber (reg:CC REG_CC))])] "" [(set_attr "isa" "rjmp,rjmp,jmp")]) Note the (clobber (match_dup 0)). When building $ avr-gcc -mmcu=avr51 gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c -O3 -funroll-loops -fdump-rtl-all The web pass transforms this insn from (jump_insn 120 538 124 25 (parallel [ (set (pc) (unspec:HI [ (reg:HI 138) ] UNSPEC_INDEX_JMP)) (use (label_ref 121)) (clobber (reg:HI 138)) (clobber (const_int 0 [0])) ]) "gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c":55:5 779 {*tablejump_split} (expr_list:REG_DEAD (reg:HI 138) (expr_list:REG_UNUSED (reg:HI 138) (nil))) -> 121) to Web oldreg=138 newreg=279 Updating insn 120 (138->279) (jump_insn 120 538 124 25 (parallel [ (set (pc) (unspec:HI [ (reg:HI 138) ] UNSPEC_INDEX_JMP)) (use (label_ref 121)) (clobber (reg:HI 279)) (clobber (const_int 0 [0])) ]) "gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c":55:5 779 {*tablejump_split} (expr_list:REG_DEAD (reg:HI 138) (expr_list:REG_UNUSED (reg:HI 138) (nil))) -> 121) Note the reg in the clobber is now 279, and not 138. With classic reload, however, this gets set back to whatever hardreg was assigned to r138. (jump_insn 120 538 121 26 (parallel [ (set (pc) (unspec:HI [ (reg/f:HI 30 r30 [138]) ] UNSPEC_INDEX_JMP)) (use (label_ref 121)) (clobber (reg/f:HI 30 r30 [138])) (clobber (const_int 0 [0])) ]) "gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c":55:5 779 {*tablejump_split} (nil) -> 121) With LRA, however, the pseudo reg remains unassigned, eventually causing an ICE in cselib_invalidate_regno. (jump_insn 120 538 121 26 (parallel [ (set (pc) (unspec:HI [ (reg/f:HI 30 r30 [138]) ] UNSPEC_INDEX_JMP)) (use (label_ref 121)) (clobber (reg:HI 279)) (clobber (const_int 0 [0])) ]) "gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c":55:5 779 {*tablejump_split} (nil) -> 121) Is this something that LRA should be able to fix? Making operand 0 read/write in define_insn_and_split prevents the web pass from creating a new pseudo for the clobber, avoiding the whole problem. (define_insn_and_split "*tablejump_split" [(set (pc) (unspec:HI [(match_operand:HI 0 "register_operand" "+!z,*r,z")] UNSPEC_INDEX_JMP)) (use (label_ref (match_operand 1 "" ""))) (clobber (match_dup 0)) (clobber (const_int 0))] "!AVR_HAVE_EIJMP_EICALL" "#" "&& reload_completed" [(parallel [(set (pc) (unspec:HI [(match_dup 0)] UNSPEC_INDEX_JMP)) (use (label_ref (match_dup 1))) (clobber (match_dup 0)) (clobber (const_int 0)) (clobber (reg:CC REG_CC))])] "" [(set_attr "isa" "rjmp,rjmp,jmp")]) However, https://gcc.gnu.org/onlinedocs/gccint/Side-Effects.html has this to say for clobber "There is one other known use for clobbering a pseudo register in a parallel: when one of the input operands of the insn is also clobbered by the insn. In this case, using the same pseudo register in the clobber and elsewhere in the insn produces the expected results." so I'm not sure if that's the right way to fix it. I could add a matching constraint (i.e "=0,0,0"), but until reload runs, the clobber reg would be different from input reg, so that seems wrong to me too. Any ideas? Regards Senthil
LRA for avr: Handling hard regs set directly at expand
Hi, The avr target has a bunch of patterns that directly set hard regs at expand time, like so (define_expand "cpymemhi" [(parallel [(set (match_operand:BLK 0 "memory_operand" "") (match_operand:BLK 1 "memory_operand" "")) (use (match_operand:HI 2 "const_int_operand" "")) (use (match_operand:HI 3 "const_int_operand" ""))])] "" { if (avr_emit_cpymemhi (operands)) DONE; FAIL; }) where avr_emit_cpymemhi generates (insn 14 13 15 4 (set (reg:HI 30 r30) (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1 (nil)) (insn 15 14 16 4 (set (reg:HI 26 r26) (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1 (nil)) (insn 16 15 17 4 (parallel [ (set (mem:BLK (reg:HI 26 r26) [0 A8]) (mem:BLK (reg:HI 30 r30) [0 A8])) (unspec [ (const_int 0 [0]) ] UNSPEC_CPYMEM) (use (reg:QI 52)) (clobber (reg:HI 26 r26)) (clobber (reg:HI 30 r30)) (clobber (reg:QI 0 r0)) (clobber (reg:QI 52)) ]) "pr53505.c":21:22 -1 (nil)) Classic reload knows about these - find_reg masks out bad_spill_regs, and bad_spill_regs when ORed with chain->live_throughout in order_regs_for_reload picks up r30. LRA, however, appears to not consider that, and proceeds to use such regs as reload regs. For the same source, it generates Choosing alt 0 in insn 15: (0) =r (1) r {*movhi_split} Creating newreg=70, assigning class GENERAL_REGS to r70 15: r26:HI=r70:HI REG_EQUAL r28:HI+0x1 Inserting insn reload before: 58: r70:HI=r28:HI+0x1 Choosing alt 3 in insn 58: (0) d (1) 0 (2) nYnn {*addhi3_split} Creating newreg=71 from oldreg=70, assigning class LD_REGS to r71 58: r71:HI=r71:HI+0x1 Inserting insn reload before: 59: r71:HI=r28:HI Inserting insn reload after: 60: r70:HI=r71:HI ** Assignment #1: ** Assigning to 71 (cl=LD_REGS, orig=70, freq=3000, tfirst=71, tfreq=3000)... Assign 30 to reload r71 (freq=3000) Hard reg 26 is preferable by r70 with profit 1000 Hard reg 30 is preferable by r70 with profit 1000 Assigning to 70 (cl=GENERAL_REGS, orig=70, freq=2000, tfirst=70, tfreq=2000)... Assign 30 to reload r70 (freq=2000) (insn 14 13 59 3 (set (reg:HI 30 r30) (reg:HI 18 r18 [orig:48 ivtmp.10 ] [48])) "pr53505.c":21:22 101 {*movhi_split} (nil)) (insn 59 14 58 3 (set (reg:HI 30 r30 [70]) (reg/f:HI 28 r28)) "pr53505.c":21:22 101 {*movhi_split} (nil)) (insn 58 59 15 3 (set (reg:HI 30 r30 [70]) (plus:HI (reg:HI 30 r30 [70]) (const_int 1 [0x1]))) "pr53505.c":21:22 165 {*addhi3_split} (nil)) (insn 15 58 16 3 (set (reg:HI 26 r26) (reg:HI 30 r30 [70])) "pr53505.c":21:22 101 {*movhi_split} (expr_list:REG_EQUAL (plus:HI (reg/f:HI 28 r28) (const_int 1 [0x1])) (nil))) (insn 16 15 17 3 (parallel [ (set (mem:BLK (reg:HI 26 r26) [0 A8]) (mem:BLK (reg:HI 30 r30) [0 A8])) (unspec [ (const_int 0 [0]) ] UNSPEC_CPYMEM) (use (reg:QI 22 r22 [52])) (clobber (reg:HI 26 r26)) (clobber (reg:HI 30 r30)) (clobber (reg:QI 0 r0)) (clobber (reg:QI 22 r22 [52])) ]) "pr53505.c":21:22 132 {cpymem_qi} (nil)) LRA generates insn 59 that clobbers r30 set in insn 14, causing an execution failure down the line. How should the avr backend deal with this? Regards Senthil
Re: LRA for avr: help with FP and elimination
On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the > content is safe > > On 7/13/23 05:27, SenthilKumar.Selvaraj--- via Gcc wrote: > > Hi, > > > >I've been spending some (spare) time checking what it would take to > >make LRA work for the avr target. > > > >Right after I removed the TARGET_LRA_P hook disabling LRA, building > >libgcc failed with a weird ICE. > > On the avr, the stack pointer (SP) > >is not used to access stack slots > It is very uncommon target then. > > - TARGET_CAN_ELIMINATE returns false > >if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true > >if get_frame_size() > 0. > > > >With LRA, however, reload generates > > > > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__) > > (const_int 1 [0x1])) [2 %sfp+1 S1 A8]) > > (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 > > {movqi_insn_split} > > (nil)) > > > >and the backend code errors out when it finds SP is being used as a > >pointer register. > > > >Digging through the RTL dumps, I found the following. For the > >following insn sequence in *.ira > > > > (insn 189 128 159 7 (set (reg:HI 58 [ b ]) > > (const_int 0 [0])) "case.c":7:7 101 {*movhi_split} > > (nil)) > > (insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0) > > (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split} > > (nil)) > > (insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1) > > (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split} > > (nil)) > > > >1. For r58, IRA picks R28:R29, which is the frame pointer for avr. > > > >Popping a13(r58,l0) -- assign reg 28 > > > >2. LRA sees the subreg in insn 159 and generates a reload reg > >(r125). simplify_subreg_regno (lra-constraints.cc:1810) however > >bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and > >reload isn't completed yet. LRA therefore decides rclass for the > >pseudo reg is NO_REGS. > > > > > > Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg > > r125 > >159: r125:HI#0=r86:QI > > > >4. As rclass is NO_REGS, LRA picks an insn alternative that involves > > memory. > >That is my understanding, please correct me if I'm wrong. > > > > 0 Small class reload: reject+=3 > > 0 Non input pseudo reload: reject++ > > Cycle danger: overall += LRA_MAX_REJECT > >alt=0,overall=610,losers=1,rld_nregs=1 > > 0 Small class reload: reject+=3 > > 0 Non input pseudo reload: reject++ > > alt=1: Bad operand -- refuse > > 0 Non pseudo reload: reject++ > >alt=2,overall=1,losers=0,rld_nregs=0 > >Choosing alt 2 in insn 159: (0) Qm (1) rY00 {movqi_insn_split} > > > >5. LRA creates stack slots, and then uses the FP register to access > >the slots. This is despite r58 already being assigned R28:R29. > > > >6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore > > frame_pointer_needed is not set, despite the creation of stack > > slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP, > > and this eventually causes the ICE when the avr backend sees SP being > > used as a pointer register. > > > >This is the relevant sequence after reload > > > > (insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58]) > > (const_int 0 [0])) "case.c":7:7 101 {*movhi_split} > > (nil)) > > (insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__) > > (const_int 1 [0x1])) [2 %sfp+1 S2 A8]) > > (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split} > > (nil)) > > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__) > > (const_int 1 [0x1])) [2 %sfp+1 S1 A8]) > > (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 > > {movqi_insn_split} > > (nil)) > > (insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58]) > > (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__) > > (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 >
LRA for avr: help with FP and elimination
Hi, I've been spending some (spare) time checking what it would take to make LRA work for the avr target. Right after I removed the TARGET_LRA_P hook disabling LRA, building libgcc failed with a weird ICE. On the avr, the stack pointer (SP) is not used to access stack slots - TARGET_CAN_ELIMINATE returns false if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true if get_frame_size() > 0. With LRA, however, reload generates (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__) (const_int 1 [0x1])) [2 %sfp+1 S1 A8]) (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split} (nil)) and the backend code errors out when it finds SP is being used as a pointer register. Digging through the RTL dumps, I found the following. For the following insn sequence in *.ira (insn 189 128 159 7 (set (reg:HI 58 [ b ]) (const_int 0 [0])) "case.c":7:7 101 {*movhi_split} (nil)) (insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0) (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split} (nil)) (insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1) (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split} (nil)) 1. For r58, IRA picks R28:R29, which is the frame pointer for avr. Popping a13(r58,l0) -- assign reg 28 2. LRA sees the subreg in insn 159 and generates a reload reg (r125). simplify_subreg_regno (lra-constraints.cc:1810) however bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and reload isn't completed yet. LRA therefore decides rclass for the pseudo reg is NO_REGS. Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg r125 159: r125:HI#0=r86:QI 4. As rclass is NO_REGS, LRA picks an insn alternative that involves memory. That is my understanding, please correct me if I'm wrong. 0 Small class reload: reject+=3 0 Non input pseudo reload: reject++ Cycle danger: overall += LRA_MAX_REJECT alt=0,overall=610,losers=1,rld_nregs=1 0 Small class reload: reject+=3 0 Non input pseudo reload: reject++ alt=1: Bad operand -- refuse 0 Non pseudo reload: reject++ alt=2,overall=1,losers=0,rld_nregs=0 Choosing alt 2 in insn 159: (0) Qm (1) rY00 {movqi_insn_split} 5. LRA creates stack slots, and then uses the FP register to access the slots. This is despite r58 already being assigned R28:R29. 6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore frame_pointer_needed is not set, despite the creation of stack slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP, and this eventually causes the ICE when the avr backend sees SP being used as a pointer register. This is the relevant sequence after reload (insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58]) (const_int 0 [0])) "case.c":7:7 101 {*movhi_split} (nil)) (insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__) (const_int 1 [0x1])) [2 %sfp+1 S2 A8]) (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split} (nil)) (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__) (const_int 1 [0x1])) [2 %sfp+1 S1 A8]) (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split} (nil)) (insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58]) (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__) (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 {*movhi_split} (nil)) (insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__) (const_int 1 [0x1])) [2 %sfp+1 S2 A8]) (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split} (nil)) (insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__) (const_int 2 [0x2])) [2 %sfp+2 S1 A8]) (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split} (nil)) (insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58]) (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__) (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 {*movhi_split} (nil)) For choices other than FP, simplify_subreg_regno returns the correct part of the wider HImode reg, so rclass is not NO_REGS, and things workout fine. I checked what classic reload does in the same situation - it picks a different register (R25) instead of spilling to a stack slot. (insn 189 128 159 7 (set (reg:HI 28 r28 [orig:58 b ] [58]) (const_int 0 [0])) "case.c":7:7 101 {*movhi_split} (nil)) (insn 159 189 226 7 (set (reg:QI 25 r25) (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split} (nil)) (insn 226 159 160 7 (set (reg:QI 28 r28) (reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split} (nil)) (insn 160 226 227 7 (set (reg:QI 25 r25)