Re: LRA for avr: Maintain live range info for pseudos assigned to FP?

2023-11-19 Thread SenthilKumar.Selvaraj--- via Gcc
On Thu, 2023-10-05 at 15:33 -0400, Vladimir Makarov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
> 
> On 9/7/23 07:21, senthilkumar.selva...@microchip.com wrote:
> > Hi,
> > 
> >One more execution failure for the avr target, this time from
> >gcc.c-torture/execute/bitfld-3.c.
> > 
> >Steps to reproduce
> > 
> >Enable LRA in avr.cc by removing TARGET_LRA_P hook, build with
> > 
> > $  make all-host && make install-host
> > 
> >and then
> > 
> > $ avr-gcc gcc/testsuite/gcc.c-torture/execute/bitfld-3.c -S -Os -mmcu=avr51 
> > -fdump-rtl-all
> > 
> >When lra_update_fp2sp_elimination runs and pseudos assigned to the
> >FP have to be spilled to stack slots, they sometimes end up in a
> >slot that already has a reg with an overlapping live range.  This is
> >because lra_reg_info[regno].live_ranges is NULL for such spilled
> >pseudos, and therefore when assign_stack_slot_num_and_sort_pseduos
> >checks if lra_intersected_live_ranges_p, it always returns false.
> > 
> >In the below reload dump, all the pseudos assigned to FP get
> >allocated to slot 0. The live ranges for some of them (r1241 for
> >e.g.) conflicts with r603 that was originally assigned to slot 0,
> >but they still end up in the same slot, causing the execution failure.
> > 
> Sorry for the delay with the answer, Senthil.  Avr is unusual target and
> needs some changes in LRA but the changes improves LRA portability.  So
> thank you for your work on porting LRA to AVR.
> 
> The patch is ok for me.  The only comment is that making calculation of
> the set only once would be nice. Live range calculation in LRA can take
> a lot of time, code of update_pseudo_point is hot and the worst the set
> will be really used rarely but it is calculated every time.
> 
> You can commit the current patch and I'll do it by myself or, if you
> want, you can modify the patch by yourself and submit it for review and
> I'll review as soon as possible.  Either way works for me.

Apologies for the extreme delay in responding - had to sort out some medical 
issues.

Is it ok if I commit the patch now? I have one more patch in ira.cc, after
which I'm hoping the regression results would be good enough to switch the 
avr target to LRA.

Regards
Senthil

> 
> > diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
> > index f60e564da82..e4289f13979 100644
> > --- a/gcc/lra-lives.cc
> > +++ b/gcc/lra-lives.cc
> > @@ -250,7 +250,17 @@ update_pseudo_point (int regno, int point, enum 
> > point_type type)
> > if (HARD_REGISTER_NUM_P (regno))
> >   return;
> > 
> > -  if (complete_info_p || lra_get_regno_hard_regno (regno) < 0)
> > +  /* Pseudos assigned to the FP register could potentially get spilled
> > + to stack slots when lra_update_fp2sp_elimination runs, so keep
> > + their live range info up to date, even if they aren't in memory
> > + right now. */
> > +  int hard_regno = lra_get_regno_hard_regno (regno);
> > +  HARD_REG_SET set;
> > +  CLEAR_HARD_REG_SET(set);
> > +  add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
> > +
> > +  if (complete_info_p || hard_regno < 0
> > + || overlaps_hard_reg_set_p (set, PSEUDO_REGNO_MODE (regno), 
> > hard_regno))
> >   {
> > if (type == DEF_POINT)
> >  {
> > 
> > 



LRA for avr: Maintain live range info for pseudos assigned to FP?

2023-09-07 Thread SenthilKumar.Selvaraj--- via Gcc
Hi,

  One more execution failure for the avr target, this time from
  gcc.c-torture/execute/bitfld-3.c.

  Steps to reproduce

  Enable LRA in avr.cc by removing TARGET_LRA_P hook, build with

$  make all-host && make install-host

  and then

$ avr-gcc gcc/testsuite/gcc.c-torture/execute/bitfld-3.c -S -Os -mmcu=avr51 
-fdump-rtl-all

  When lra_update_fp2sp_elimination runs and pseudos assigned to the
  FP have to be spilled to stack slots, they sometimes end up in a
  slot that already has a reg with an overlapping live range.  This is
  because lra_reg_info[regno].live_ranges is NULL for such spilled
  pseudos, and therefore when assign_stack_slot_num_and_sort_pseduos
  checks if lra_intersected_live_ranges_p, it always returns false.

  In the below reload dump, all the pseudos assigned to FP get
  allocated to slot 0. The live ranges for some of them (r1241 for
  e.g.) conflicts with r603 that was originally assigned to slot 0,
  but they still end up in the same slot, causing the execution failure.


 r472: [86..87]
 r473: [0..85]
 ...
 r603: [254..655]
 ...
 r1241: [282..283]
 r1242: [268..269]
 r1243: [254..255]
 r1244: [238..239]
 r1245: [222..223]
 r1314: [88..89]

Ranges after the compression:
 r591: [0..1]
 r603: [0..1]
 r604: [0..1]
 r605: [0..1]
 r606: [0..1]
 r607: [0..1]
 r623: [0..1]
 r624: [0..1]
 r635: [0..1]
 r636: [0..1]
 r637: [0..1]
 r638: [0..1]
 r639: [0..1]
 r668: [0..1]
 r669: [0..1]
 r670: [0..1]
 r671: [0..1]
 r672: [0..1]
   Frame pointer can not be eliminated anymore
   Spilling non-eliminable hard regs: 28 29
 Spilling r472(28)
 Spilling r473(28)
 Spilling r589(29)
 Spilling r590(28)
 Spilling r704(29)
 Spilling r1241(28)
 Spilling r1242(28)
 Spilling r1243(28)
 Spilling r1244(28)
 Spilling r1245(28)
 Spilling r1314(28)
  Slot 0 regnos (width = 0): 603 13141245124412431242   
 1241704 590 589 473 472

  Live ranges for those pseudos is NULL because when
  lra_create_live_ranges_1 ran with all_p = false, they were not in
  memory (they were assigned to FP). Computing live range info for
  pseudos assigned to FP fixes the problem for avr,

  Is that the right fix for this problem? After applying the below
  patch, the reload dump looks like this


Ranges after the compression:
 r472: [2..3]
 r473: [0..1]
 r589: [16..17]
 r590: [16..17]
 r591: [16..17]
 r603: [10..17]
 r604: [10..17]
 r605: [10..17]
 r606: [10..17]
 r607: [10..17]
 r623: [6..17]
 r624: [6..17]
 r635: [8..17]
 r636: [8..17]
 r637: [8..17]
 r638: [8..17]
 r639: [8..17]
 r668: [6..17]
 r669: [6..17]
 r670: [6..17]
 r671: [6..17]
 r672: [6..17]
 r704: [0..1]
 r1241: [14..15]
 r1242: [12..13]
 r1243: [10..11]
 r1244: [8..9]
 r1245: [6..7]
 r1314: [4..5]
   Frame pointer can not be eliminated anymore
   Spilling non-eliminable hard regs: 28 29
 Spilling r472(28)
 Spilling r473(28)
 Spilling r589(29)
 Spilling r590(28)
 Spilling r704(29)
 Spilling r1241(28)
 Spilling r1242(28)
 Spilling r1243(28)
 Spilling r1244(28)
 Spilling r1245(28)
 Spilling r1314(28)
  Slot 0 regnos (width = 0): 603 131412451244473 472
  Slot 1 regnos (width = 0): 604 704
  ...
  Slot 17 regnos (width = 0):591 124312421241
  Slot 18 regnos (width = 0):589
  Slot 19 regnos (width = 0):590

Regards
Senthil

diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index f60e564da82..e4289f13979 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -250,7 +250,17 @@ update_pseudo_point (int regno, int point, enum point_type 
type)
   if (HARD_REGISTER_NUM_P (regno))
 return;
 
-  if (complete_info_p || lra_get_regno_hard_regno (regno) < 0)
+  /* Pseudos assigned to the FP register could potentially get spilled
+ to stack slots when lra_update_fp2sp_elimination runs, so keep
+ their live range info up to date, even if they aren't in memory
+ right now. */
+  int hard_regno = lra_get_regno_hard_regno (regno);
+  HARD_REG_SET set;
+  CLEAR_HARD_REG_SET(set);
+  add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
+
+  if (complete_info_p || hard_regno < 0
+ || overlaps_hard_reg_set_p (set, PSEUDO_REGNO_MODE (regno), hard_regno))
 {
   if (type == DEF_POINT)
{




LRA for avr: Clear mem rtx when resizing stack slots?

2023-09-01 Thread SenthilKumar.Selvaraj--- via Gcc
Hi,

  The below (slightly) reduced test-case from gcc.c-torture/execute/simd-2.c
  produces an execution failure for avr.

typedef short __attribute__((vector_size (16))) vecint;

vecint i = { 150, 100, 150, 200, 0, 0, 0, 0 };
vecint j = { 10, 13, 20, 30, 1, 1, 1, 1 };
vecint k;

union {
  vecint v;
  short i[8];
} res;

void
verify (int a1, int a2, int a3, int a4,
int b1, int b2, int b3, int b4)
{
  if (a1 != b1
  || a2 != b2
  || a3 != b3
  || a4 != b4)
abort ();
}

int
main ()
{
  k = i + j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 160, 113, 170, 230);

  k = i * j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 1500, 1300, 3000, 6000);

  k = i / j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 15, 7, 7, 6);

  k = i & j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 2, 4, 20, 8);

  k = i | j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 158, 109, 150, 222);

  k = i ^ j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 156, 105, 130, 214);

  k = -i;
  res.v = k;
  verify (res.i[0], res.i[1], res.i[2], res.i[3],
  -150, -100, -150, -200);

  exit (0);
}

  To reproduce, enable LRA in avr.c by removing TARGET_LRA_P hook, build with
  make all-host && make install-host, and compile

$ avr-gcc -mmcu=avr51 
~/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/simd-2.c -Os  -S -o 
out.s -fdump-rtl-all-raw

  The pass before dse2 has the below instructions to store the lowest two bytes
  of i (i.e 150) at stack offsets 23 and 21.

(insn 2403 2402 2404 2 (parallel [
(set (reg:QI 24 r24 [orig:102 _131 ] [102])
(mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("i") [flags 0x2]  
)
(const_int 1 [0x1]))) [1 i+1 S1 A8]))
(clobber (reg:CC 36 cc))
]) 89 {movqi_insn}
 (nil))
(insn 2404 2403 2405 2 (parallel [
(set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
(const_int 23 [0x17])) [4 %sfp+23 S1 A8])
(reg:QI 24 r24 [orig:102 _131 ] [102]))
(clobber (reg:CC 36 cc))
]) 89 {movqi_insn}
 (nil))
(insn 2405 2404 2406 2 (parallel [
(set (reg:QI 24 r24 [orig:99 _128 ] [99])
(mem/c:QI (symbol_ref:HI ("i") [flags 0x2]  ) [1 i+0 S1 A128]))
(clobber (reg:CC 36 cc))
]) 89 {movqi_insn}
 (nil))
(insn 2406 2405 2407 2 (parallel [
(set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
(const_int 21 [0x15])) [4 %sfp+21 S1 A8])
(reg:QI 24 r24 [orig:99 _128 ] [99]))
(clobber (reg:CC 36 cc))
]) 89 {movqi_insn}
 (nil))

  dse2 deletes these insns, as it sees the below clobber before
  the values in the stack offsets are used.

(insn 448 2083 2084 15 (clobber (mem/c:TI (plus:HI (reg/f:HI 28 r28)
(const_int 9 [0x9])) [4 %sfp+9 S16 A8])) -1
 (nil))
  
   DSE figures everything from stack offset 9 to 9 + 16 = 25 
   (16 is GET_MODE_SIZE (TImode)) is clobbered, so it deletes the dead stores.

   The reason for the dead stores is overlapping stack offsets for two spilled 
pseudos.

   IRA has

(insn 34 32 36 2 (set (reg:QI 102 [ _131 ])
(mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("i") [flags 0x2]  
)
(const_int 1 [0x1]))) [1 i+1 S1 A8])) 86 {movqi_insn_split}
 (nil))
(insn 36 34 38 2 (set (reg:QI 99 [ _128 ])
(mem/c:QI (symbol_ref:HI ("i") [flags 0x2]  ) [1 i+0 S1 A128])) 86 {movqi_insn_split}
 (nil))

(insn 448 430 432 15 (clobber (reg:TI 147 [ _176 ])) -1
 (nil))

   and LRA spills 102 and 99 to distinct stack slots, like so

Slot 10 regnos (width = 0):  99
Slot 12 regnos (width = 0):  102

   However, when spilling 147,

Breakpoint 1, add_pseudo_to_slot (regno=147, slot_num=4) at 
/home/i41766/code/personal/gcc/gcc/lra-spills.cc:331
331  
lra_reg_info[regno].biggest_mode);
(gdb) p slots[slot_num]
$1 = {regno = 51, hard_regno = -1, align = 8, size = {> = 
{coeffs = {2}}, }, 
  mem = 0x7fffea5eb1c8, live_ranges = 0x33b29c0}
(gdb) p debug_rtx(slots[slot_num].mem)
(mem/c:BLK (plus:HI (reg/f:HI 28 r28)
(const_int 9 [0x9])) [0  A8])

  it reuses slot 4, originally 2 bytes wide and with stack offset 9. Live 
ranges for 
  r51 and r147 do not conflict, so this ok, and it resizes the stack slot to be 
as
  wide as TImode. However, it does not reset slots[slot_num].mem, and 
assign_mem_slot
  when called for r147 sets the same stack offset rtx as pseudo_slots[147].mem.

  This appears wrong to me, as the bigger slot may conflict with stack offsets
  assigned to other slots. The below patch resets slots[slot_num].mem if the
  resized slot is bigger than before. assign_mem_slot later gets a new stack 
offset
  from function.cc:assign_stack_local and sets it as the slot's MEM rtx.

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.c

Re: LRA for avr: help with FP and elimination

2023-08-10 Thread SenthilKumar.Selvaraj--- via Gcc
On Tue, 2023-07-18 at 11:04 -0400, Vladimir Makarov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
> 
> On 7/17/23 03:17, senthilkumar.selva...@microchip.com wrote:
> > On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:
> > > If you send me the preprocessed test, I could start to work on it to fix
> > > the problems.  I think it is hard to fix them right for a person having
> > > a little experience with LRA.
> > > 
> > > 
> > Ok, this is a reduced test case that reproduces the failure.
> > 
> > $ cat case.c
> > typedef int HItype __attribute__ ((mode (HI)));
> > HItype
> > __mulvhi3 (HItype a, HItype b)
> > {
> >HItype w;
> > 
> >if (__builtin_mul_overflow (a, b, &w))
> >  __builtin_trap ();
> > 
> >return w;
> > }
> > 
> > On latest master, this trivial patch turns on LRA for avr
> > --- gcc/config/avr/avr.cc
> > +++ gcc/config/avr/avr.cc
> > @@ -15244,9 +15244,6 @@ avr_float_lib_compare_returns_bool (machine_mode 
> > mode, enum rtx_code)
> >   #undef  TARGET_CONVERT_TO_TYPE
> >   #define TARGET_CONVERT_TO_TYPE avr_convert_to_type
> > 
> > -#undef TARGET_LRA_P
> > -#define TARGET_LRA_P hook_bool_void_false
> > -
> >   #undef  TARGET_ADDR_SPACE_SUBSET_P
> >   #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p
> > 
> > Then configuring and building for avr without attempting to build libgcc
> > 
> > $ configure --target=avr --prefix= --enable-languages=c && make 
> > all-host && make install-host
> > 
> > And finally to reproduce the failure
> > $ /bin/avr-gcc -mmcu=avr25 case.c -Os
> 
> Thank you.  I've reproduced the bug and started to work on it
> yesterday.  The problem is a bit tricky than I initially thought but I
> believe I'll fix it on this week.
> 
> 
Hi Vlad,

  I can confirm your commit 
(https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832)
  fixes the above problem, thank you. However, I see execution failures if a 
  pseudo assigned to FP has to be spilled because of stack slot creation.

  To reproduce, build the compiler just like above, and then do

$ avr-gcc -mmcu=avr51 
/gcc/testsuite/gcc.c-torture/execute/20050224-1.c -O2 -S 
-fdump-rtl-all

  The execution failure occurs at this point

movw r24,r2
sbiw r24,36
brne .L8

   r2 is never set anywhere at all in the assembly.

The relevant insns (in the IRA dump) are

(insn 3 15 4 3 (set (reg/v:HI 51 [ j ])
(const_int 0 [0])) 
"gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":19:21 101 {*movhi_split}
 (expr_list:REG_EQUAL (const_int 0 [0])
(nil)))
...
(insn 28 27 67 8 (parallel [
(set (reg/v:HI 51 [ j ])
(plus:HI (reg/v:HI 51 [ j ])
(const_int 1 [0x1])))
(clobber (scratch:QI))
]) 
"/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":28:8
 175 {addhi3_clobber}
 (nil))
...
(jump_insn 44 43 45 13 (parallel [
(set (pc)
(if_then_else (ne (reg/v:HI 51 [ j ])
(const_int 36 [0x24]))
(label_ref:HI 103)
(pc)))
(clobber (scratch:QI))
]) 
"/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":11:16
 discrim 1 713 {cbranchhi4_insn}
 (expr_list:REG_DEAD (reg/v:HI 51 [ j ])
(int_list:REG_BR_PROB 7 (nil)))
 -> 103)

  LRA deletes insns 3 and 28, and uses r2 in the jump_insn.

  In the reload dump, for pseudo r51, I'm seeing this
subreg regs:
   Frame pointer can not be eliminated anymore
   Spilling non-eliminable hard regs: 28 29
 Spilling r51(28)
  Slot 0 regnos (width = 0): 46
  Slot 1 regnos (width = 0): 45

  lra_update_fp2sp_elimination calls spill_pseudos with
  HARD_FRAME_POINTER_REGNUM, and that sets reg_renumber[51] to -1.

  Later down the line, process_bb_lives is called with dead_insn_p=true from
  lra_create_lives_ranges_1 on the relevant BB (#8), and df_get_live_out
  on that BB does not contain 51 (even though previous calls to the same BB 
did).
  
Breakpoint 8, process_bb_lives (bb=0x7fffea570240, curr_point=@0x7fffd838: 
25, dead_insn_p=true) at gcc/gcc/lra-lives.cc:664
664   function_abi last_call_abi = default_function_abi;
(gdb) n
666   reg_live_out = df_get_live_out (bb);
(gdb) 
667   sparseset_clear (pseudos_live);
(gdb) p debug_bitmap(reg_live_out)

first = 0x321c128 current = 0x321c128 indx = 0
0x321c128 next = (nil) prev = (nil) indx = 0
bits = { 28 32 34 43 44 47 48 49 50 }

  process_bb_lives then considers the insn setting 51 (and
  the reload insns LRA created) as dead, and removes them.

BB 8
   Insn 67: point = 31, n_alt = -1
   Insn 114: point = 31, n_alt = 3
   Deleting dead insn 114
deleting insn with uid = 114.
   Insn 28: point = 31, n_alt = 1
   Deleting dead insn 28
deleting insn with uid = 28.
   Insn 113: point = 31, n_al

LRA for avr: Arithmetic on stack pointer

2023-08-09 Thread SenthilKumar.Selvaraj--- via Gcc
Hi,

  After turning on FP -> SP elimination after Vlad fixed
  an elimination issue in 
https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832,
  I'm now running into reload failure if arithmetic is done on SP.

  For a call to a vararg functions, the avr target pushes args into the stack,
  calls the function, and then adjusts the SP back to where it was before the
  arg pushing occurred.

  So for code like

extern int foo(int, ...);
int bar(void) {
  long double l = 1.2345E6;
  foo(0, l);
  return 0;
}

  and 

$ avr-gcc -mmcu=avr51 -Os ../20031208-1.c
  
  Reload sees


(insn 5 2 6 2 (set (reg:QI 44)
(const_int 65 [0x41])) "../20031208-1.c":4:3 86 {movqi_insn_split}
 (expr_list:REG_EQUIV (const_int 65 [0x41])
(nil)))
(insn 6 5 7 2 (set (mem:QI (post_dec:HI (reg/f:HI 32 __SP_L__)) [0  S1 A8])
(reg:QI 44)) "../20031208-1.c":4:3 1 {pushqi1}
 (expr_list:REG_DEAD (reg:QI 44)
(expr_list:REG_ARGS_SIZE (const_int 1 [0x1])
(nil
...

...
(call_insn 19 18 21 2 (parallel [
(set (reg:HI 24 r24)
(call (mem:HI (symbol_ref:HI ("foo") [flags 0x41]  
) [0 foo S2 A8])
(const_int 10 [0xa])))
(use (const_int 0 [0]))
]) "../20031208-1.c":4:3 776 {call_value_insn}
 (expr_list:REG_UNUSED (reg:HI 24 r24)
(expr_list:REG_CALL_DECL (symbol_ref:HI ("foo") [flags 0x41]  
)
(nil)))
(nil))
(insn 21 19 25 2 (set (reg/f:HI 32 __SP_L__)
(plus:HI (reg/f:HI 32 __SP_L__)
(const_int 10 [0xa]))) "../20031208-1.c":5:10 discrim 1 165 
{*addhi3_split}
 (expr_list:REG_UNUSED (reg:QI 33 __SP_H__)
(expr_list:REG_ARGS_SIZE (const_int 0 [0])
(nil

  LRA doesn't pick any of the 4 alternatives for insn 21

Considering alt=0 of insn 21:   (0) =??r  (1) %0  (2) r
Staticly defined alt reject+=12
 Considering alt=1 of insn 21:   (0) d  (1) 0  (2) s
 Considering alt=2 of insn 21:   (0) !w  (1) 0  (2) IJYIJ
Staticly defined alt reject+=600
 Considering alt=3 of insn 21:   (0) d  (1) 0  (2) nYnn
 Considering alt=0 of insn 21:   (0) =??r  (1) %0  (2) r
Staticly defined alt reject+=12
 Considering alt=1 of insn 21:   (0) d  (1) 0  (2) s
 Considering alt=2 of insn 21:   (0) !w  (1) 0  (2) IJYIJ
Staticly defined alt reject+=600
 Considering alt=3 of insn 21:   (0) d  (1) 0  (2) nYnn

  whereas classic reload does.

Reloads for insn # 21
Reload 0: reload_in (HI) = (reg/f:HI 32 __SP_L__)
reload_out (HI) = (reg/f:HI 32 __SP_L__)
LD_REGS, RELOAD_OTHER (opnum = 0)
reload_in_reg: (reg/f:HI 32 __SP_L__)
reload_out_reg: (reg/f:HI 32 __SP_L__)
reload_reg_rtx: (reg:HI 24 r24)


  Digging through the code, I found lra-constraints.c:2687, which forbids
  output reload of SP, with a comment that says

  /* Never do output reload of stack pointer.  It makes
 impossible to do elimination when SP is changed in
 RTL.  */

   Just to check if that is indeed the problem, I commented out the check
   and goto fail; that follows that comment, and reload/code-gen worked fine.

Considering alt=0 of insn 21:   (0) =??r  (1) %0  (2) r
Staticly defined alt reject+=12
0 Non-pseudo reload: reject+=2
0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
overall=24,losers=1 -- refuse
 Considering alt=1 of insn 21:   (0) d  (1) 0  (2) s
0 Non-pseudo reload: reject+=2
0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
1 Matching alt: reject+=2
1 Non-pseudo reload: reject+=2
1 Small class reload: reject+=3
1 Non input pseudo reload: reject++
overall=26,losers=2 -- refuse
 Considering alt=2 of insn 21:   (0) !w  (1) 0  (2) IJYIJ
Staticly defined alt reject+=600
0 Non-pseudo reload: reject+=2
0 Non input pseudo reload: reject++
overall=609,losers=1 -- refuse
 Considering alt=3 of insn 21:   (0) d  (1) 0  (2) nYnn
0 Non-pseudo reload: reject+=2
0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
1 Matching alt: reject+=2
1 Non-pseudo reload: reject+=2
1 Small class reload: reject+=3
1 Non input pseudo reload: reject++
overall=26,losers=2 -- refuse
  Choosing alt 3 in insn 21:  (0) d  (1) 0  (2) nYnn {*addhi3_split} 
(sp_off=-10)
  Creating newreg=50 from oldreg=32, assigning class LD_REGS to r50
   21: r50:HI=r50:HI+0xa
  REG_UNUSED __SP_H__:QI
  REG_ARGS_SIZE 0
Inserting insn reload before:
   29: r50:HI=__SP_L__:HI
Inserting insn reload after:
   30: __SP_L__:HI=r50:HI

  eventually generating

(

Re: LRA for avr: Handling hard regs set directly at expand

2023-08-02 Thread SenthilKumar.Selvaraj--- via Gcc
On Wed, 2023-08-02 at 12:54 -0400, Vladimir Makarov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
> 
> On 7/17/23 07:33, senthilkumar.selva...@microchip.com wrote:
> > Hi,
> > 
> >The avr target has a bunch of patterns that directly set hard regs at 
> > expand time, like so
> > 
> > (define_expand "cpymemhi"
> >[(parallel [(set (match_operand:BLK 0 "memory_operand" "")
> > (match_operand:BLK 1 "memory_operand" ""))
> >(use (match_operand:HI 2 "const_int_operand" ""))
> >(use (match_operand:HI 3 "const_int_operand" ""))])]
> >""
> >{
> >  if (avr_emit_cpymemhi (operands))
> >DONE;
> > 
> >  FAIL;
> >})
> > 
> > where avr_emit_cpymemhi generates
> > 
> > (insn 14 13 15 4 (set (reg:HI 30 r30)
> >  (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
> >   (nil))
> > (insn 15 14 16 4 (set (reg:HI 26 r26)
> >  (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
> >   (nil))
> > (insn 16 15 17 4 (parallel [
> >  (set (mem:BLK (reg:HI 26 r26) [0  A8])
> >  (mem:BLK (reg:HI 30 r30) [0  A8]))
> >  (unspec [
> >  (const_int 0 [0])
> >  ] UNSPEC_CPYMEM)
> >  (use (reg:QI 52))
> >  (clobber (reg:HI 26 r26))
> >  (clobber (reg:HI 30 r30))
> >  (clobber (reg:QI 0 r0))
> >  (clobber (reg:QI 52))
> >  ]) "pr53505.c":21:22 -1
> >   (nil))
> > 
> > Classic reload knows about these - find_reg masks out bad_spill_regs, and 
> > bad_spill_regs
> > when ORed with chain->live_throughout in order_regs_for_reload picks up r30.
> > 
> > LRA, however, appears to not consider that, and proceeds to use such regs 
> > as reload regs.
> > For the same source, it generates
> > 
> >   Choosing alt 0 in insn 15:  (0) =r  (1) r {*movhi_split}
> >Creating newreg=70, assigning class GENERAL_REGS to r70
> > 15: r26:HI=r70:HI
> >REG_EQUAL r28:HI+0x1
> >  Inserting insn reload before:
> > 58: r70:HI=r28:HI+0x1
> > 
> > Choosing alt 3 in insn 58:  (0) d  (1) 0  (2) nYnn {*addhi3_split}
> >Creating newreg=71 from oldreg=70, assigning class LD_REGS to r71
> > 58: r71:HI=r71:HI+0x1
> >  Inserting insn reload before:
> > 59: r71:HI=r28:HI
> >  Inserting insn reload after:
> > 60: r70:HI=r71:HI
> > 
> > ** Assignment #1: **
> > 
> >Assigning to 71 (cl=LD_REGS, orig=70, freq=3000, tfirst=71, 
> > tfreq=3000)...
> >  Assign 30 to reload r71 (freq=3000)
> >   Hard reg 26 is preferable by r70 with profit 1000
> >   Hard reg 30 is preferable by r70 with profit 1000
> >Assigning to 70 (cl=GENERAL_REGS, orig=70, freq=2000, tfirst=70, 
> > tfreq=2000)...
> >  Assign 30 to reload r70 (freq=2000)
> > 
> > 
> > (insn 14 13 59 3 (set (reg:HI 30 r30)
> >  (reg:HI 18 r18 [orig:48 ivtmp.10 ] [48])) "pr53505.c":21:22 101 
> > {*movhi_split}
> >   (nil))
> > (insn 59 14 58 3 (set (reg:HI 30 r30 [70])
> >  (reg/f:HI 28 r28)) "pr53505.c":21:22 101 {*movhi_split}
> >   (nil))
> > (insn 58 59 15 3 (set (reg:HI 30 r30 [70])
> >  (plus:HI (reg:HI 30 r30 [70])
> >  (const_int 1 [0x1]))) "pr53505.c":21:22 165 {*addhi3_split}
> >   (nil))
> > (insn 15 58 16 3 (set (reg:HI 26 r26)
> >  (reg:HI 30 r30 [70])) "pr53505.c":21:22 101 {*movhi_split}
> >   (expr_list:REG_EQUAL (plus:HI (reg/f:HI 28 r28)
> >  (const_int 1 [0x1]))
> >  (nil)))
> > (insn 16 15 17 3 (parallel [
> >  (set (mem:BLK (reg:HI 26 r26) [0  A8])
> >  (mem:BLK (reg:HI 30 r30) [0  A8]))
> >  (unspec [
> >  (const_int 0 [0])
> >  ] UNSPEC_CPYMEM)
> >  (use (reg:QI 22 r22 [52]))
> >  (clobber (reg:HI 26 r26))
> >  (clobber (reg:HI 30 r30))
> >  (clobber (reg:QI 0 r0))
> >  (clobber (reg:QI 22 r22 [52]))
> >  ]) "pr53505.c":21:22 132 {cpymem_qi}
> >   (nil))
> > 
> > LRA generates insn 59 that clobbers r30 set in insn 14, causing an execution
> > failure down the line.
> > 
> > How should the avr backend deal with this?
> > 
> Sorry for the big delay with the answer.  I was on vacation.
> 
> There are probably some ways to fix it by changing patterns as other
> people suggested but I'd like to see the current patterns work for LRA
> as well.
> 
> Could you send me the test case on which I could reproduce the problem
> and work on implementing such functionality.
> 
> 
Thanks for taking your time to look at this.

To reproduce the behavior, apply the below patch on master

diff --git gcc/config/avr/avr.cc gcc/config/avr/avr.cc
index 25f3f4c22e0..a9ab8259339 100644
--- gcc/config/avr/avr.cc
+++ gcc/config/avr/avr.cc
@@ -1574,6 +1574,9 @@ avr_allocate_stack_slots_for_args (void)

Re: LRA for avr: Handling hard regs set directly at expand

2023-07-27 Thread SenthilKumar.Selvaraj--- via Gcc
On Thu, 2023-07-27 at 15:11 +0200, Georg-Johann Lay wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
> 
> Am 17.07.23 um 13:33 schrieb SenthilKumar.Selvaraj--- via Gcc:
> > Hi,
> > 
> >The avr target has a bunch of patterns that directly set hard regs at 
> > expand time, like so
> 
> The correct approach would be to use usual predicates together with
> constraints that describe the register instead of hard regs, e.g.
> (match_operand:HI n "register_operand" "R18_2") for a 2-byte register
> that starts at R18 instead of (reg:HI 18).  I deprecated and removed
> constraints starting with "R" long ago in order to get "R" free for that
> purpose.
> 
> Some years ago I tried such constraints (and hence also zoo of new
> register classes that are required to accommodate them).  The resulting
> code quality was so bad that I quickly abandoned that approach, and IIRC
> there were also spill fails.  Appears that reload / ira was overwhelmed
> by the multitude of new reg classes and took sub-optimal decisions.
> 
> The way out was more of explicit hard regs in expand, together with
> awkward functionalities like avr_fix_operands (PR63633) and the
> functions that use it.  That way we get correct code without performance
> penalties in unrelated places.
> 
> Most of such insns are explicitly modelling hand-written asm functions
> in libgcc, because most of these functions have a footprint smaller than
> the default ABI.  And some functions have an interface not complying to
> default ABI.
> 
> For the case of cpymem etc from below, explicit hard registers were used
> because register allocator did a bad job when using constraints like "e"
> (X, Y, or Z).

I guessed that much. Yes, using constraints works - I used "x" and "z" that
directly correspond to REG_X and REG_Z (ignore the weird operand numbering).

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index be0f8dcbe0e..6c6c4e4e212 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -1148,20 +1148,20 @@
 ;; "cpymem_qi"
 ;; "cpymem_hi"
 (define_insn_and_split "cpymem_"
-  [(set (mem:BLK (reg:HI REG_X))
-(mem:BLK (reg:HI REG_Z)))
+  [(set (mem:BLK (match_operand:HI 3 "register_operand" "+x"))
+(mem:BLK (match_operand:HI 4 "register_operand" "+z")))
(unspec [(match_operand:QI 0 "const_int_operand" "n")]
UNSPEC_CPYMEM)
(use (match_operand:QIHI 1 "register_operand" ""))
-   (clobber (reg:HI REG_X))
-   (clobber (reg:HI REG_Z))
+   (clobber (match_dup 3))
+   (clobber (match_dup 4))
(clobber (reg:QI LPM_REGNO))
(clobber (match_operand:QIHI 2 "register_operand" "=1"))]
   ""
   "#"
   "&& reload_completed"
-  [(parallel [(set (mem:BLK (reg:HI REG_X))
-   (mem:BLK (reg:HI REG_Z)))
+  [(parallel [(set (mem:BLK (match_dup 3))
+   (mem:BLK (match_dup 4)))
   (unspec [(match_dup 0)]
   UNSPEC_CPYMEM)
   (use (match_dup 1))

I know you did these changes a long time ago, but do you happen to have any
test cases lying around that I can use to see if LRA does a better job than
classic reload?

Vladimir, given that classic reload handled such hardcoded hard regs just 
fine, should LRA also be able to deal with them the same way? Or is this
something that LRA is not going to support?

Regards
Senthil

> 
> Johann
> 
> 
> > (define_expand "cpymemhi"
> >[(parallel [(set (match_operand:BLK 0 "memory_operand" "")
> > (match_operand:BLK 1 "memory_operand" ""))
> >(use (match_operand:HI 2 "const_int_operand" ""))
> >(use (match_operand:HI 3 "const_int_operand" ""))])]
> >""
> >{
> >  if (avr_emit_cpymemhi (operands))
> >DONE;
> > 
> >  FAIL;
> >})
> > 
> > where avr_emit_cpymemhi generates
> > 
> > (insn 14 13 15 4 (set (reg:HI 30 r30)
> >  (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
> >   (nil))
> > (insn 15 14 16 4 (set (reg:HI 26 r26)
> >  (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
> >   (nil))
> > (insn 16 15 17 4 (parallel [
> >  (set (mem:BLK (reg:HI 26 r26) [0  A8])
> >  (mem:BLK (reg:HI 30 r30) [0  A8]))
> >  (unspec [
> >  (const_int 0 [0])
> >  ]

LRA for avr: Clobber with match_dup

2023-07-20 Thread SenthilKumar.Selvaraj--- via Gcc
Hi,

  The avr backend has this define_insn_and_split

(define_insn_and_split "*tablejump_split"
  [(set (pc)
(unspec:HI [(match_operand:HI 0 "register_operand" "!z,*r,z")]
   UNSPEC_INDEX_JMP))
   (use (label_ref (match_operand 1 "" "")))
   (clobber (match_dup 0))
   (clobber (const_int 0))]
  "!AVR_HAVE_EIJMP_EICALL"
  "#"
  "&& reload_completed"
  [(parallel [(set (pc)
   (unspec:HI [(match_dup 0)]
  UNSPEC_INDEX_JMP))
  (use (label_ref (match_dup 1)))
  (clobber (match_dup 0))
  (clobber (const_int 0))
  (clobber (reg:CC REG_CC))])]
  ""
  [(set_attr "isa" "rjmp,rjmp,jmp")])

Note the (clobber (match_dup 0)). When building 

$ avr-gcc -mmcu=avr51 gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c -O3 
-funroll-loops -fdump-rtl-all

The web pass transforms this insn from

(jump_insn 120 538 124 25 (parallel [
(set (pc)
(unspec:HI [
(reg:HI 138)
] UNSPEC_INDEX_JMP))
(use (label_ref 121))
(clobber (reg:HI 138))
(clobber (const_int 0 [0]))
]) "gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c":55:5 779 
{*tablejump_split}
 (expr_list:REG_DEAD (reg:HI 138)
(expr_list:REG_UNUSED (reg:HI 138)
(nil)))
 -> 121)

to

 Web oldreg=138 newreg=279
Updating insn 120 (138->279)

(jump_insn 120 538 124 25 (parallel [
(set (pc)
(unspec:HI [
(reg:HI 138)
] UNSPEC_INDEX_JMP))
(use (label_ref 121))
(clobber (reg:HI 279))
(clobber (const_int 0 [0]))
]) "gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c":55:5 779 
{*tablejump_split}
 (expr_list:REG_DEAD (reg:HI 138)
(expr_list:REG_UNUSED (reg:HI 138)
(nil)))
 -> 121)

Note the reg in the clobber is now 279, and not 138.

With classic reload, however, this gets set back to whatever hardreg was 
assigned to r138.

(jump_insn 120 538 121 26 (parallel [
(set (pc)
(unspec:HI [
(reg/f:HI 30 r30 [138])
] UNSPEC_INDEX_JMP))
(use (label_ref 121))
(clobber (reg/f:HI 30 r30 [138]))
(clobber (const_int 0 [0]))
]) "gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c":55:5 779 
{*tablejump_split}
 (nil)
 -> 121)

With LRA, however, the pseudo reg remains unassigned, eventually causing an ICE 
in cselib_invalidate_regno.

(jump_insn 120 538 121 26 (parallel [
(set (pc)
(unspec:HI [
(reg/f:HI 30 r30 [138])
] UNSPEC_INDEX_JMP))
(use (label_ref 121))
(clobber (reg:HI 279))
(clobber (const_int 0 [0]))
]) "gcc/gcc/testsuite/gcc.c-torture/compile/930120-1.c":55:5 779 
{*tablejump_split}
 (nil)
 -> 121)

Is this something that LRA should be able to fix?

Making operand 0 read/write in define_insn_and_split prevents the web pass from 
creating a new pseudo
for the clobber, avoiding the whole problem.

(define_insn_and_split "*tablejump_split"
  [(set (pc)
(unspec:HI [(match_operand:HI 0 "register_operand" "+!z,*r,z")]
  
 UNSPEC_INDEX_JMP))
   (use (label_ref (match_operand 1 "" "")))
   (clobber (match_dup 0))
   (clobber (const_int 0))]
 
"!AVR_HAVE_EIJMP_EICALL"
  "#"
  "&& reload_completed"
  [(parallel [(set (pc)
   (unspec:HI [(match_dup 0)]
  
UNSPEC_INDEX_JMP))
  (use (label_ref (match_dup 1)))
  (clobber (match_dup 0))
  
(clobber (const_int 0))
  (clobber (reg:CC REG_CC))])]
  ""
  [(set_attr "isa" "rjmp,rjmp,jmp")])


However, https://gcc.gnu.org/onlinedocs/gccint/Side-Effects.html has this to 
say for clobber

"There is one other known use for clobbering a pseudo register in a parallel: 
when one of the input operands of the insn is also
clobbered by the insn. In this case, using the same pseudo register in the 
clobber and elsewhere in the insn produces the expected
results."

so I'm not sure if that's the right way to fix it. I could add a matching 
constraint (i.e "=0,0,0"), but until reload runs,
the clobber reg would be different from input reg, so that seems wrong to me 
too.

Any ideas?

Regards
Senthil


LRA for avr: Handling hard regs set directly at expand

2023-07-17 Thread SenthilKumar.Selvaraj--- via Gcc
Hi,

  The avr target has a bunch of patterns that directly set hard regs at expand 
time, like so

(define_expand "cpymemhi"
  [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
   (match_operand:BLK 1 "memory_operand" ""))
  (use (match_operand:HI 2 "const_int_operand" ""))
  (use (match_operand:HI 3 "const_int_operand" ""))])]
  ""
  {
if (avr_emit_cpymemhi (operands))
  DONE;

FAIL;
  })

where avr_emit_cpymemhi generates

(insn 14 13 15 4 (set (reg:HI 30 r30)
(reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
 (nil))
(insn 15 14 16 4 (set (reg:HI 26 r26)
(reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
 (nil))
(insn 16 15 17 4 (parallel [
(set (mem:BLK (reg:HI 26 r26) [0  A8])
(mem:BLK (reg:HI 30 r30) [0  A8]))
(unspec [
(const_int 0 [0])
] UNSPEC_CPYMEM)
(use (reg:QI 52))
(clobber (reg:HI 26 r26))
(clobber (reg:HI 30 r30))
(clobber (reg:QI 0 r0))
(clobber (reg:QI 52))
]) "pr53505.c":21:22 -1
 (nil))

Classic reload knows about these - find_reg masks out bad_spill_regs, and 
bad_spill_regs
when ORed with chain->live_throughout in order_regs_for_reload picks up r30.

LRA, however, appears to not consider that, and proceeds to use such regs as 
reload regs.
For the same source, it generates

 Choosing alt 0 in insn 15:  (0) =r  (1) r {*movhi_split}
  Creating newreg=70, assigning class GENERAL_REGS to r70
   15: r26:HI=r70:HI
  REG_EQUAL r28:HI+0x1
Inserting insn reload before:
   58: r70:HI=r28:HI+0x1

Choosing alt 3 in insn 58:  (0) d  (1) 0  (2) nYnn {*addhi3_split}
  Creating newreg=71 from oldreg=70, assigning class LD_REGS to r71
   58: r71:HI=r71:HI+0x1
Inserting insn reload before:
   59: r71:HI=r28:HI
Inserting insn reload after:
   60: r70:HI=r71:HI

** Assignment #1: **

 Assigning to 71 (cl=LD_REGS, orig=70, freq=3000, tfirst=71, 
tfreq=3000)...
   Assign 30 to reload r71 (freq=3000)
Hard reg 26 is preferable by r70 with profit 1000
Hard reg 30 is preferable by r70 with profit 1000
 Assigning to 70 (cl=GENERAL_REGS, orig=70, freq=2000, tfirst=70, 
tfreq=2000)...
   Assign 30 to reload r70 (freq=2000)


(insn 14 13 59 3 (set (reg:HI 30 r30)
(reg:HI 18 r18 [orig:48 ivtmp.10 ] [48])) "pr53505.c":21:22 101 
{*movhi_split}
 (nil))
(insn 59 14 58 3 (set (reg:HI 30 r30 [70])
(reg/f:HI 28 r28)) "pr53505.c":21:22 101 {*movhi_split}
 (nil))
(insn 58 59 15 3 (set (reg:HI 30 r30 [70])
(plus:HI (reg:HI 30 r30 [70])
(const_int 1 [0x1]))) "pr53505.c":21:22 165 {*addhi3_split}
 (nil))
(insn 15 58 16 3 (set (reg:HI 26 r26)
(reg:HI 30 r30 [70])) "pr53505.c":21:22 101 {*movhi_split}
 (expr_list:REG_EQUAL (plus:HI (reg/f:HI 28 r28)
(const_int 1 [0x1]))
(nil)))
(insn 16 15 17 3 (parallel [
(set (mem:BLK (reg:HI 26 r26) [0  A8])
(mem:BLK (reg:HI 30 r30) [0  A8]))
(unspec [
(const_int 0 [0])
] UNSPEC_CPYMEM)
(use (reg:QI 22 r22 [52]))
(clobber (reg:HI 26 r26))
(clobber (reg:HI 30 r30))
(clobber (reg:QI 0 r0))
(clobber (reg:QI 22 r22 [52]))
]) "pr53505.c":21:22 132 {cpymem_qi}
 (nil))

LRA generates insn 59 that clobbers r30 set in insn 14, causing an execution
failure down the line.

How should the avr backend deal with this?

Regards
Senthil




Re: LRA for avr: help with FP and elimination

2023-07-17 Thread SenthilKumar.Selvaraj--- via Gcc
On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
> 
> On 7/13/23 05:27, SenthilKumar.Selvaraj--- via Gcc wrote:
> > Hi,
> > 
> >I've been spending some (spare) time checking what it would take to
> >make LRA work for the avr target.
> > 
> >Right after I removed the TARGET_LRA_P hook disabling LRA, building
> >libgcc failed with a weird ICE.
> >   On the avr, the stack pointer (SP)
> >is not used to access stack slots
> It is very uncommon target then.
> >   - TARGET_CAN_ELIMINATE returns false
> >if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
> >if get_frame_size() > 0.
> > 
> >With LRA, however, reload generates
> > 
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >  (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 
> > {movqi_insn_split}
> >   (nil))
> > 
> >and the backend code errors out when it finds SP is being used as a
> >pointer register.
> > 
> >Digging through the RTL dumps, I found the following. For the
> >following insn sequence in *.ira
> > 
> > (insn 189 128 159 7 (set (reg:HI 58 [ b ])
> >  (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >   (nil))
> > (insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
> >  (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
> >   (nil))
> > (insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
> >  (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
> >   (nil))
> > 
> >1. For r58, IRA picks R28:R29, which is the frame pointer for avr.
> > 
> >Popping a13(r58,l0)  -- assign reg 28
> > 
> >2. LRA sees the subreg in insn 159 and generates a reload reg
> >(r125).  simplify_subreg_regno (lra-constraints.cc:1810) however
> >bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
> >reload isn't completed yet. LRA therefore decides rclass for the
> >pseudo reg is NO_REGS.
> > 
> > 
> > Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg 
> > r125
> >159: r125:HI#0=r86:QI
> > 
> >4. As rclass is NO_REGS, LRA picks an insn alternative that involves 
> > memory.
> >That is my understanding, please correct me if I'm wrong.
> > 
> >  0 Small class reload: reject+=3
> >  0 Non input pseudo reload: reject++
> >  Cycle danger: overall += LRA_MAX_REJECT
> >alt=0,overall=610,losers=1,rld_nregs=1
> >  0 Small class reload: reject+=3
> >  0 Non input pseudo reload: reject++
> >  alt=1: Bad operand -- refuse
> >  0 Non pseudo reload: reject++
> >alt=2,overall=1,losers=0,rld_nregs=0
> >Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}
> > 
> >5. LRA creates stack slots, and then uses the FP register to access
> >the slots. This is despite r58 already being assigned R28:R29.
> > 
> >6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
> >   frame_pointer_needed is not set, despite the creation of stack
> >   slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
> >   and this eventually causes the ICE when the avr backend sees SP being
> >   used as a pointer register.
> > 
> >This is the relevant sequence after reload
> > 
> > (insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >  (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >   (nil))
> > (insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
> >  (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
> >   (nil))
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >  (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 
> > {movqi_insn_split}
> >   (nil))
> > (insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >  (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
> 

LRA for avr: help with FP and elimination

2023-07-13 Thread SenthilKumar.Selvaraj--- via Gcc
Hi,

  I've been spending some (spare) time checking what it would take to
  make LRA work for the avr target.

  Right after I removed the TARGET_LRA_P hook disabling LRA, building
  libgcc failed with a weird ICE. On the avr, the stack pointer (SP)
  is not used to access stack slots - TARGET_CAN_ELIMINATE returns false
  if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
  if get_frame_size() > 0.

  With LRA, however, reload generates

(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S1 A8])
(reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
 (nil))

  and the backend code errors out when it finds SP is being used as a
  pointer register.

  Digging through the RTL dumps, I found the following. For the
  following insn sequence in *.ira

(insn 189 128 159 7 (set (reg:HI 58 [ b ])
(const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
(reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
(reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
 (nil))

  1. For r58, IRA picks R28:R29, which is the frame pointer for avr.

  Popping a13(r58,l0)  -- assign reg 28

  2. LRA sees the subreg in insn 159 and generates a reload reg
  (r125).  simplify_subreg_regno (lra-constraints.cc:1810) however
  bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
  reload isn't completed yet. LRA therefore decides rclass for the
  pseudo reg is NO_REGS.


Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg r125
  159: r125:HI#0=r86:QI

  4. As rclass is NO_REGS, LRA picks an insn alternative that involves memory.
  That is my understanding, please correct me if I'm wrong.

0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
Cycle danger: overall += LRA_MAX_REJECT
  alt=0,overall=610,losers=1,rld_nregs=1
0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
alt=1: Bad operand -- refuse
0 Non pseudo reload: reject++
  alt=2,overall=1,losers=0,rld_nregs=0
 Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}

  5. LRA creates stack slots, and then uses the FP register to access
  the slots. This is despite r58 already being assigned R28:R29. 

  6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
 frame_pointer_needed is not set, despite the creation of stack
 slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
 and this eventually causes the ICE when the avr backend sees SP being
 used as a pointer register.

  This is the relevant sequence after reload

(insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])
(reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S1 A8])
(reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
{*movhi_split}
 (nil))
(insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])
(reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 2 [0x2])) [2 %sfp+2 S1 A8])
(reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
{*movhi_split}
 (nil))

  For choices other than FP, simplify_subreg_regno returns the correct part
  of the wider HImode reg, so rclass is not NO_REGS, and things workout fine.

  I checked what classic reload does in the same situation - it picks a
  different register (R25) instead of spilling to a stack slot.


(insn 189 128 159 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 159 189 226 7 (set (reg:QI 25 r25)
(reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 226 159 160 7 (set (reg:QI 28 r28)
(reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 160 226 227 7 (set (reg:QI 25 r25)