On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
> 
> On 7/13/23 05:27, SenthilKumar.Selvaraj--- via Gcc wrote:
> > Hi,
> > 
> >    I've been spending some (spare) time checking what it would take to
> >    make LRA work for the avr target.
> > 
> >    Right after I removed the TARGET_LRA_P hook disabling LRA, building
> >    libgcc failed with a weird ICE.
> >   On the avr, the stack pointer (SP)
> >    is not used to access stack slots
> It is very uncommon target then.
> >   - TARGET_CAN_ELIMINATE returns false
> >    if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
> >    if get_frame_size() > 0.
> > 
> >    With LRA, however, reload generates
> > 
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >          (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 
> > {movqi_insn_split}
> >       (nil))
> > 
> >    and the backend code errors out when it finds SP is being used as a
> >    pointer register.
> > 
> >    Digging through the RTL dumps, I found the following. For the
> >    following insn sequence in *.ira
> > 
> > (insn 189 128 159 7 (set (reg:HI 58 [ b ])
> >          (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
> >          (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > (insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
> >          (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > 
> >    1. For r58, IRA picks R28:R29, which is the frame pointer for avr.
> > 
> >        Popping a13(r58,l0)  --         assign reg 28
> > 
> >    2. LRA sees the subreg in insn 159 and generates a reload reg
> >    (r125).  simplify_subreg_regno (lra-constraints.cc:1810) however
> >    bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
> >    reload isn't completed yet. LRA therefore decides rclass for the
> >    pseudo reg is NO_REGS.
> > 
> > <snip>
> > Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg 
> > r125
> >    159: r125:HI#0=r86:QI
> > 
> >    4. As rclass is NO_REGS, LRA picks an insn alternative that involves 
> > memory.
> >    That is my understanding, please correct me if I'm wrong.
> > <snip>
> >              0 Small class reload: reject+=3
> >              0 Non input pseudo reload: reject++
> >              Cycle danger: overall += LRA_MAX_REJECT
> >            alt=0,overall=610,losers=1,rld_nregs=1
> >              0 Small class reload: reject+=3
> >              0 Non input pseudo reload: reject++
> >              alt=1: Bad operand -- refuse
> >              0 Non pseudo reload: reject++
> >            alt=2,overall=1,losers=0,rld_nregs=0
> >        Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}
> > 
> >    5. LRA creates stack slots, and then uses the FP register to access
> >    the slots. This is despite r58 already being assigned R28:R29.
> > 
> >    6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
> >       frame_pointer_needed is not set, despite the creation of stack
> >       slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
> >       and this eventually causes the ICE when the avr backend sees SP being
> >       used as a pointer register.
> > 
> >    This is the relevant sequence after reload
> > <snip>
> > (insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >          (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
> >          (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >          (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 
> > {movqi_insn_split}
> >       (nil))
> > (insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >          (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
> > {*movhi_split}
> >       (nil))
> > (insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
> >          (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 2 [0x2])) [2 %sfp+2 S1 A8])
> >          (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 
> > {movqi_insn_split}
> >       (nil))
> > (insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >          (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >                  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
> > {*movhi_split}
> >       (nil))
> > 
> >    For choices other than FP, simplify_subreg_regno returns the correct part
> >    of the wider HImode reg, so rclass is not NO_REGS, and things workout 
> > fine.
> > 
> >    I checked what classic reload does in the same situation - it picks a
> >    different register (R25) instead of spilling to a stack slot.
> > 
> > <snip>
> > (insn 189 128 159 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >          (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >       (nil))
> > (insn 159 189 226 7 (set (reg:QI 25 r25)
> >          (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 
> > {movqi_insn_split}
> >       (nil))
> > (insn 226 159 160 7 (set (reg:QI 28 r28)
> >          (reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > (insn 160 226 227 7 (set (reg:QI 25 r25)
> >          (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 
> > {movqi_insn_split}
> >       (nil))
> > (insn 227 160 33 7 (set (reg:QI 29 r29)
> >          (reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
> >       (nil))
> > 
> > 
> >    My questions:
> > 
> >    1. Is there something obvious the avr backend is doing wrong that is
> >    causing this?
> No.  In my opinion if it worked with reload, it should work with LRA
> because LRA is a substitute for reload.
> >    2. Shouldn't LRA ask the backend for frame_pointer_required_p and
> >    update frame_pointer_needed if it creates stack slots?
> I think so.
> >    3. Even if (2) works, I see that lra-eliminates.cc:update_reg_eliminate
> >    asserts that if the backend said elimination to SP is
> > ok first up, it
> >    cannot reject that elimination later (line 1165). If the only reason
> >    FP is required is because LRA created stack
> > slots, what should the backend do?
> > 
> I think it can be relaxed for avr on which sp can not be used access
> stack memory
> >    4. When simplify_subreg_regno bails for FP, lra-constraints.cc:1815
> >    sets rclass = NO_REGS and forces a spill to memory. The comment says
> >    it is to prevent infinite looping, but for this case, doesn't it
> >    make sense to look for other regs?
> > 
> I guess it can be relaxed for avr and frame-pointer too as avr does not
> use sp for accessing memory
> >    5. I can work around the problem by disabling elimination from FP to SP
> >    when lra_in_progress, but I think it pevents IRA/LRA from using
> >    R28:R29 even when FP is not required at all?
> Yes, it is better to avoid disabling elimination
> >    6. Basic question, but does FP to SP elimination mean any operation
> >    possible with FP should be doable with SP as well?
> 
> Hard to say. There are a lot of undocumented RA assumptions.
> 
> If you send me the preprocessed test, I could start to work on it to fix
> the problems.  I think it is hard to fix them right for a person having
> a little experience with LRA.
> 
> 
Ok, this is a reduced test case that reproduces the failure.

$ cat case.c
typedef int HItype __attribute__ ((mode (HI)));
HItype
__mulvhi3 (HItype a, HItype b)
{
  HItype w;

  if (__builtin_mul_overflow (a, b, &w))
    __builtin_trap ();

  return w;
}

On latest master, this trivial patch turns on LRA for avr
--- gcc/config/avr/avr.cc
+++ gcc/config/avr/avr.cc
@@ -15244,9 +15244,6 @@ avr_float_lib_compare_returns_bool (machine_mode mode, 
enum rtx_code)
 #undef  TARGET_CONVERT_TO_TYPE
 #define TARGET_CONVERT_TO_TYPE avr_convert_to_type
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
-
 #undef  TARGET_ADDR_SPACE_SUBSET_P
 #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p
 
Then configuring and building for avr without attempting to build libgcc

$ configure --target=avr --prefix=<prefix_dir> --enable-languages=c && make 
all-host && make install-host

And finally to reproduce the failure
$ <prefix_dir>/bin/avr-gcc -mmcu=avr25 case.c -Os

Thanks for taking this up - please do let me know if you need more information.

Regards
Senthil

Reply via email to