On Thu, Jul 4, 2024 at 1:08 PM Georg-Johann Lay <a...@gjlay.de> wrote:
>
>
>
> Am 04.07.24 um 11:49 schrieb Richard Biener:
> > On Thu, Jul 4, 2024 at 11:24 AM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> >>
> >> On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <a...@gjlay.de> wrote:
> >>>
> >>>
> >>>
> >>> Am 02.07.24 um 15:48 schrieb Richard Biener:
> >>>> On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <a...@gjlay.de> wrote:
> >>>>>
> >>>>> Hi Jeff,
> >>>>>
> >>>>> This is a patch to get correct code out of 64-bit
> >>>>> loads from address-space __memx.
> >>>>>
> >>>>> The AVR address-spaces may require that move insns issue
> >>>>> calls to library support functions, a fact that -ftree-ter
> >>>>> doesn't account for.  tree-ssa-ter.cc then replaces an
> >>>>> expression across such a library call, resulting in wrong code.
> >>>>>
> >>>>> This patch disables that pass per default on avr, as there is no
> >>>>> more fine grained way to avoid malicious optimizations.
> >>>>> The pass can still be re-enabled by means of explicit -ftree-ter.
> >>>>>
> >>>>> Ok to apply?
> >>>>
> >>>> I think this requires more details on what goes wrong - I assume
> >>>> it's not stmt reordering that effectively happens but recursive
> >>>> expand_expr on SSA defs when those invoke libcalls?  In that
> >>>> case this would point to a deeper issue.
> >>>
> >>> The difference is that with TER, we get a hard reg in .expand
> >>> for a movdi from 24-bit address-space __memx.
> >>>
> >>> Such moves require library calls, which in turn require
> >>> specific hard registers.  As avr backend has no movdi, the
> >>> moddi gets expanded as 8 * movqi, and that does not work
> >>> when the target registers are hard regs, as some of them
> >>> are clobbered by the libcalls.
> >>
> >> So I see
> >>
> >> (insn 18 17 19 2 (parallel [
> >>              (set (reg:QI 22 r22 [+4 ])
> >>                  (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7]))
> >>              (clobber (reg:QI 22 r22))
> >>              (clobber (reg:QI 21 r21))
> >>              (clobber (reg:HI 30 r30))
> >>          ]) "t.c":12:13 38 {xloadqi_A}
> >>       (nil))
> >> (insn 19 18 20 2 (set (reg:PSI 56)
> >>          (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split}
> >>       (nil))
> >> (insn 20 19 21 2 (parallel [
> >>              (set (reg/f:PSI 57)
> >>                  (plus:PSI (reg/f:PSI 47)
> >>                      (const_int 5 [0x5])))
> >>              (clobber (scratch:QI))
> >>          ]) "t.c":12:13 205 {addpsi3}
> >>       (nil))
> >> (insn 21 20 22 2 (parallel [
> >>              (set (reg:QI 23 r23 [+5 ])
> >>                  (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7]))
> >>              (clobber (reg:QI 22 r22))
> >>              (clobber (reg:QI 21 r21))
> >>              (clobber (reg:HI 30 r30))
> >>          ]) "t.c":12:13 38 {xloadqi_A}
> >>       (nil))
> >>
> >> for example - insn 21 clobbers r22 which is also the destination of insn 
> >> 18.
> >>
> >> With -fno-tree-ter those oddly get _no_ intermediate reg but we have
> >>
> >> (insn 9 8 10 2 (parallel [
> >>              (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1)
> >>                  (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7]))
> >>              (clobber (reg:QI 22 r22))
> >>              (clobber (reg:QI 21 r21))
> >>              (clobber (reg:HI 30 r30))
> >>          ]) "t.c":12:13 38 {xloadqi_A}
> >>       (nil))
> >>
> >> but since on GIMPLE we have DImode loads I don't see how TER comes into
> >> play here - TER should favor the second code generation, not the first ...
> >> (or TER shouldn't play into here at all).
> >>
> >> with -fno-tree-ter we come via
> >>
> >> #0  expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, 
> >> target=0x7ffff716c9a8,
> >>      tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8,
> >>      inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433
> >> #1  0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>,
> >>      target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, 
> >> reverse=false)
> >>      at /space/rguenther/src/gcc/gcc/expr.cc:6740
> >> #2  0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 
> >> 1>,
> >>      from=<var_decl 0x7ffff7162000 aa>, nontemporal=false)
> >>      at /space/rguenther/src/gcc/gcc/expr.cc:6461
> >>
> >> while with TER we instead have
> >>
> >> #0  expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0,
> >>      tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
> >>      inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433
> >> #1  0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0,
> >>      target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
> >>      inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100
> >> #2  0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 
> >> 1>,
> >>      target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
> >>      inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278
> >>
> >> the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with 
> >> TER we
> >> are not passing a target or a mode.
> >>
> >> I think the issue is that the expansion at some point doesn't expect
> >> the result to end up in
> >> a hard register.  Maybe define_expand are not supposed to do that but maybe
> >> expansion needs to fix up.
> >>
> >> A first thought was
> >>
> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
> >> index ffbac513692..1509acad02e 100644
> >> --- a/gcc/expr.cc
> >> +++ b/gcc/expr.cc
> >> @@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx
> >> target, machine_mode tmode,
> >>         gcc_unreachable ();
> >>       }
> >>     set_curr_insn_location (saved_loc);
> >> +  if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER)
> >> +    {
> >> +      rtx tem = gen_reg_rtx (GET_MODE (r));
> >> +      emit_move_insn (tem, r);
> >> +      r = tem;
> >> +    }
> >>     if (REG_P (r) && !REG_EXPR (r))
> >>       set_reg_attrs_for_decl_rtl (lhs, r);
> >>     return r;
> >>
> >> but of course that's not the place to fix - this sees (mem/u/c:DI
> >> (reg/f:PSI 47) [1 aa+0 S8 A8 AS7])
> >> as result and things go wrong somewhere in the chain of expanding
> >> things from the return, possibly at the point of expanding the plus and
> >> there possibly when building subregs of the DImode mem.
> >>
> >> You'd have to trace that down but the fix in the end is to do sth like the 
> >> above
> >> or alternatively, in the expander producing the 'xload' make sure to not 
> >> allow
> >> a hardreg as destination when you can still create pseudos?
> >
> > diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
> > index dabf4c0fc5a..f897113c885 100644
> > --- a/gcc/config/avr/avr.md
> > +++ b/gcc/config/avr/avr.md
> > @@ -746,9 +746,7 @@
> >           else
> >             {
> >               rtx reg_22 = gen_rtx_REG (<MODE>mode, REG_22);
> > -            if (reg_overlap_mentioned_p (dest2, reg_22)
> > -                || reg_overlap_mentioned_p (dest2, all_regs_rtx[REG_21]))
> > -              dest2 = gen_reg_rtx (<MODE>mode);
> > +            dest2 = gen_reg_rtx (<MODE>mode);
> >
> >               emit_insn (gen_xload<mode>_A (dest2, src));
> >             }
> >
> > seems to fix it.  I'm not sure what reg_overlap_mentioned_p should achieve 
> > in
> > a define-expand.
> >
> > Richard.
>
> I tried that, but it is still producing wrong code (test case aborts).

If that's so you need to look at where the middle-end comes up with
that hard register as target but still continues to expand other "stuff"
without emitting the hardreg use immediately (the call for what the
hardreg is the argument?).

It's definitely the fault of either the middle-end or the target and has
nothing to do with TER.  I do not have time to track this down further
though.

> The problem is that the middle-end comes up with a hard
> register as destination.  Just using a pseudo register at that
> place does not help because you have to copy that pseudo (dest2)
> back to the hard register (dest) two lines below at avr.md:757.
>
> Then, in the next such call of gen_movqi, that hard register
> lives, and will be clobbered.  Copying back and forth pseudos
> and hard regs does not help because the hard reg result
> from the previous gen_movqi is still supposed to live across
> the next gen_movqi *and* the 2nd gen_movqi is told to put
> a result in that exact same hard register.
>
> That cannot work.

Exactly.

> For hard regs that are inputs, we could save / restore them across
> the xload insns like for PR63633, but that doesn't work for
> output operands that are hard regs.

RTL expansion is not supposed to produce defs or uses of hard registers
that are otherwise free to use.  IIRC we disabled TER across hardreg
assignments also because of libcalls (PR70184 which looks similar to
your issue).  It seems that in your case we're expanding the arguments
of a libcall to a libcall which would be the same issue.

I'll note the libcalls only appear very late - after RTL expansion I still see

(insn 44 43 53 2 (set (reg:DI 18 r18)
        (plus:DI (reg:DI 18 r18)
            (reg:DI 10 r10))) "t.c":12:13 2653 {adddi3_insn}
     (nil))

for example but those seem to have hardreg constraints already?

I do wonder how say softfp targets avoid this when doing, say
(a + b) + (d + e)?

Richard.

> Johann
>
>
> >> Richard.
> >>
> >>> Moreover, even with TER, the code is no more efficient than
> >>> without it, so it's not clear what's the point in propagating
> >>> hard regs into expander operands. Later passes like fwprop1 and
> >>> combine can do that, too.
> >>>
> >>> Requiring libcalls in a mov insn is quite special indeed,
> >>> and there is no way to tell that to TER.  TER itself does not
> >>> optimize code involving libcalls, so it knows they are fragile.
> >>>
> >>>> So - if the wrongness is already apparent in the RTL expansion
> >>>> pass dump can you quote the respective pieces and explain why?
> >>>
> >>> It expand a 64-bit move from __memx address-space to registers
> >>> R18...R25.  This is broken into 8 QI moves to these regs, but
> >>> the movqi requires a libcall in some situations, which pass their
> >>> arguments in R21...R25.  Hence the libcalls clobber some of the
> >>> destination regs.
> >>>
> >>> It would already help when TER would not propagate hard-regs into
> >>> expander operands.
> >>>
> >>> Johann
> >>>
> >>>>> As an alternative, the option could be disabled permanently in
> >>>>> avr.cc::avr_option_override().
> >>>>>
> >>>>> Johann
> >>>>>
> >>>>> --
> >>>>>
> >>>>> AVR: middle-end/87376 - Use -fno-tree-ter per default.
> >>>>>
> >>>>> Temporary expression replacement might replace expressions across
> >>>>> library calls, for example with move insn from address-space __memx
> >>>>> like in PR87376.  -ftree-ter has no way where the backend could hook
> >>>>> in to avoid only problematic replacements, thus kick it out altogether.
> >>>>>
> >>>>>           PR middle-end/87376
> >>>>> gcc/
> >>>>>           * common/config/avr/avr-common.cc 
> >>>>> (avr_option_optimization_table)
> >>>>>           <OPT_ftree_ter>: Set to 0.
> >>>>> gcc/testsuite/
> >>>>>           * gcc.target/avr/torture/pr87376-memx.c: New test.

Reply via email to