On Wed, 15 Jan 2014, Jakub Jelinek wrote:

> Hi!
> 
> As discussed in the PR, when combine combines something across a setter
> of a likely spilled non-fixed hard register, we may get RA ICE, as the
> combined insn might need for reload a hard reg that is live at the point
> where combiner has combined the insn.
> 
> The following patch attempts to fix that by not creating LOG_LINKS in
> certain cases.

Err - your description makes it sound like a register allocator / reload / 
LRA issue but you are papering over it in combine?

Vlad claims that reload or LRA cannot fix things up here, but surely
splitting the live range of the live-crossing hard reg twice and
spilling it must be possible, no?

>  There are two cases which the patch handles separately:
> 1) CALL_INSN and preceding load_register_parameters sequence
>    (from the call insn up to the earliest insn in the sequence that
>    sets some likely spilled non-fixed hard register)

that's PR59477, right?

> 2) other setters of likely spilled non-fixed hard register
>    (loading of return value of a function into the return register(s),
>    explicit register vars, ...)
> 
> 1) is handled specially, by disallowing LOG_LINKS from before the first
>    insn into the sequence, because in between that first insn and
>    the CALL_INSN a likely spilled hard register is live.  LOG_LINKS from
>    before the sequence to after the CALL_INSN, from within the sequence
>    to after the call, or from within the sequence to within the sequence
>    are allowed.
> 2) other setters are handled just like a basic block boundary for LOG_LINKS,
>    no LOG_LINKS are allowed to cross that boundary

It seems this is very conservative - by the time combine runs you
only have pseudos in the insn that will eventually create the
conflict.  In fact,

(define_insn "*ashl<mode>3_1"
  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
        (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm")
                      (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r")))
   (clobber (reg:CC FLAGS_REG))]
  "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"

suggests that IRA can happily choose something other than cx here even
though Vlad claims

"As r90 in insn 29 needs to be in cx and cx is already alive, neither
reload nor LRA can do anything."

Other passes may have exactly the same issue - if live ranges cannot
be split during IRA then two shift insns with overlapping shift
amount live ranges will cause exactly the same problem, no?

Jakub says

I don't think this is a RA bug, the problem is I think that combine 
changes:
 (insn 20 19 21 2 (set (reg:DI 2 cx)
         (const_int 0 [0])) pr59477.C:20 62 {*movdi_internal_rex64}
      (nil))
 
-(insn 21 20 22 2 (set (reg:SI 37 r8)
-        (subreg:SI (reg:DI 64) 0)) pr59477.C:20 64 {*movsi_internal}
-     (expr_list:REG_DEAD (reg:DI 64)
-        (nil)))
+(insn 21 20 22 2 (parallel [
+            (set (reg:DI 37 r8)
+                (ashift:DI (reg:DI 65)
+                    (subreg:QI (reg:SI 63 [ v ]) 0)))
+            (clobber (reg:CC 17 flags))
+        ]) pr59477.C:20 503 {*ashldi3_1}
+     (expr_list:REG_UNUSED (reg:CC 17 flags)
+        (expr_list:REG_DEAD (reg:SI 63 [ v ])
+            (expr_list:REG_DEAD (reg:DI 65)
+                (nil)))))

why is there a non-argument setup insn in between argument setup
and the call?  Having argument register constraints handled like

;; bar (_20, _20, _20, _20);

(insn 21 20 22 (set (reg:SI 2 cx)
        (reg:SI 97 [ D.1780 ])) t.c:10 -1
     (nil))

(insn 22 21 23 (set (reg:SI 1 dx)
        (reg:SI 97 [ D.1780 ])) t.c:10 -1
     (nil))

(insn 23 22 24 (set (reg:SI 4 si)
        (reg:SI 97 [ D.1780 ])) t.c:10 -1
     (nil))

(insn 24 23 25 (set (reg:SI 5 di)
        (reg:SI 97 [ D.1780 ])) t.c:10 -1
     (nil))

(call_insn 25 24 0 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41]  <function_decl 0x7fc7b0b52600 bar>) [0 bar S1 A8])

looks less than ideal - we get a live-range split which is
probably good for RA, but ultimately there shouldn't be separate
insns involving hard regs before reload - if anything gets
between those insns this limits RA / reload freedom.  We
should be able to tell the RA that the call needs its inputs in
certain registers - just as we do with asm()s.

So, looking at what combine does:

Trying 18 -> 28:
Successfully matched this instruction:
(set (reg:SI 37 r8)
    (subreg:SI (reg:DI 89 [ D.2308 ]) 0))

that's

(insn 18 17 20 2 (parallel [
            (set (reg:DI 95)
                (ior:DI (reg:DI 93)
                    (reg:DI 91 [ D.2309 ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 421 {*iordi_1}
     (expr_list:REG_DEAD (reg:DI 93)
        (expr_list:REG_DEAD (reg:DI 91 [ D.2309 ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

(insn 28 27 29 2 (set (reg:SI 37 r8)
        (subreg:SI (reg:DI 95) 0)) t.c:20 90 {*movsi_internal}
     (expr_list:REG_DEAD (reg:DI 95)
        (nil)))

Trying 11 -> 28:
Successfully matched this instruction:
(set (reg:DI 37 r8)
    (ashift:DI (reg:DI 90)
        (subreg:QI (reg:SI 88 [ v ]) 0)))

and we're toast.

I think the problem is still either a missed feature in LRA/reload
(split live-ranges), a problem in how we represent calls & argument
setup (magic hard-reg uses), or a backend problem (should spill
itself if necessary, via a post-reload splitter or always spill
and undo via a peephole2).

Of course papering over this in combine might be the best option at this
stage.  So the above were just observations from people less experienced
in this area.

Richard.

> Bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> For i686-linux stage3 + target libraries, just 8 *.o files are affected by
> the patch; for x86_64-linux approx. 8% of *.o files are affected,
> though I haven't checked in detail how much it affects each file.  So far
> I've only looked at x86_64-linux builtins.o, where the patch
> pessimized the code in a single spot:
> -       movq    (%rax,%rdx), %rdi
> +       addq    %rax, %rdx
> +       movq    (%rdx), %rdi
>         call    _ZL12validate_argPK9tree_node9tree_code
> which comes from insn 112 not being combined with 114, because both si
> and di are on x86_64 likely spilled non-fixed hard registers and thus we
> disallowed LOG_LINKS from before insn 113 into the 113..114 range, because
> of the fear that the combined pattern might during reload need the hard
> register that is live across it:
> (insn 112 111 113 17 (parallel [
>             (set (reg/f:DI 134)
>                 (plus:DI (reg:DI 132)
>                     (reg/v:DI 113 [ off ])))
>             (clobber (reg:CC 17 flags))
>         ]) ../../gcc/gimple.h:2066 266 {*adddi_1}
>      (expr_list:REG_DEAD (reg:DI 132)
>         (expr_list:REG_DEAD (reg/v:DI 113 [ off ])
>             (expr_list:REG_UNUSED (reg:CC 17 flags)
>                 (nil)))))
> (insn 113 112 114 17 (set (reg:SI 4 si)
>         (reg:SI 94 [ D.97472 ])) ../../gcc/builtins.c:11417 90 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 94 [ D.97472 ])
>         (nil)))
> (insn 114 113 115 17 (set (reg:DI 5 di)
>         (mem/f:DI (reg/f:DI 134) [8 *_43+0 S8 A64])) ../../gcc/builtins.c:11417 89 {*movdi_internal}
>      (expr_list:REG_DEAD (reg/f:DI 134)
>         (nil)))
> (call_insn/c/i 115 114 116 17 (set (reg:QI 0 ax)
>         (call (mem:QI (symbol_ref:DI ("_ZL12validate_argPK9tree_node9tree_code") [flags 0x3]  <function_decl 0x7f4045caf200 validate_arg>) [0 validate_arg S1 A8])
>             (const_int 0 [0]))) ../../gcc/builtins.c:11417 680 {*call_value}
>      (expr_list:REG_DEAD (reg:DI 5 di)
>         (expr_list:REG_DEAD (reg:SI 4 si)
>             (expr_list:REG_EH_REGION (const_int 0 [0])
>                 (nil))))
>     (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
>         (expr_list:REG_BR_PRED (use (reg:SI 4 si))
>             (nil))))
> 
> Thoughts on this?  I can surely investigate more tomorrow, look at more
> random object files that differ and see how many changes it creates.
> Unfortunately, because the patch works at LOG_LINKS creation time, it is
> hard to gather statistics during compilation: while lots of
> LOG_LINKS will be affected, only a much smaller number of actual
> combinations will fail to be performed because of the missing LOG_LINKS.
> Typically the call argument load sequence contains just
> (set (reg:.. hard reg) (reg:.. pseudo)) insns and we wouldn't combine
> into those anyway.
> 
> The patch removes the likely_spilled_retval stuff Joern wrote a few years
> ago because I believe this change obsoletes it.  But I have tried to
> reproduce the ICE using the sh-elf target and haven't managed to do so;
> 4.9 is quite a different compiler from 2005-ish gcc.
> 
> 2014-01-15  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR rtl-optimization/59477
>       * combine.c (create_log_links): For sequences of insns loading
>       hard register parameters for a call starting with load of some
>       non-fixed likely spilled hard register don't record LOG_LINKS
>       from before the first load to any insn in between the first
>       load and the call insn.  For setters of non-fixed likely
>       spilled hard registers other than in calls and the above sequences
>       disallow recording of any LOG_LINKS crossing such insns.
>       (struct likely_spilled_retval_info): Remove.
>       (likely_spilled_retval_1, likely_spilled_retval): Remove.
>       (try_combine): Don't call likely_spilled_retval.
> 
>       * g++.dg/torture/pr59477.C: New test.
> 
> --- gcc/combine.c.jj  2014-01-15 08:11:22.636430002 +0100
> +++ gcc/combine.c     2014-01-15 15:45:01.588819713 +0100
> @@ -985,20 +985,54 @@ create_log_links (void)
>    basic_block bb;
>    rtx *next_use, insn;
>    df_ref *def_vec, *use_vec;
> +  int *undo_list;
>  
>    next_use = XCNEWVEC (rtx, max_reg_num ());
> +  undo_list = XCNEWVEC (int, max_reg_num ());
>  
>    /* Pass through each block from the end, recording the uses of each
>       register and establishing log links when def is encountered.
>       Note that we do not clear next_use array in order to save time,
>       so we have to test whether the use is in the same basic block as def.
>  
> +     Avoid creating log links across instructions that set hard registers
> +     that are likely to be spilled, otherwise reload might die if some
> +     instruction that might need some particular hard register is combined
> +     into the area where that hard register is already live.  Usually
> +     likely spilled non-fixed hard registers should occur in the IL
> +     for function call parameter passing and call return value, return
> +     value of the whole function and then for explicit register variables.
> +
> +     For calls, expansion in calls.c usually emits instructions setting the
> +     hard registers immediately before the call, with simple rhs.  For this
> +     case the code below attempts to identify the block of insns between the
> +     first insn loading a parameter into a hard register where the hard
> +     register is likely spilled and the last insn loading a parameter into
> +     (any) hard register (call_block_start and call_block_end).  We allow
> +     adding log links to instructions in that range only from instructions
> +     in that range; when in the backward walk we reach the call_block_start
> +     insn, we remove all next_use elements for insns in that block, using
> +     the undo_list array and undo_list_head.  That way it is e.g. possible
> +     to have log links from before a call argument setup sequence to insns
> +     after the call.  In the call_block_start .. call_block_end range we
> +     don't set likely_spilled_setter_luid though.
> +
> +     For all other setters of likely spilled non-fixed hard registers, we
> +     just disallow any log links across those insns, by setting
> +     likely_spilled_setter_luid and not creating any log links pointing to
> +     insns with higher luid than that.
> +
>       There are a few cases below when we do not consider the definition or
>       usage -- these are taken from original flow.c did. Don't ask me why it
>       is done this way; I don't know and if it works, I don't want to know.  */
>  
>    FOR_EACH_BB_FN (bb, cfun)
>      {
> +      int likely_spilled_setter_luid = -1;
> +      int call_block_start = -1;
> +      int call_block_end = -1;
> +      int undo_list_head = 0;
> +
>        FOR_BB_INSNS_REVERSE (bb, insn)
>          {
>            if (!NONDEBUG_INSN_P (insn))
> @@ -1007,12 +1041,86 @@ create_log_links (void)
>         /* Log links are created only once.  */
>         gcc_assert (!LOG_LINKS (insn));
>  
> +       int luid = DF_INSN_LUID (insn);
> +
> +       if (CALL_P (insn))
> +         {
> +           rtx last_reg = NULL_RTX;
> +           gcc_assert (call_block_start == -1 && call_block_end == -1);
> +           for (rtx link = CALL_INSN_FUNCTION_USAGE (insn);
> +                link != NULL_RTX;
> +                link = XEXP (link, 1))
> +             if (GET_CODE (XEXP (link, 0)) == USE
> +                 && REG_P (XEXP (XEXP (link, 0), 0))
> +                 && HARD_REGISTER_P (XEXP (XEXP (link, 0), 0)))
> +               {
> +                 rtx reg = XEXP (XEXP (link, 0), 0);
> +                 int nregs = hard_regno_nregs[REGNO (reg)][GET_MODE (reg)];
> +                 while (nregs-- > 0)
> +                   if (! TEST_HARD_REG_BIT (fixed_reg_set,
> +                                            REGNO (reg) + nregs)
> +                       && targetm.class_likely_spilled_p
> +                                     (REGNO_REG_CLASS (REGNO (reg) + nregs)))
> +                     break;
> +                 if (nregs >= 0)
> +                   last_reg = reg;
> +               }
> +           if (last_reg)
> +             {
> +               for (rtx prev = PREV_INSN (insn);
> +                    prev; prev = PREV_INSN (prev))
> +                 {
> +                   if (!NONDEBUG_INSN_P (prev))
> +                     continue;
> +                   if (BLOCK_FOR_INSN (prev) != bb
> +                       || !NONJUMP_INSN_P (prev))
> +                     break;
> +                   rtx set = single_set (prev);
> +                   if (! set)
> +                     break;
> +                   rtx dest = SET_DEST (set);
> +                   if (GET_CODE (dest) == SUBREG)
> +                     dest = SUBREG_REG (dest);
> +                   if (!REG_P (dest) || !HARD_REGISTER_P (dest))
> +                     break;
> +                   if (reg_overlap_mentioned_p (dest, last_reg))
> +                     {
> +                       if (call_block_end != -1)
> +                         call_block_start = DF_INSN_LUID (prev);
> +                       break;
> +                     }
> +                   else if (call_block_end == -1)
> +                     call_block_end = DF_INSN_LUID (prev);
> +                 }
> +               if (call_block_start == -1)
> +                 call_block_end = -1;
> +             }
> +         }
> +
>            for (def_vec = DF_INSN_DEFS (insn); *def_vec; def_vec++)
>              {
>             df_ref def = *def_vec;
>                int regno = DF_REF_REGNO (def);
>                rtx use_insn;
>  
> +           if (HARD_REGISTER_NUM_P (regno)
> +               && !CALL_P (insn)
> +               && !(luid >= call_block_start && luid <= call_block_end)
> +               && !(DF_REF_FLAGS (def)
> +                    & (DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER)))
> +             {
> +               enum machine_mode mode = GET_MODE (DF_REF_REAL_REG (def));
> +               int nregs = hard_regno_nregs[regno][mode];
> +               while (nregs-- > 0)
> +                 if (! TEST_HARD_REG_BIT (fixed_reg_set, regno + nregs)
> +                     && targetm.class_likely_spilled_p
> +                                 (REGNO_REG_CLASS (regno + nregs)))
> +                   {
> +                     likely_spilled_setter_luid = luid;
> +                     break;
> +                   }
> +             }
> +
>                if (!next_use[regno])
>                  continue;
>  
> @@ -1034,7 +1142,10 @@ create_log_links (void)
>                  continue;
>  
>                use_insn = next_use[regno];
> -              if (BLOCK_FOR_INSN (use_insn) == bb)
> +           if (BLOCK_FOR_INSN (use_insn) == bb
> +               && (likely_spilled_setter_luid == -1
> +                   || (DF_INSN_LUID (use_insn)
> +                       <= likely_spilled_setter_luid)))
>                  {
>                    /* flow.c claimed:
>  
> @@ -1060,6 +1171,33 @@ create_log_links (void)
>                next_use[regno] = NULL_RTX;
>              }
>  
> +       if (luid == call_block_start)
> +         {
> +           /* At the start of the call argument block, remove
> +              all next_use array values pointing to instructions
> +              within the block.  The undo_list array normally contains
> +              all zeros, non-zero values form a chain of what needs
> +              to be cleared in next_use array.  Value of -1 in the array
> +              means tail of the chain, other non-zero values are next
> +              regno that need to be cleared plus 1 (as zero is a valid
> +              register number).  */
> +           if (undo_list_head)
> +             {
> +               int r = undo_list_head;
> +               do
> +                 {
> +                   int regno = r - 1;
> +                   r = undo_list[regno];
> +                   undo_list[regno] = 0;
> +                   next_use[regno] = NULL_RTX;
> +                 }
> +               while (r != -1);
> +               undo_list_head = 0;
> +             }
> +           call_block_start = -1;
> +           call_block_end = -1;
> +         }
> +
>            for (use_vec = DF_INSN_USES (insn); *use_vec; use_vec++)
>              {
>             df_ref use = *use_vec;
> @@ -1070,11 +1208,23 @@ create_log_links (void)
>                if (DF_REF_FLAGS (use) & DF_REF_CALL_STACK_USAGE)
>                  continue;
>  
> +           if (luid >= call_block_start
> +               && luid <= call_block_end
> +               && undo_list[regno] == 0)
> +             {
> +               if (undo_list_head == 0)
> +                 undo_list[regno] = -1;
> +               else
> +                 undo_list[regno] = undo_list_head;
> +               undo_list_head = regno + 1;
> +             }
> +
>                next_use[regno] = insn;
>              }
>          }
>      }
>  
> +  free (undo_list);
>    free (next_use);
>  }
>  
> @@ -2222,87 +2372,6 @@ cant_combine_insn_p (rtx insn)
>    return 0;
>  }
>  
> -struct likely_spilled_retval_info
> -{
> -  unsigned regno, nregs;
> -  unsigned mask;
> -};
> -
> -/* Called via note_stores by likely_spilled_retval_p.  Remove from info->mask
> -   hard registers that are known to be written to / clobbered in full.  */
> -static void
> -likely_spilled_retval_1 (rtx x, const_rtx set, void *data)
> -{
> -  struct likely_spilled_retval_info *const info =
> -    (struct likely_spilled_retval_info *) data;
> -  unsigned regno, nregs;
> -  unsigned new_mask;
> -
> -  if (!REG_P (XEXP (set, 0)))
> -    return;
> -  regno = REGNO (x);
> -  if (regno >= info->regno + info->nregs)
> -    return;
> -  nregs = hard_regno_nregs[regno][GET_MODE (x)];
> -  if (regno + nregs <= info->regno)
> -    return;
> -  new_mask = (2U << (nregs - 1)) - 1;
> -  if (regno < info->regno)
> -    new_mask >>= info->regno - regno;
> -  else
> -    new_mask <<= regno - info->regno;
> -  info->mask &= ~new_mask;
> -}
> -
> -/* Return nonzero iff part of the return value is live during INSN, and
> -   it is likely spilled.  This can happen when more than one insn is needed
> -   to copy the return value, e.g. when we consider to combine into the
> -   second copy insn for a complex value.  */
> -
> -static int
> -likely_spilled_retval_p (rtx insn)
> -{
> -  rtx use = BB_END (this_basic_block);
> -  rtx reg, p;
> -  unsigned regno, nregs;
> -  /* We assume here that no machine mode needs more than
> -     32 hard registers when the value overlaps with a register
> -     for which TARGET_FUNCTION_VALUE_REGNO_P is true.  */
> -  unsigned mask;
> -  struct likely_spilled_retval_info info;
> -
> -  if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != USE || insn == use)
> -    return 0;
> -  reg = XEXP (PATTERN (use), 0);
> -  if (!REG_P (reg) || !targetm.calls.function_value_regno_p (REGNO (reg)))
> -    return 0;
> -  regno = REGNO (reg);
> -  nregs = hard_regno_nregs[regno][GET_MODE (reg)];
> -  if (nregs == 1)
> -    return 0;
> -  mask = (2U << (nregs - 1)) - 1;
> -
> -  /* Disregard parts of the return value that are set later.  */
> -  info.regno = regno;
> -  info.nregs = nregs;
> -  info.mask = mask;
> -  for (p = PREV_INSN (use); info.mask && p != insn; p = PREV_INSN (p))
> -    if (INSN_P (p))
> -      note_stores (PATTERN (p), likely_spilled_retval_1, &info);
> -  mask = info.mask;
> -
> -  /* Check if any of the (probably) live return value registers is
> -     likely spilled.  */
> -  nregs --;
> -  do
> -    {
> -      if ((mask & 1 << nregs)
> -       && targetm.class_likely_spilled_p (REGNO_REG_CLASS (regno + nregs)))
> -     return 1;
> -    } while (nregs--);
> -  return 0;
> -}
> -
>  /* Adjust INSN after we made a change to its destination.
>  
>     Changing the destination can invalidate notes that say something about
> @@ -2512,8 +2581,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx
>    if (cant_combine_insn_p (i3)
>        || cant_combine_insn_p (i2)
>        || (i1 && cant_combine_insn_p (i1))
> -      || (i0 && cant_combine_insn_p (i0))
> -      || likely_spilled_retval_p (i3))
> +      || (i0 && cant_combine_insn_p (i0)))
>      return 0;
>  
>    combine_attempts++;
> --- gcc/testsuite/g++.dg/torture/pr59477.C.jj 2014-01-15 09:38:18.348864654 +0100
> +++ gcc/testsuite/g++.dg/torture/pr59477.C    2014-01-15 09:38:18.348864654 +0100
> @@ -0,0 +1,25 @@
> +// PR rtl-optimization/59477
> +// { dg-do compile }
> +
> +struct A
> +{
> +  unsigned *a;
> +  unsigned long b;
> +  A (unsigned long x) : a (), b (x) {}
> +};
> +
> +struct B
> +{
> +  B (int);
> +  B (const B &) {}
> +};
> +
> +B bar (B, B, A);
> +int v;
> +
> +void
> +foo ()
> +{
> +  B c = 0;
> +  bar (c, c, A (1ULL << v));
> +}
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
