Anything I once knew about reorg.c has long since faded away, but since
noone else has reviewed it…

Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
reliable during reorg.c?  It was written at a time when passes were
expected to keep the notes up-to-date, but that's not true these days.
My worry is that they might be mostly right, but just stale enough
to be harmful in corner cases.

Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Originally I thought to bootstrap this patch on MIPS and SPARC
> since they're both delayed-branch-slot targets but I
> reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> seems only visium and CRIS has this feature set, and I see no
> trace of visium in neither newlib nor the simulator next to
> glibc.  So, I just tested cris-elf.
>
> This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> fillers using a method similar to that in commit 33c2207d3fda,
> where care was taken for fill_simple_delay_slots to allow such
> insns when scanning for delay-slot fillers *backwards* (before
> the insn).
>
> A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> For cc0 targets, insns don't mention clobbering cc0, so the
> clobbers are mentioned in the "resources" only as a special
> entity and only for compare-insns and branches, where the cc0
> value matters.
>
> In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> the register liveness detection in reorg.c / resource.c treats
> that as a blocker (for other insns mentioning it, i.e. most)
> when looking for delay-slot-filling candidates.  This means that
> when comparing core and performance for a delay-slot cc0 target
> before and after the de-cc0 conversion, the inability to fill a
> delay slot after conversion manifests as a regression.  This was
> one such case, for CRIS, with random_bitstring in
> gcc.c-torture/execute/arith-rand-ll.c as well as the target
> libgcc division function.
>
> After this, all known performance regressions compared to cc0
> are fixed.
>
> Ok to commit?
>
> gcc:
>       PR target/93372
>       * reorg.c (fill_slots_from_thread): Allow trial insns that clobber
>       TARGET_FLAGS_REGNUM as delay-slot fillers.
>
> gcc/testsuite:
>       PR target/93372
>       * gcc.target/cris/pr93372-47.c: New test.
> ---
>  gcc/reorg.c                                | 37 +++++++++++++++++++++-
>  gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 
> ++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
>
> diff --git a/gcc/reorg.c b/gcc/reorg.c
> index dfd7494bf..83161caa0 100644
> --- a/gcc/reorg.c
> +++ b/gcc/reorg.c
> @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
> condition,
>    CLEAR_RESOURCE (&needed);
>    CLEAR_RESOURCE (&set);
>  
> +  /* Handle the flags register specially, to be able to accept a
> +     candidate that clobbers it.  See also fill_simple_delay_slots.  */
> +  bool filter_flags
> +    = (slots_to_fill == 1
> +       && targetm.flags_regnum != INVALID_REGNUM
> +       && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> +  struct resources fset;
> +  struct resources flags_res;
> +  if (filter_flags)
> +    {
> +      CLEAR_RESOURCE (&fset);
> +      CLEAR_RESOURCE (&flags_res);
> +      SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> +    }
> +
>    /* If we do not own this thread, we must stop as soon as we find
>       something that we can't put in a delay slot, since all we can do
>       is branch into THREAD at a later point.  Therefore, labels stop
> @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
> condition,
>        /* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
>        don't separate or copy insns that set and use CC0.  */
>        if (! insn_references_resource_p (trial, &set, true)
> -       && ! insn_sets_resource_p (trial, &set, true)
> +       && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
>         && ! insn_sets_resource_p (trial, &needed, true)
> +       /* If we're handling sets to the flags register specially, we
> +          only allow an insn into a delay-slot, if it either:
> +          - doesn't set the flags register,
> +          - the "set" of the flags register isn't used (clobbered),
> +          - insns between the delay-slot insn and the trial-insn
> +          as accounted in "set", have not affected the flags register.  */
> +       && (! filter_flags
> +           || ! insn_sets_resource_p (trial, &flags_res, true)
> +           || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
> +           || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
>         && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
>                             && (! own_thread || ! sets_cc0_p (pat)))))
>         && ! can_throw_internal (trial))
> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
> condition,
>        lose = 1;
>        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
>        mark_referenced_resources (trial, &needed, true);
> +      if (filter_flags)
> +     {
> +       mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> +
> +       /* Groups of flags-register setters with users should not
> +          affect opportunities to move flags-register-setting insns
> +          (clobbers) into the delay-slot.  */
> +       CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);

If we do this, what stops us from trying to move a flags-register user
ahead of the associated setter, when the user doesn't itself set the
flags register?  Feels like there should be some test involving
insn_references_resource_p (trial, &flags_res, true) in the
(! filter_flags || …) condition above.

Thanks,
Richard

> +       CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> +     }
>  
>        /* Ensure we don't put insns between the setting of cc and the 
> comparison
>        by moving a setting of cc into an earlier delay slot since these insns
> diff --git a/gcc/testsuite/gcc.target/cris/pr93372-47.c 
> b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> new file mode 100644
> index 000000000..8d80ae6b4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=v10" } */
> +/* { dg-final { scan-assembler-times {\tnop} 1 } } */
> +
> +/* A somewhat brittle test-case, checking that we have (only) one
> +   unfilled delay-slot in random_bitstring: there might be none or two
> +   or more, and general improvements may lead to unfilled delay-slots.
> +   When the scan-assembler-times directive regresses, re-run
> +   gcc.c-torture/execute/arith-rand-ll.c, check cycle-level
> +   execution-time regressions in random_bitstring and take appropriate
> +   action.  */
> +
> +static long long
> +simple_rand ()
> +{
> +  static unsigned long long seed = 47114711;
> +  unsigned long long this = seed * 1103515245 + 12345;
> +  seed = this;
> +  return this >> 8;
> +}
> +
> +unsigned long long int
> +random_bitstring ()
> +{
> +  unsigned long long int x;
> +  int n_bits;
> +  long long ran;
> +  int tot_bits = 0;
> +
> +  x = 0;
> +  for (;;)
> +    {
> +      ran = simple_rand ();
> +      n_bits = (ran >> 1) % 16;
> +      tot_bits += n_bits;
> +
> +      if (n_bits == 0)
> +     return x;
> +      else
> +     {
> +       x <<= n_bits;
> +       if (ran & 1)
> +         x |= (1 << n_bits) - 1;
> +
> +       if (tot_bits > 8 * sizeof (long long) + 6)
> +         return x;
> +     }
> +    }
> +}

Reply via email to