Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-09-11 Thread Hans-Peter Nilsson via Gcc-patches
> From: Eric Botcazou 
> Date: Fri, 11 Sep 2020 13:09:48 +0200

> > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx
> > condition, lose = 1;
> >mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
> >mark_referenced_resources (trial, , true);
> > +  if (filter_flags)
> > +   {
> > + mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
> > +
> > + /* Groups of flags-register setters with users should not
> > +affect opportunities to move flags-register-setting insns
> > +(clobbers) into the delay-slot.  */
> > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> > +   }
> 
> Don't you need to put the new block before mark_referenced_resources (as I 
> did 
> in fill_simple_delay_slots) in case the needed insn reads the flags register?

I think the simplest answer is that wouldn't be good; it'd leave the
"CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);" without
effect.

Note also fill_simple_delay_slots is moving insns from the before the
delay-slotting insn, while this is from a "tread" (or arm, or leg)
"after".

brgds, H-P


Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-09-11 Thread Hans-Peter Nilsson via Gcc-patches
> From: Hans-Peter Nilsson 
> Date: Fri, 11 Sep 2020 13:24:18 +0200

> > > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx
> > > condition, lose = 1;
> > >mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
> > >mark_referenced_resources (trial, , true);
> > > +  if (filter_flags)
> > > + {
> > > +   mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
> > > +
> > > +   /* Groups of flags-register setters with users should not
> > > +  affect opportunities to move flags-register-setting insns
> > > +  (clobbers) into the delay-slot.  */
> > > +   CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> > > +   CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> > > + }
> > 
> > Don't you need to put the new block before mark_referenced_resources (as I 
> > did 
> > in fill_simple_delay_slots) in case the needed insn reads the flags 
> > register?
> 
> It can't be needed, as the flags register is dead in the insn with the
> delay-slot, as part of the filter_flags condition.

Uh, I'll get back with a more thoughtful reply...

brgds, H-P


Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-09-11 Thread Hans-Peter Nilsson via Gcc-patches
> From: Eric Botcazou 
> CC: "gcc-patches@gcc.gnu.org" 
> Date: Fri, 11 Sep 2020 13:09:48 +0200
> received-spf: None (smtp1.axis.com: no sender authenticity  information
>  available from domain of  postmas...@mail-wr1-f54.google.com) identity=helo;
>   client-ip=209.85.221.54; receiver=smtp1.axis.com;
>   envelope-from="botca...@adacore.com";
>   x-sender="postmas...@mail-wr1-f54.google.com";
>   x-conformance=sidf_compatible
> Old-Content-Type: multipart/alternative;
>   boundary="_000_2479570Ksvo8Q1nxZfomalhaut_"
> Content-Type: text/plain; charset=iso-8859-1
> 
> > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx
> > condition, lose = 1;
> >mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
> >mark_referenced_resources (trial, , true);
> > +  if (filter_flags)
> > +   {
> > + mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
> > +
> > + /* Groups of flags-register setters with users should not
> > +affect opportunities to move flags-register-setting insns
> > +(clobbers) into the delay-slot.  */
> > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> > +   }
> 
> Don't you need to put the new block before mark_referenced_resources (as I 
> did 
> in fill_simple_delay_slots) in case the needed insn reads the flags register?

It can't be needed, as the flags register is dead in the insn with the
delay-slot, as part of the filter_flags condition.

brgds, H-P


Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-09-11 Thread Eric Botcazou
> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx
> condition, lose = 1;
>mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
>mark_referenced_resources (trial, , true);
> +  if (filter_flags)
> + {
> +   mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
> +
> +   /* Groups of flags-register setters with users should not
> +  affect opportunities to move flags-register-setting insns
> +  (clobbers) into the delay-slot.  */
> +   CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> +   CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> + }

Don't you need to put the new block before mark_referenced_resources (as I did 
in fill_simple_delay_slots) in case the needed insn reads the flags register?

-- 
Eric Botcazou




Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-24 Thread Jeff Law via Gcc-patches
On Fri, 2020-08-14 at 07:27 +0200, Hans-Peter Nilsson via Gcc-patches wrote:
> Originally I thought to bootstrap this patch on MIPS and SPARC
> since they're both delayed-branch-slot targets but I
> reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> seems only visium and CRIS has this feature set, and I see no
> trace of visium in neither newlib nor the simulator next to
> glibc.  So, I just tested cris-elf.
[ Sorry, been focused on non-upstream stuff for the last month+, so yea,
  some stuff is falling through the cracks. ]

Ibuild newlib on visium daily.  I also run the gcc testsuite, but using dummy
simulator.  So we're really just testing that the code generator doesn't fault
and the various dg-scan style tests.

http://3.14.90.209:8080/job/visium-elf/


> 
> This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> fillers using a method similar to that in commit 33c2207d3fda,
> where care was taken for fill_simple_delay_slots to allow such
> insns when scanning for delay-slot fillers *backwards* (before
> the insn).
> 
> A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> For cc0 targets, insns don't mention clobbering cc0, so the
> clobbers are mentioned in the "resources" only as a special
> entity and only for compare-insns and branches, where the cc0
> value matters.
> 
> In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> the register liveness detection in reorg.c / resource.c treats
> that as a blocker (for other insns mentioning it, i.e. most)
> when looking for delay-slot-filling candidates.  This means that
> when comparing core and performance for a delay-slot cc0 target
> before and after the de-cc0 conversion, the inability to fill a
> delay slot after conversion manifests as a regression.  This was
> one such case, for CRIS, with random_bitstring in
> gcc.c-torture/execute/arith-rand-ll.c as well as the target
> libgcc division function.
I wouldn't be surprised if H8 would have tripped over this as well.  Converting
it to REG_CC is still in-progress and a failure to fill a delay slot would 
likely
have been missed in my performance testing to-date (delay slot filling isn't
terribly important on the H8 thankfully).


> 
> After this, all known performance regressions compared to cc0
> are fixed.
> 
> Ok to commit?
> 
> gcc:
>   PR target/93372
>   * reorg.c (fill_slots_from_thread): Allow trial insns that clobber
>   TARGET_FLAGS_REGNUM as delay-slot fillers.
> 
> gcc/testsuite:
>   PR target/93372
>   * gcc.target/cris/pr93372-47.c: New test.
It looks reasonable to me.

jeff
> 



Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-20 Thread Richard Sandiford
Hans-Peter Nilsson via Gcc-patches  writes:
>> > @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
>> > condition,
>> >CLEAR_RESOURCE ();
>> >CLEAR_RESOURCE ();
>> >  
>> > +  /* Handle the flags register specially, to be able to accept a
>> > + candidate that clobbers it.  See also fill_simple_delay_slots.  */
>> > +  bool filter_flags
>> > += (slots_to_fill == 1
>> > +   && targetm.flags_regnum != INVALID_REGNUM
>> > +   && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
>> > +  struct resources fset;
>> > +  struct resources flags_res;
>> > +  if (filter_flags)
>> > +{
>> > +  CLEAR_RESOURCE ();
>> > +  CLEAR_RESOURCE (_res);
>> > +  SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
>> > +}
>> > +
>> >/* If we do not own this thread, we must stop as soon as we find
>> >   something that we can't put in a delay slot, since all we can do
>> >   is branch into THREAD at a later point.  Therefore, labels stop
>> > @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
>> > condition,
>> >/* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
>> > don't separate or copy insns that set and use CC0.  */
>> >if (! insn_references_resource_p (trial, , true)
>> > -&& ! insn_sets_resource_p (trial, , true)
>> > +&& ! insn_sets_resource_p (trial, filter_flags ?  : , true)
>> >  && ! insn_sets_resource_p (trial, , true)
>> > +/* If we're handling sets to the flags register specially, we
>> > +   only allow an insn into a delay-slot, if it either:
>> > +   - doesn't set the flags register,
>> > +   - the "set" of the flags register isn't used (clobbered),
>> > +   - insns between the delay-slot insn and the trial-insn
>> > +   as accounted in "set", have not affected the flags register.  */
>> > +&& (! filter_flags
>> > +|| ! insn_sets_resource_p (trial, _res, true)
>> > +|| find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
>> > +|| ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
>> >  && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
>> >  && (! own_thread || ! sets_cc0_p (pat)
>> >  && ! can_throw_internal (trial))
>> > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
>> > condition,
>> >lose = 1;
>> >mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
>> >mark_referenced_resources (trial, , true);
>> > +  if (filter_flags)
>> > +  {
>> > +mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
>> > +
>> > +/* Groups of flags-register setters with users should not
>> > +   affect opportunities to move flags-register-setting insns
>> > +   (clobbers) into the delay-slot.  */
>> > +CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
>> 
>> If we do this, what stops us from trying to move a flags-register user
>> ahead of the associated setter, when the user doesn't itself set the
>> flags register?
>
> First, all sets are still in set (set.regs) including any that
> set the flags register between the insn with the delay-slot and
> the trial.
>
> (That's why it's separate from fset (fset.regs).  I used the
> same naming scheme as in the named commit, but perhaps a better
> name would be set_with_flags_filtered or something.)
>
>>  Feels like there should be some test involving
>> insn_references_resource_p (trial, _res, true) in the
>> (! filter_flags || ...) condition above.
>
> There is: the "! insn_references_resource_p (trial, , true)"
> (pre-existing in the context above the patch), so we don't move
> anything that references anything set (only additional flags
> register setters where the result is unused).

Ah, right, I think I was getting them the wrong way around.

Looks OK to me, but give Eric 24 hrs to object/comment.
Like you say, he's better qualified to review this, I was just
stepping in because it looked like the patch had fallen through
the cracks.

Thanks,
Richard


Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-20 Thread Hans-Peter Nilsson via Gcc-patches
> From: Richard Sandiford 
> Date: Thu, 20 Aug 2020 10:30:56 +0200

> Anything I once knew about reorg.c has long since faded away, but since
> noone else has reviewed it...

Thanks.  I forgot to add PATCH and/or RFA: in the subject and
forgot to CC Eric, assuming he's interested (I did CC him as a
heads-up that this was coming, that's why I added you now, Eric).

> Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
> reliable during reorg.c?  It was written at a time when passes were
> expected to keep the notes up-to-date, but that's not true these days.
> My worry is that they might be mostly right, but just stale enough
> to be harmful in corner cases.

They are depended upon in reorg.c as absolutely accurate (and
kept updated), and there's IMO enough empirical evidence that
they still are.  In this particular patch, I'm using it in the
same way Eric used it in 33c2207d3fda for
fill_simple_delay_slots (i.e. it would be similarly flawed), but
it's also used all over for the old-style dataflow analysis done
here.  We'd be seeing failing for those "corner cases" all
around delayed-branch targets if that didn't work.  I should be
able to use the existing machinery in this patch.

BTW, I happened to notice that bugs here are also somewhat more
visible than your ordinary wrong-result bug. :)

> Hans-Peter Nilsson via Gcc-patches  writes:
> > Originally I thought to bootstrap this patch on MIPS and SPARC
> > since they're both delayed-branch-slot targets but I
> > reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> > seems only visium and CRIS has this feature set, and I see no
> > trace of visium in neither newlib nor the simulator next to
> > glibc.  So, I just tested cris-elf.
> >
> > This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> > fillers using a method similar to that in commit 33c2207d3fda,
> > where care was taken for fill_simple_delay_slots to allow such
> > insns when scanning for delay-slot fillers *backwards* (before
> > the insn).
> >
> > A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> > For cc0 targets, insns don't mention clobbering cc0, so the
> > clobbers are mentioned in the "resources" only as a special
> > entity and only for compare-insns and branches, where the cc0
> > value matters.
> >
> > In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> > the register liveness detection in reorg.c / resource.c treats
> > that as a blocker (for other insns mentioning it, i.e. most)
> > when looking for delay-slot-filling candidates.  This means that
> > when comparing core and performance for a delay-slot cc0 target
> > before and after the de-cc0 conversion, the inability to fill a
> > delay slot after conversion manifests as a regression.  This was
> > one such case, for CRIS, with random_bitstring in
> > gcc.c-torture/execute/arith-rand-ll.c as well as the target
> > libgcc division function.
> >
> > After this, all known performance regressions compared to cc0
> > are fixed.
> >
> > Ok to commit?
> >
> > gcc:
> > PR target/93372
> > * reorg.c (fill_slots_from_thread): Allow trial insns that clobber
> > TARGET_FLAGS_REGNUM as delay-slot fillers.
> >
> > gcc/testsuite:
> > PR target/93372
> > * gcc.target/cris/pr93372-47.c: New test.
> > ---
> >  gcc/reorg.c| 37 +-
> >  gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 
> > ++
> >  2 files changed, 85 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
> >
> > diff --git a/gcc/reorg.c b/gcc/reorg.c
> > index dfd7494bf..83161caa0 100644
> > --- a/gcc/reorg.c
> > +++ b/gcc/reorg.c
> > @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
> > condition,
> >CLEAR_RESOURCE ();
> >CLEAR_RESOURCE ();
> >  
> > +  /* Handle the flags register specially, to be able to accept a
> > + candidate that clobbers it.  See also fill_simple_delay_slots.  */
> > +  bool filter_flags
> > += (slots_to_fill == 1
> > +   && targetm.flags_regnum != INVALID_REGNUM
> > +   && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> > +  struct resources fset;
> > +  struct resources flags_res;
> > +  if (filter_flags)
> > +{
> > +  CLEAR_RESOURCE ();
> > +  CLEAR_RESOURCE (_res);
> > +  SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> > +}
> > +
> >/* If we do not own this thread, we must stop as soon as we find
> >   something that we can't put in a delay slot, since all we can do
> >   is branch into THREAD at a later point.  Therefore, labels stop
> > @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
> > condition,
> >/* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
> >  don't separate or copy insns that set and use CC0.  */
> >if (! insn_references_resource_p (trial, , true)
> > - && ! 

Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-20 Thread Richard Sandiford
Anything I once knew about reorg.c has long since faded away, but since
noone else has reviewed it…

Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
reliable during reorg.c?  It was written at a time when passes were
expected to keep the notes up-to-date, but that's not true these days.
My worry is that they might be mostly right, but just stale enough
to be harmful in corner cases.

Hans-Peter Nilsson via Gcc-patches  writes:
> Originally I thought to bootstrap this patch on MIPS and SPARC
> since they're both delayed-branch-slot targets but I
> reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> seems only visium and CRIS has this feature set, and I see no
> trace of visium in neither newlib nor the simulator next to
> glibc.  So, I just tested cris-elf.
>
> This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> fillers using a method similar to that in commit 33c2207d3fda,
> where care was taken for fill_simple_delay_slots to allow such
> insns when scanning for delay-slot fillers *backwards* (before
> the insn).
>
> A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> For cc0 targets, insns don't mention clobbering cc0, so the
> clobbers are mentioned in the "resources" only as a special
> entity and only for compare-insns and branches, where the cc0
> value matters.
>
> In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> the register liveness detection in reorg.c / resource.c treats
> that as a blocker (for other insns mentioning it, i.e. most)
> when looking for delay-slot-filling candidates.  This means that
> when comparing core and performance for a delay-slot cc0 target
> before and after the de-cc0 conversion, the inability to fill a
> delay slot after conversion manifests as a regression.  This was
> one such case, for CRIS, with random_bitstring in
> gcc.c-torture/execute/arith-rand-ll.c as well as the target
> libgcc division function.
>
> After this, all known performance regressions compared to cc0
> are fixed.
>
> Ok to commit?
>
> gcc:
>   PR target/93372
>   * reorg.c (fill_slots_from_thread): Allow trial insns that clobber
>   TARGET_FLAGS_REGNUM as delay-slot fillers.
>
> gcc/testsuite:
>   PR target/93372
>   * gcc.target/cris/pr93372-47.c: New test.
> ---
>  gcc/reorg.c| 37 +-
>  gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 
> ++
>  2 files changed, 85 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
>
> diff --git a/gcc/reorg.c b/gcc/reorg.c
> index dfd7494bf..83161caa0 100644
> --- a/gcc/reorg.c
> +++ b/gcc/reorg.c
> @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
> condition,
>CLEAR_RESOURCE ();
>CLEAR_RESOURCE ();
>  
> +  /* Handle the flags register specially, to be able to accept a
> + candidate that clobbers it.  See also fill_simple_delay_slots.  */
> +  bool filter_flags
> += (slots_to_fill == 1
> +   && targetm.flags_regnum != INVALID_REGNUM
> +   && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> +  struct resources fset;
> +  struct resources flags_res;
> +  if (filter_flags)
> +{
> +  CLEAR_RESOURCE ();
> +  CLEAR_RESOURCE (_res);
> +  SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> +}
> +
>/* If we do not own this thread, we must stop as soon as we find
>   something that we can't put in a delay slot, since all we can do
>   is branch into THREAD at a later point.  Therefore, labels stop
> @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
> condition,
>/* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
>don't separate or copy insns that set and use CC0.  */
>if (! insn_references_resource_p (trial, , true)
> -   && ! insn_sets_resource_p (trial, , true)
> +   && ! insn_sets_resource_p (trial, filter_flags ?  : , true)
> && ! insn_sets_resource_p (trial, , true)
> +   /* If we're handling sets to the flags register specially, we
> +  only allow an insn into a delay-slot, if it either:
> +  - doesn't set the flags register,
> +  - the "set" of the flags register isn't used (clobbered),
> +  - insns between the delay-slot insn and the trial-insn
> +  as accounted in "set", have not affected the flags register.  */
> +   && (! filter_flags
> +   || ! insn_sets_resource_p (trial, _res, true)
> +   || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
> +   || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
> && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
> && (! own_thread || ! sets_cc0_p (pat)
> && ! can_throw_internal (trial))
> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
> condition,
>lose = 1;
>

reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-13 Thread Hans-Peter Nilsson via Gcc-patches
Originally I thought to bootstrap this patch on MIPS and SPARC
since they're both delayed-branch-slot targets but I
reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
seems only visium and CRIS has this feature set, and I see no
trace of visium in neither newlib nor the simulator next to
glibc.  So, I just tested cris-elf.

This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
fillers using a method similar to that in commit 33c2207d3fda,
where care was taken for fill_simple_delay_slots to allow such
insns when scanning for delay-slot fillers *backwards* (before
the insn).

A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
For cc0 targets, insns don't mention clobbering cc0, so the
clobbers are mentioned in the "resources" only as a special
entity and only for compare-insns and branches, where the cc0
value matters.

In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
the register liveness detection in reorg.c / resource.c treats
that as a blocker (for other insns mentioning it, i.e. most)
when looking for delay-slot-filling candidates.  This means that
when comparing core and performance for a delay-slot cc0 target
before and after the de-cc0 conversion, the inability to fill a
delay slot after conversion manifests as a regression.  This was
one such case, for CRIS, with random_bitstring in
gcc.c-torture/execute/arith-rand-ll.c as well as the target
libgcc division function.

After this, all known performance regressions compared to cc0
are fixed.

Ok to commit?

gcc:
PR target/93372
* reorg.c (fill_slots_from_thread): Allow trial insns that clobber
TARGET_FLAGS_REGNUM as delay-slot fillers.

gcc/testsuite:
PR target/93372
* gcc.target/cris/pr93372-47.c: New test.
---
 gcc/reorg.c| 37 +-
 gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++
 2 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c

diff --git a/gcc/reorg.c b/gcc/reorg.c
index dfd7494bf..83161caa0 100644
--- a/gcc/reorg.c
+++ b/gcc/reorg.c
@@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
condition,
   CLEAR_RESOURCE ();
   CLEAR_RESOURCE ();
 
+  /* Handle the flags register specially, to be able to accept a
+ candidate that clobbers it.  See also fill_simple_delay_slots.  */
+  bool filter_flags
+= (slots_to_fill == 1
+   && targetm.flags_regnum != INVALID_REGNUM
+   && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
+  struct resources fset;
+  struct resources flags_res;
+  if (filter_flags)
+{
+  CLEAR_RESOURCE ();
+  CLEAR_RESOURCE (_res);
+  SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
+}
+
   /* If we do not own this thread, we must stop as soon as we find
  something that we can't put in a delay slot, since all we can do
  is branch into THREAD at a later point.  Therefore, labels stop
@@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
condition,
   /* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
 don't separate or copy insns that set and use CC0.  */
   if (! insn_references_resource_p (trial, , true)
- && ! insn_sets_resource_p (trial, , true)
+ && ! insn_sets_resource_p (trial, filter_flags ?  : , true)
  && ! insn_sets_resource_p (trial, , true)
+ /* If we're handling sets to the flags register specially, we
+only allow an insn into a delay-slot, if it either:
+- doesn't set the flags register,
+- the "set" of the flags register isn't used (clobbered),
+- insns between the delay-slot insn and the trial-insn
+as accounted in "set", have not affected the flags register.  */
+ && (! filter_flags
+ || ! insn_sets_resource_p (trial, _res, true)
+ || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
+ || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
  && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
  && (! own_thread || ! sets_cc0_p (pat)
  && ! can_throw_internal (trial))
@@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
condition,
   lose = 1;
   mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
   mark_referenced_resources (trial, , true);
+  if (filter_flags)
+   {
+ mark_set_resources (trial, , 0, MARK_SRC_DEST_CALL);
+
+ /* Groups of flags-register setters with users should not
+affect opportunities to move flags-register-setting insns
+(clobbers) into the delay-slot.  */
+ CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
+ CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
+   }
 
   /* Ensure we don't put insns between the setting of cc and the