Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-30 Thread Joern Rennecke
On Thu, 30 Nov 2023 at 17:53, Jeff Law  wrote:
 > >  * ext-dce.c: Fixes for carry handling.
> >
> >  * ext-dce.c (safe_for_live_propagation): Handle MINUS.
> >(ext_dce_process_uses): Break out carry handling into ..
> >(carry_backpropagate): This new function.
> >Better handling of ASHIFT.
> >Add handling of SMUL_HIGHPART, UMUL_HIGHPART, SIGN_EXTEND, SS_ASHIFT 
> > and
> >US_ASHIFT.
> >
> >  * ext-dce.c: fix SUBREG_BYTE test
> >
> >  As mentioned in
> >  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637486.html
> >  and
> >  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638473.html
> >
> >
> > diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
> > index 4e4c57de117..228c50e8b73 100644
> > --- a/gcc/ext-dce.cc
> > +++ b/gcc/ext-dce.cc
> > @@ -38,7 +38,10 @@ along with GCC; see the file COPYING3.  If not see
> >  bit 0..7   (least significant byte)
> >  bit 8..15  (second least significant byte)
> >  bit 16..31
> > -   bit 32..BITS_PER_WORD-1  */
> > +   bit 32..BITS_PER_WORD-1
> > +
> > +   For vector modes, we apply these bit groups to every lane; if any of the
> > +   bits in the group are live in any lane, we consider this group live.  */
> Why add vector modes now?  I realize it might help a vectorized sub*_dct
> from x264, but I was thinking that would be more of a gcc-15 improvement.

Actually, we already did handle vector modes, but because it was
unintentional, it wasn't done properly.

I've been using GET_MODE_BITSIZE (GET_MODE (x)).to_constant (), thinking a
mode should just have a constant size that can easily fit into an int.  I
was wrong.  Debugging showed the mode in question was a scalable vector
mode.  SUBREGs, shifts and other things have vector modes and go through
this code.  Instead of adding code to bail out, I thought it would be a
good idea to think about how vector modes can be supported without
ballooning the computation time or memory.  And keeping in mind the
original intention of the patch - eliminating redundant sign/zero
extensions - that optimization can actually be applied to vectors as well,
and that means we should consider how these operations work on each lane.
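To make the per-lane rule concrete, here is a minimal Python model
(illustrative only, not GCC code); the bit-group boundaries follow the
comment in the patch, shown here for a 64-bit element:

```python
# Bit groups used by the pass: 0..7, 8..15, 16..31, 32..BITS_PER_WORD-1.
GROUPS = [(0, 8), (8, 16), (16, 32), (32, 64)]

def live_groups(lane_masks):
    """Union the live-bit masks of all lanes, then classify by group.
    A group is considered live if any lane has a live bit in it."""
    union = 0
    for m in lane_masks:
        union |= m
    live = set()
    for i, (lo, hi) in enumerate(GROUPS):
        group_mask = ((1 << hi) - 1) & ~((1 << lo) - 1)
        if union & group_mask:
            live.add(i)
    return live

# Lane 0 uses only bits 0..7, lane 1 uses bit 20: groups 0 and 2 are live.
print(live_groups([0xff, 1 << 20]))  # -> {0, 2}
```

This is why tracking per-group rather than per-lane liveness keeps the
cost independent of the (possibly scalable) lane count.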

By looking at the inner mode of a vector, we also conveniently get a sane
size.  For complex numbers, it's also saner to treat them as two-element
vectors than to try to apply the bit groups while ignoring the structure,
so it makes sense to use GET_MODE_INNER in general.

Something that could be done for further improvement but seems too complex
for gcc 14 would be to handle vector constants as shift counts.

Come to think of it, I actually applied the wrong test for the integer
shift counts - it should be CONST_INT_P, not CONSTANT_P.

> >
> >   /* Note this pass could be used to narrow memory loads too.  It's
> >  not clear if that's profitable or not in general.  */
>
> > @@ -96,6 +100,8 @@ safe_for_live_propagation (rtx_code code)
> >   case SS_ASHIFT:
> >   case US_ASHIFT:
> >   case ASHIFT:
> > +case LSHIFTRT:
> > +case ASHIFTRT:
> > return true;
> So this starts to touch on a cleanup Richard mentioned.  The codes in
> there until now were supposed to be safe across the board.

Saturating operations are not safe at all without explicitly computing
the liveness propagation.

>  As we add
> things like LSHIFTRT, we need to describe how to handle liveness
> transfer from the destination into the source(s).  I think what Richard
> is asking for is to just have one place which handles both.

LSHIFTRT is much simpler than the saturating operations.

> Anyway, my current plan would be to pull in the formatting fixes, the
> back propagation without the vector enhancement.

Pretending the vector modes don't happen is not making the code safe.
We have to handle them somehow, so we might as well do that in a way
that is consistent and gives more potential for optimization.


Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-29 Thread Joern Rennecke
 I originally computed mmask in carry_backpropagate from XEXP (x, 0),
but abandoned that when I realized we also get called for RTX_OBJ
things.  I forgot to adjust the SIGN_EXTEND code, though.  Fixed
in the attached revised patch.  Also made sure to not make inputs
of LSHIFTRT / ASHIFTRT live if the output is dead (and commoned
the checks for (mask == 0) in the process).

Something that could be done to further simplify the code is to make
carry_backpropagate do all the rtx_code-dependent propagation
decisions.  I.e. it would have cases for RTX_OBJ, AND, IOR etc.
that propagate the mask, and the default action would be to make
the input live (after the check to not make any bits in the input
live if the output is dead).

Then we wouldn't need safe_for_live_propagation any more.

Not sure if carry_backpropagate would still be a suitable name
then, though.
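A rough Python model of that restructuring (the function name, the exact
set of handled codes, and the string-based dispatch are my illustrative
assumptions, not the patch's):

```python
# Sketch: one function owns every rtx-code-dependent decision.  Known
# codes refine the mask; the default makes the whole input live, which
# is what safe_for_live_propagation currently guards against.

def backpropagate(mask, code, mode_mask):
    if mask == 0:
        return 0                    # dead output: no input bits become live
    if code in ("REG", "SUBREG", "AND", "IOR", "XOR", "NOT"):
        return mask                 # objects / bitwise ops: mask unchanged
    if code in ("PLUS", "MINUS", "MULT"):
        # carries: everything up to the highest live result bit may matter
        return (2 << (mask.bit_length() - 1)) - 1
    return mode_mask                # unhandled code: make the input fully live

print(hex(backpropagate(0x10000, "PLUS", 0xffffffff)))       # -> 0x1ffff
print(hex(backpropagate(0xff, "UNKNOWN_CODE", 0xffffffff)))  # -> 0xffffffff
```

The "default to fully live" arm is what would subsume
safe_for_live_propagation in this design.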
* ext-dce.cc (carry_backpropagate): Always return 0 when output is dead.  
Fix SIGN_EXTEND input mask.

* ext-dce.cc: handle vector modes.

* ext-dce.cc: Amend comment to explain how liveness of vectors is tracked.
  (carry_backpropagate): Use GET_MODE_INNER.
  (ext_dce_process_sets): Likewise.  Only apply big endian correction for
  subregs if they don't have a vector mode.
  (ext_dce_process_uses): Likewise.

* ext-dce.cc: carry_backpropagate: [US]S_ASHIFT fix, handle [LA]SHIFTRT

* ext-dce.cc (safe_for_live_propagation): Add LSHIFTRT and ASHIFTRT.
  (carry_backpropagate): Reformat top comment.
  Add handling of LSHIFTRT and ASHIFTRT.
  Fix bit count for [SU]MUL_HIGHPART.
  Fix pasto for [SU]S_ASHIFT.

* ext-dce.c: Fixes for carry handling.

* ext-dce.c (safe_for_live_propagation): Handle MINUS.
  (ext_dce_process_uses): Break out carry handling into ..
  (carry_backpropagate): This new function.
  Better handling of ASHIFT.
  Add handling of SMUL_HIGHPART, UMUL_HIGHPART, SIGN_EXTEND, SS_ASHIFT and
  US_ASHIFT.

* ext-dce.c: fix SUBREG_BYTE test

As mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637486.html
and
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638473.html

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 4e4c57de117..fd80052ad75 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -38,7 +38,10 @@ along with GCC; see the file COPYING3.  If not see
bit 0..7   (least significant byte)
bit 8..15  (second least significant byte)
bit 16..31
-   bit 32..BITS_PER_WORD-1  */
+   bit 32..BITS_PER_WORD-1
+
+   For vector modes, we apply these bit groups to every lane; if any of the
+   bits in the group are live in any lane, we consider this group live.  */
 
 /* Note this pass could be used to narrow memory loads too.  It's
not clear if that's profitable or not in general.  */
@@ -83,6 +86,7 @@ safe_for_live_propagation (rtx_code code)
 case SIGN_EXTEND:
 case TRUNCATE:
 case PLUS:
+case MINUS:
 case MULT:
 case SMUL_HIGHPART:
 case UMUL_HIGHPART:
@@ -96,6 +100,8 @@ safe_for_live_propagation (rtx_code code)
 case SS_ASHIFT:
 case US_ASHIFT:
 case ASHIFT:
+case LSHIFTRT:
+case ASHIFTRT:
   return true;
 
 /* There may be other safe codes.  If so they can be added
@@ -215,13 +221,22 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap livenow, bitmap live_tmp)
 
  /* Phase one of destination handling.  First remove any wrapper
 such as SUBREG or ZERO_EXTRACT.  */
- unsigned HOST_WIDE_INT mask = GET_MODE_MASK (GET_MODE (x));
+ unsigned HOST_WIDE_INT mask
+   = GET_MODE_MASK (GET_MODE_INNER (GET_MODE (x)));
  if (SUBREG_P (x)
  && !paradoxical_subreg_p (x)
  && SUBREG_BYTE (x).is_constant ())
{
- bit = subreg_lsb (x).to_constant ();
- mask = GET_MODE_MASK (GET_MODE (SUBREG_REG (x))) << bit;
+ enum machine_mode omode = GET_MODE_INNER (GET_MODE (x));
+ enum machine_mode imode = GET_MODE (SUBREG_REG (x));
+ bit = 0;
+ if (!VECTOR_MODE_P (GET_MODE (x))
+ || (GET_MODE_SIZE (imode).is_constant ()
+ && (GET_MODE_SIZE (omode).to_constant ()
+ > GET_MODE_SIZE (imode).to_constant ())))
+   bit = subreg_lsb (x).to_constant ();
+ mask = (GET_MODE_MASK (GET_MODE_INNER (GET_MODE (SUBREG_REG (x))))
+ << bit);
  gcc_assert (mask);
  if (!mask)
mask = -0x1ULL;
@@ -365,6 +380,85 @@ binop_implies_op2_fully_live (rtx_code code)
 }
 }
 
+/* X, with code CODE, is an operation for which safe_for_live_propagation
+   holds true, and bits set in MASK are live in the result.  Compute a
+   mask of (potentially) live bits in the non-constant inputs.  In case of
+   binop_implies_op2_fully_live (e.g. shifts), the computed mask may
+   exclusively 

Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-29 Thread Joern Rennecke
On Wed, 29 Nov 2023 at 20:05, Joern Rennecke
 wrote:

> > I suspect it'd be more useful to add handling of LSHIFTRT and ASHIFTRT
> > .  Some ports do
> > a lot of static shifting.
>
> > +case SS_ASHIFT:
> > +case US_ASHIFT:
> > +  if (!mask || XEXP (x, 1) == const0_rtx)
> > +   return 0;
>
> P.S.: I just realized that this is a pasto: in the case of a const0_rtx
> shift count, returning 0 will usually be wrong.

I've attached my current patch version.
ext-dce.cc: handle vector modes.

* ext-dce.cc: Amend comment to explain how liveness of vectors is tracked.
  (carry_backpropagate): Use GET_MODE_INNER.
  (ext_dce_process_sets): Likewise.  Only apply big endian correction for
  subregs if they don't have a vector mode.
  (ext_dce_process_uses): Likewise.

* ext-dce.cc: carry_backpropagate: [US]S_ASHIFT fix, handle [LA]SHIFTRT

* ext-dce.cc (safe_for_live_propagation): Add LSHIFTRT and ASHIFTRT.
  (carry_backpropagate): Reformat top comment.
  Add handling of LSHIFTRT and ASHIFTRT.
  Fix bit count for [SU]MUL_HIGHPART.
  Fix pasto for [SU]S_ASHIFT.

* ext-dce.c: Fixes for carry handling.

* ext-dce.c (safe_for_live_propagation): Handle MINUS.
  (ext_dce_process_uses): Break out carry handling into ..
  (carry_backpropagate): This new function.
  Better handling of ASHIFT.
  Add handling of SMUL_HIGHPART, UMUL_HIGHPART, SIGN_EXTEND, SS_ASHIFT and
  US_ASHIFT.

* ext-dce.c: fix SUBREG_BYTE test

As mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637486.html
and
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638473.html


diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 4e4c57de117..228c50e8b73 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -38,7 +38,10 @@ along with GCC; see the file COPYING3.  If not see
bit 0..7   (least significant byte)
bit 8..15  (second least significant byte)
bit 16..31
-   bit 32..BITS_PER_WORD-1  */
+   bit 32..BITS_PER_WORD-1
+
+   For vector modes, we apply these bit groups to every lane; if any of the
+   bits in the group are live in any lane, we consider this group live.  */
 
 /* Note this pass could be used to narrow memory loads too.  It's
not clear if that's profitable or not in general.  */
@@ -83,6 +86,7 @@ safe_for_live_propagation (rtx_code code)
 case SIGN_EXTEND:
 case TRUNCATE:
 case PLUS:
+case MINUS:
 case MULT:
 case SMUL_HIGHPART:
 case UMUL_HIGHPART:
@@ -96,6 +100,8 @@ safe_for_live_propagation (rtx_code code)
 case SS_ASHIFT:
 case US_ASHIFT:
 case ASHIFT:
+case LSHIFTRT:
+case ASHIFTRT:
   return true;
 
 /* There may be other safe codes.  If so they can be added
@@ -215,13 +221,22 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap livenow, bitmap live_tmp)
 
  /* Phase one of destination handling.  First remove any wrapper
 such as SUBREG or ZERO_EXTRACT.  */
- unsigned HOST_WIDE_INT mask = GET_MODE_MASK (GET_MODE (x));
+ unsigned HOST_WIDE_INT mask
+   = GET_MODE_MASK (GET_MODE_INNER (GET_MODE (x)));
  if (SUBREG_P (x)
  && !paradoxical_subreg_p (x)
  && SUBREG_BYTE (x).is_constant ())
{
- bit = subreg_lsb (x).to_constant ();
- mask = GET_MODE_MASK (GET_MODE (SUBREG_REG (x))) << bit;
+ enum machine_mode omode = GET_MODE_INNER (GET_MODE (x));
+ enum machine_mode imode = GET_MODE (SUBREG_REG (x));
+ bit = 0;
+ if (!VECTOR_MODE_P (GET_MODE (x))
+ || (GET_MODE_SIZE (imode).is_constant ()
+ && (GET_MODE_SIZE (omode).to_constant ()
+ > GET_MODE_SIZE (imode).to_constant ())))
+   bit = subreg_lsb (x).to_constant ();
+ mask = (GET_MODE_MASK (GET_MODE_INNER (GET_MODE (SUBREG_REG (x))))
+ << bit);
  gcc_assert (mask);
  if (!mask)
mask = -0x1ULL;
@@ -365,6 +380,84 @@ binop_implies_op2_fully_live (rtx_code code)
 }
 }
 
+/* X, with code CODE, is an operation for which safe_for_live_propagation
+   holds true, and bits set in MASK are live in the result.  Compute a
+   mask of (potentially) live bits in the non-constant inputs.  In case of
+   binop_implies_op2_fully_live (e.g. shifts), the computed mask may
+   exclusively pertain to the first operand.  */
+
+HOST_WIDE_INT
+carry_backpropagate (HOST_WIDE_INT mask, enum rtx_code code, rtx x)
+{
+  enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
+  HOST_WIDE_INT mmask = GET_MODE_MASK (mode);
+  switch (code)
+{
+case ASHIFT:
+  if (CONSTANT_P (XEXP (x, 1))
+ && known_lt (UINTVAL (XEXP (x, 1)), GET_MODE_BITSIZE (mode)))
+   ret

Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-29 Thread Joern Rennecke
On Wed, 29 Nov 2023 at 19:57, Joern Rennecke
 wrote:
>
> Attached is what I have for carry_backpropagate .
>
> The utility of special handling for SS_ASHIFT / US_ASHIFT seems
> somewhat marginal.
>
> I suspect it'd be more useful to add handling of LSHIFTRT and ASHIFTRT
> .  Some ports do
> a lot of static shifting.

> +case SS_ASHIFT:
> +case US_ASHIFT:
> +  if (!mask || XEXP (x, 1) == const0_rtx)
> +   return 0;

P.S.: I just realized that this is a pasto: in the case of a const0_rtx
shift count, returning 0 will usually be wrong.  OTOH the code below will
handle this almost perfectly - the one imperfection being that SS_ASHIFT
will see the sign bit set live if anything is live.  Not that it actually
matters if we track liveness in 8 / 8 / 16 / 32 bit chunks.


[V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-29 Thread Joern Rennecke
Attached is what I have for carry_backpropagate .

The utility of special handling for SS_ASHIFT / US_ASHIFT seems
somewhat marginal.

I suspect it'd be more useful to add handling of LSHIFTRT and ASHIFTRT
.  Some ports do
a lot of static shifting.
commit ed47c3d0d38f85c9b4e22bdbd079e0665465ef9c
Author: Joern Rennecke 
Date:   Wed Nov 29 18:46:06 2023 +

* ext-dce.c: Fixes for carry handling.

* ext-dce.c (safe_for_live_propagation): Handle MINUS.
  (ext_dce_process_uses): Break out carry handling into ..
  (carry_backpropagate): This new function.
  Better handling of ASHIFT.
  Add handling of SMUL_HIGHPART, UMUL_HIGHPART, SIGN_EXTEND, SS_ASHIFT and
  US_ASHIFT.

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 590656f72c7..2a4508181a1 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -83,6 +83,7 @@ safe_for_live_propagation (rtx_code code)
 case SIGN_EXTEND:
 case TRUNCATE:
 case PLUS:
+case MINUS:
 case MULT:
 case SMUL_HIGHPART:
 case UMUL_HIGHPART:
@@ -365,6 +366,67 @@ binop_implies_op2_fully_live (rtx_code code)
 }
 }
 
+/* X, with code CODE, is an operation for which safe_for_live_propagation
+   holds true, and bits set in MASK are live in the result.  Compute a
+   mask of (potentially) live bits in the non-constant inputs.  In case
+   of binop_implies_op2_fully_live (e.g. shifts), the computed mask may
+   exclusively pertain to the first operand.  */
+
+HOST_WIDE_INT
+carry_backpropagate (HOST_WIDE_INT mask, enum rtx_code code, rtx x)
+{
+  enum machine_mode mode = GET_MODE (x);
+  HOST_WIDE_INT mmask = GET_MODE_MASK (mode);
+  switch (code)
+{
+case ASHIFT:
+  if (CONSTANT_P (XEXP (x, 1))
+ && known_lt (UINTVAL (XEXP (x, 1)), GET_MODE_BITSIZE (mode)))
+   return mask >> INTVAL (XEXP (x, 1));
+  /* Fall through.  */
+case PLUS: case MINUS:
+case MULT:
+  return mask ? ((2ULL << floor_log2 (mask)) - 1) : 0;
+case SMUL_HIGHPART: case UMUL_HIGHPART:
+  if (!mask || XEXP (x, 1) == const0_rtx)
+   return 0;
+  if (CONSTANT_P (XEXP (x, 1)))
+   {
+ if (pow2p_hwi (INTVAL (XEXP (x, 1))))
+   return mmask & (mask << (GET_MODE_BITSIZE (mode).to_constant ()
+- exact_log2 (INTVAL (XEXP (x, 1)))));
+
+ int bits = (2 * GET_MODE_BITSIZE (mode).to_constant ()
+ - clz_hwi (mask) - ctz_hwi (INTVAL (XEXP (x, 1))));
+ if (bits < GET_MODE_BITSIZE (mode).to_constant ())
+   return (1ULL << bits) - 1;
+   }
+  return mmask;
+case SIGN_EXTEND:
+  if (mask & ~mmask)
+   mask |= 1ULL << (GET_MODE_BITSIZE (mode).to_constant () - 1);
+  return mask;
+
+/* We propagate for the shifted operand, but not the shift
+   count.  The count is handled specially.  */
+case SS_ASHIFT:
+case US_ASHIFT:
+  if (!mask || XEXP (x, 1) == const0_rtx)
+   return 0;
+  if (CONSTANT_P (XEXP (x, 1))
+ && UINTVAL (XEXP (x, 1)) < GET_MODE_BITSIZE (mode).to_constant ())
+   {
+ return ((mmask & ~((unsigned HOST_WIDE_INT)mmask
+>> (INTVAL (XEXP (x, 1)) + (code == SS_ASHIFT))))
+ | (mask >> INTVAL (XEXP (x, 1))));
+   }
+  return mmask;
+default:
+  return mask;
+}
+}
 /* Process uses in INSN contained in OBJ.  Set appropriate bits in LIVENOW
for any chunks of pseudos that become live, potentially filtering using
bits from LIVE_TMP.
@@ -480,11 +542,7 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap livenow,
 sure everything that should get marked as live is marked
 from here onward.  */
 
- /* ?!? What is the point of this adjustment to DST_MASK?  */
- if (code == PLUS || code == MINUS
- || code == MULT || code == ASHIFT)
-   dst_mask
- = dst_mask ? ((2ULL << floor_log2 (dst_mask)) - 1) : 0;
+ dst_mask = carry_backpropagate (dst_mask, code, src);
 
  /* We will handle the other operand of a binary operator
 at the bottom of the loop by resetting Y.  */
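As a Python model of what the final hunk changes (illustrative, not the
GCC code): the old inline rule widened the mask for every ASHIFT, while
carry_backpropagate can simply shift the mask down when the shift count
is a known constant:

```python
def old_rule(mask):
    """The removed blanket carry rule: widen to all bits up to the
    highest live result bit, for PLUS/MINUS/MULT/ASHIFT alike."""
    return (2 << (mask.bit_length() - 1)) - 1 if mask else 0

def new_ashift_rule(mask, count):
    """Refined ASHIFT handling for a constant in-range shift count:
    result bit (i + count) comes from input bit i."""
    return mask >> count

# Result bits 16..23 live, left shift by 16: only input bits 0..7 matter.
print(hex(old_rule(0xff0000)))             # -> 0xffffff  (overly wide)
print(hex(new_ashift_rule(0xff0000, 16)))  # -> 0xff
```

For PLUS/MINUS/MULT the widening is still required, since carries can
propagate from any lower input bit into a live result bit.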


Re: [RFA] New pass for sign/zero extension elimination

2023-11-29 Thread Joern Rennecke
Why did you leave out MINUS from safe_for_live_propagation?


Re: [RFA] New pass for sign/zero extension elimination

2023-11-28 Thread Joern Rennecke
On Tue, 28 Nov 2023 at 13:36, Joern Rennecke
 wrote:
> For the saturating truncation operations, we have the high-to-low
> propagation, but no low-to-high propagation, so that would be something
> separate to model.

P.S.:
For unsigned saturating truncation, the propagation from higher to
lower bits only happens for bits that are truncated off.
E.g. if we truncate a 64 bit value to a 32 bit value, and only the
lower 16 bits of the result are live, we get an output live mask
0x implying an input live mask:
0x

For signed saturating truncation, we get an extra corner case.  For the
same data widths as above, the value
0x8000
truncates to:
0x8000
but
0x8000
truncates to:
0x7fff

so the top bit that is included in the truncated mode propagates to all
the lower bits (irrespective of whether it itself is live in the output),
so it is live in the input if any bit is live in the output - just like
all the truncated-off bits.
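A Python model of that signed corner case, using 64-to-32-bit truncation
with concrete values chosen for illustration:

```python
def ss_truncate_64_to_32(v):
    """Signed saturating truncation of a signed 64-bit value to 32 bits:
    clamp to the representable range instead of discarding high bits."""
    lo, hi = -(1 << 31), (1 << 31) - 1
    return max(lo, min(hi, v))

# Both inputs have low 32 bits equal to 0x80000000; only the truncated-off
# high bits differ, yet the change flips every lower bit of the result:
print(hex(ss_truncate_64_to_32(-(1 << 31)) & 0xffffffff))  # -> 0x80000000
print(hex(ss_truncate_64_to_32(1 << 31)))                  # -> 0x7fffffff
```

This is why bit 31 of the input must be considered live whenever any
output bit is live, just like the truncated-off bits.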


Re: [RFA] New pass for sign/zero extension elimination

2023-11-28 Thread Joern Rennecke
On Mon, 27 Nov 2023 at 20:18, Jeff Law  wrote:
>
>
>
> On 11/27/23 13:03, Richard Sandiford wrote:
> > Joern Rennecke  writes:
> >>   On 11/20/23 11:26, Richard Sandiford wrote:
> >>>> +  /* ?!? What is the point of this adjustment to DST_MASK?  */
> >>>> +  if (code == PLUS || code == MINUS
> >>>> +  || code == MULT || code == ASHIFT)
> >>>> + dst_mask
> >>>> +  = dst_mask ? ((2ULL << floor_log2 (dst_mask)) - 1) : 0;
> >>>
> >>> Yeah, sympathise with the ?!? here :)
> >> Jeff Law:
> >>> Inherited.  Like the other bit of magic I think I'll do a test with them
> >>> pulled out to see if I can make something undesirable trigger.
> >>
> >> This represents the carry effect.  Even if the destination only cares about
> >> some high order bits, you have to consider all lower order bits of the 
> >> inputs.
> >>
> >> For ASHIFT, you could refine this in the case of a constant shift count.
> >
> > Ah, right.  Think it would be worth a comment.
> Definitely.  Wouldn't SIGN_EXTEND have a similar problem?  While we
> don't care about all the low bits, we do care about that MSB.

Yes, if bits outside of the MODE_MASK of the input (i.e. higher bits) are
live in the output, then we want the high bit of the SIGN_EXTEND input
live.

OTOH, if the output is not wider, then the high bit of the input is only
live if the same bit of the output is.  That latter point is important
because chains of same-width sign extends are a prime target for this
optimization.
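That SIGN_EXTEND rule can be sketched in Python (mask bits use the
result's numbering; the helper name is mine, not the patch's):

```python
def sign_extend_input_mask(mask, input_bits):
    """Live input bits for a SIGN_EXTEND from an input_bits-wide mode."""
    mmask = (1 << input_bits) - 1      # GET_MODE_MASK of the input mode
    if mask & ~mmask:                  # any live bit beyond the input mode?
        mask |= 1 << (input_bits - 1)  # then the input's sign bit is live
    return mask

# Bit 40 of the wider result is live: the 32-bit input's bit 31 goes live.
print(hex(sign_extend_input_mask(1 << 40, 32)))  # -> 0x10080000000
# Only low bits live: the mask passes through unchanged, which is what
# keeps chains of same-width sign extends optimizable.
print(hex(sign_extend_input_mask(0xffff, 32)))   # -> 0xffff
```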

SMUL_HIGHPART / UMUL_HIGHPART also have carry-propagation.

With the saturating operations, we also have propagation from high bits
into lower bits in the saturating case.  I don't think we can do anything
useful for the saturating addition / multiplication operators safely.

For the saturating truncation operations, we have the high-to-low propagation,
but no low-to-high propagation, so that would be something separate to model.


Re: [RFA] New pass for sign/zero extension elimination

2023-11-28 Thread Joern Rennecke
On Mon, 27 Nov 2023 at 20:03, Richard Sandiford
 wrote:
>
> Joern Rennecke  writes:
> >  On 11/20/23 11:26, Richard Sandiford wrote:
> >>> +  /* ?!? What is the point of this adjustment to DST_MASK?  */
> >>> +  if (code == PLUS || code == MINUS
> >>> +  || code == MULT || code == ASHIFT)
> >>> + dst_mask
> >>> +  = dst_mask ? ((2ULL << floor_log2 (dst_mask)) - 1) : 0;
> >>
> >> Yeah, sympathise with the ?!? here :)
> > Jeff Law:
> >> Inherited.  Like the other bit of magic I think I'll do a test with them
> >> pulled out to see if I can make something undesirable trigger.
> >
> > This represents the carry effect.  Even if the destination only cares about
> > some high order bits, you have to consider all lower order bits of the 
> > inputs.
> >
> > For ASHIFT, you could refine this in the case of a constant shift count.
>
> Ah, right.  Think it would be worth a comment.
>
> But I wonder whether we should centralise all this code-specific
> information into a single place.  I.e. rather than having one switch to
> say "PLUS is OK" or "AND is OK", and then having code-specific handling
> elsewhere, we could enumerate how to handle a code.

This carry-back-propagation code is used only in that one place, so I
saw no need to put it in a separate function.
But if we need to add to it (handle SIGN_EXTEND, maybe handle
ASHIFT better) and add lots of comments, it makes sense to put it
into an inlinable function so it doesn't disrupt the flow of reading the
code.

Maybe something like this?

/* X, with code CODE, is an operation for which safe_for_live_propagation
   holds true, and bits set in MASK are live in the result.  Compute a
   mask of (potentially) live bits in the non-constant inputs.  In case
   of binop_implies_op2_fully_live (e.g. shifts), the computed mask may
   exclusively pertain to the first operand.  */

HOST_WIDE_INT
carry_backpropagate (HOST_WIDE_INT mask, enum rtx_code code, rtx x)


Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Joern Rennecke
You are applying PATTERN to an INSN_LIST.
diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 52032b50951..4523654538c 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -122,10 +122,9 @@ safe_for_live_propagation (rtx_code code)
optimziation phase during use handling will be.  */
 
 static void
-ext_dce_process_sets (rtx insn, bitmap livenow, bitmap live_tmp)
+ext_dce_process_sets (rtx insn, rtx pat, bitmap livenow, bitmap live_tmp)
 {
   subrtx_iterator::array_type array;
-  rtx pat = PATTERN (insn);
   FOR_EACH_SUBRTX (iter, array, pat, NONCONST)
 {
   const_rtx x = *iter;
@@ -377,7 +376,7 @@ binop_implies_op2_fully_live (rtx_code code)
eliminated in CHANGED_PSEUDOS.  */
 
 static void
-ext_dce_process_uses (rtx insn, bitmap livenow, bitmap live_tmp,
+ext_dce_process_uses (rtx insn, rtx pat, bitmap livenow, bitmap live_tmp,
  bool modify, bitmap changed_pseudos)
 {
   /* A nonlocal goto implicitly uses the frame pointer.  */
@@ -389,7 +388,6 @@ ext_dce_process_uses (rtx insn, bitmap livenow, bitmap live_tmp,
 }
 
   subrtx_var_iterator::array_type array_var;
-  rtx pat = PATTERN (insn);
   FOR_EACH_SUBRTX_VAR (iter, array_var, pat, NONCONST)
 {
   /* An EXPR_LIST (from call fusage) ends in NULL_RTX.  */
@@ -640,15 +638,16 @@ ext_dce_process_bb (basic_block bb, bitmap livenow,
   bitmap live_tmp = BITMAP_ALLOC (NULL);
 
   /* First process any sets/clobbers in INSN.  */
-  ext_dce_process_sets (insn, livenow, live_tmp);
+  ext_dce_process_sets (insn, PATTERN (insn), livenow, live_tmp);
 
   /* CALL_INSNs need processing their fusage data.  */
   if (GET_CODE (insn) == CALL_INSN)
-   ext_dce_process_sets (CALL_INSN_FUNCTION_USAGE (insn),
+   ext_dce_process_sets (insn, CALL_INSN_FUNCTION_USAGE (insn),
  livenow, live_tmp);
 
   /* And now uses, optimizing away SIGN/ZERO extensions as we go.  */
-  ext_dce_process_uses (insn, livenow, live_tmp, modify, changed_pseudos);
+  ext_dce_process_uses (insn, PATTERN (insn), livenow, live_tmp, modify,
+   changed_pseudos);
 
   /* And process fusage data for the use as well.  */
   if (GET_CODE (insn) == CALL_INSN)
@@ -663,7 +662,7 @@ ext_dce_process_bb (basic_block bb, bitmap livenow,
  if (global_regs[i])
bitmap_set_range (livenow, i * 4, 4);
 
- ext_dce_process_uses (CALL_INSN_FUNCTION_USAGE (insn),
+ ext_dce_process_uses (insn, CALL_INSN_FUNCTION_USAGE (insn),
livenow, live_tmp, modify, changed_pseudos);
}
 


Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Joern Rennecke
 On 11/20/23 11:26, Richard Sandiford wrote:
>> +  /* ?!? What is the point of this adjustment to DST_MASK?  */
>> +  if (code == PLUS || code == MINUS
>> +  || code == MULT || code == ASHIFT)
>> + dst_mask
>> +  = dst_mask ? ((2ULL << floor_log2 (dst_mask)) - 1) : 0;
>
> Yeah, sympathise with the ?!? here :)
Jeff Law:
> Inherited.  Like the other bit of magic I think I'll do a test with them
> pulled out to see if I can make something undesirable trigger.

This represents the carry effect.  Even if the destination only cares about
some high order bits, you have to consider all lower order bits of the inputs.

For ASHIFT, you could refine this in the case of a constant shift count.


Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Joern Rennecke
On 11/20/23 11:26, Richard Sandiford wrote:

>> +
>> +  mask = GET_MODE_MASK (GET_MODE (SUBREG_REG (x))) << bit;
>> +  if (!mask)
>> + mask = -0x1ULL;
>
> Not sure I follow this.  What does the -0x1ULL constant indicate?
> Also, isn't it the mask of the outer register that is shifted, rather
> than the mask of the inner mode?  E.g. if we have:
Jeff Law:
> Inherited.  I should have marked it like the other one as needing
> investigation.  Probably the fastest way is to just rip it out for a
> test to see what breaks.

This is for support of types wider than DImode.

You removed support for tracking these values in various places, though.


RFA: RISC-V: Add support for XCVhwlp extension in CV32E40P

2023-11-18 Thread Joern Rennecke
This patch adds support for hardware loops as described in:
https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/cv32e40p_v1.3.2/instruction_set_extensions.html#hardware-loops
.

riscv32-corev-elf (using newlib) regression tested for multilibs:
rv32imc_zicsr-ilp32--
rv32imfc_zicsr-ilp32--
rv32imc_zicsr_zfinx-ilp32--
rv32imfc_zicsr_xcvmac_xcvalu-ilp32--

also tested against this:

rv32imc_zicsr_xcvhwlp-ilp32--
rv32imfc_zicsr_xcvhwlp-ilp32--
rv32imc_zicsr_zfinx_xcvhwlp-ilp32--
rv32imfc_zicsr_xcvmac_xcvalu_xcvhwlp-ilp32-

Bootstrapped on x86_64

build 'all-gcc' for x86_64 x sh-elf
Add support for XCVhwlp extension in CV32E40P

2023-11-18  Joern Rennecke  

gcc/
* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add xcvhwlp.
(riscv_ext_flag_table): Likewise.
* config.gcc (riscv*): Add corev.o to extra_objs.
* config/riscv/constraints.md (xcvl0s, xcvl0e): New constraints.
(xcvl0c, xcvl1s, xcvl1e, xcvl1c): Likewise.
(CVl0, xcvlb5, xcvlbs, xcvlbe, CV12): Likewise.
* config/riscv/corev.cc: New file.
* config/riscv/corev.md (UNSPEC_CV_LOOPBUG): New constant.
(UNSPECV_CV_LOOPALIGN, UNSPEC_CV_FOLLOWS): Likewise.
(UNSPEC_CV_LP_START_12): Likewise.
(UNSPEC_CV_LP_END_5, UNSPEC_CV_LP_END_12): Likewise.
(doloop_end_i, *cv_start, *cv_end, *cv_count): New insn patterns.
(doloop_align): Likewise.
(doloop_end, doloop_begin): New expanders.
(doloop_begin_i): New define_insn_and_split.
(doloop_begin_i+1): New splitter.
* config/riscv/predicates.md (lpstart_reg_op): New predicate.
(lpend_reg_op, lpcount_reg_op): Likewise.
(label_register_operand, move_dest_operand): Likewise.
* config/riscv/riscv-passes.def (pass_riscv_doloop_begin): Add.
(pass_riscv_doloop_ranges):
Insert before and after register allocation.
* config/riscv/riscv-protos.h (make_pass_riscv_doloop_begin): Declare.
(make_pass_riscv_doloop_ranges): Likewise.
(riscv_can_use_doloop_p, riscv_invalid_within_doloop): Likewise.
(hwloop_setupi_p, add_label_op_ref, corev_label_align): Likewise.
* config/riscv/riscv.cc (riscv_regno_to_class): Add classes for
hardware loop start, end and counter registers.
(riscv_strip_unspec_address): Also strip UNSPEC_CV_LP_START_12,
UNSPEC_CV_LP_END_5 and UNSPEC_CV_LP_END_12.
(riscv_output_move): Add support to read loop counter registers.
(TARGET_CAN_USE_DOLOOP_P, TARGET_INVALID_WITHIN_DOLOOP): Override.
* config/riscv/riscv.h (enum reg_class): Add items for hardware
loop start, end and counter registers.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(REG_ALLOC_ORDER): Likewise.
(REGISTER_NAMES): Likewise.
(LABEL_ALIGN): Define.
* config/riscv/riscv.md (LPSTART0_REGNUM): New constant.
(LPEND0_REGNUM, LPCOUNT0_REGNUM): Likewise.
(LPSTART1_REGNUM, LPEND1_REGNUM, LPCOUNT1_REGNUM): Likewise.
(attr ext): New value xcvhwlp.
(attr enabled): Handle xcvhwlp.
(movsi_internal): Add alternatives to read loop counters.
Use move_dest_operand.
* config/riscv/riscv.opt (XCVHWLP): New Mask.
* config/riscv/t-riscv (corev.o): New rule.
* doc/md.texi (doloop_end): Document optional operand 2.
* loop-doloop.cc (doloop_optimize): Provide 3rd operand to
gen_doloop_end.
* target-insns.def (doloop_end): Add optional 3rd operand.
gcc/testsuite/
* gcc.target/riscv/cv-hwlp-shiftsub.c: New test.

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 5111626157b..55b56235134 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -312,6 +312,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
 
   {"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xcvhwlp", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xtheadba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadbb", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1676,6 +1677,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
 
   {"xcvmac",    &gcc_options::x_riscv_xcv_subext, MASK_XCVMAC},
   {"xcvalu",    &gcc_options::x_riscv_xcv_subext, MASK_XCVALU},
+  {"xcvhwlp",   &gcc_options::x_riscv_xcv_subext, MASK_XCVHWLP},
 
   {"xtheadba",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBA},
   {"xtheadbb",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBB},
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6d51bd93f3f..8cddfbb12b3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -546,7 +546,7 @@ riscv*)
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
riscv-shorten-memrefs.o riscv-selftests.o riscv-s

RFA: make scan-assembler* ignore LTO sections (Was: Re: committed [RISC-V]: Harden test scan patterns)

2023-11-08 Thread Joern Rennecke
On Fri, 29 Sept 2023 at 14:54, Jeff Law  wrote:
> ...  Joern  can you post a follow-up manual twiddle so
> that other ports can follow your example and avoid this problem?
>
> Thanks,
>
> jeff

The attached patch makes the scan-assembler* directives ignore the LTO
sections.

Regression tested (using QEMU) for
riscv-sim

riscv-sim/-march=rv32gcv_zfh/-mabi=ilp32d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
riscv-sim/-march=rv32imac/-mabi=ilp32

riscv-sim/-march=rv64gcv_zfh_zvfh_zba_zbb_zbc_zicond_zicboz_zawrs/-mabi=lp64d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
riscv-sim/-march=rv64imac/-mabi=lp64
2023-11-08  Joern Rennecke  

gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Disregard LTO sections.
(scan-assembler-dem, scan-assembler-dem-not): Likewise.
(dg-scan): Likewise, if name starts with scan-assembler.
(scan-raw-assembler): New proc.
* gcc.dg/pr61868.c: Use scan-raw-assembler.
* gcc.dg/scantest-lto.c: New test.
gcc/
* doc/sourcebuild.texi (Scan the assembly output): Document change.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 8bf701461ec..5a34a10e6c2 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3276,21 +3276,28 @@ Passes if @var{regexp} does not match text in the file 
generated by
 
 @table @code
 @item scan-assembler @var{regex} [@{ target/xfail @var{selector} @}]
-Passes if @var{regex} matches text in the test's assembler output.
+Passes if @var{regex} matches text in the test's assembler output,
+excluding LTO sections.
+
+@item scan-raw-assembler @var{regex} [@{ target/xfail @var{selector} @}]
+Passes if @var{regex} matches text in the test's assembler output,
+including LTO sections.
 
 @item scan-assembler-not @var{regex} [@{ target/xfail @var{selector} @}]
-Passes if @var{regex} does not match text in the test's assembler output.
+Passes if @var{regex} does not match text in the test's assembler output,
+excluding LTO sections.
 
 @item scan-assembler-times @var{regex} @var{num} [@{ target/xfail 
@var{selector} @}]
 Passes if @var{regex} is matched exactly @var{num} times in the test's
-assembler output.
+assembler output, excluding LTO sections.
 
 @item scan-assembler-dem @var{regex} [@{ target/xfail @var{selector} @}]
-Passes if @var{regex} matches text in the test's demangled assembler output.
+Passes if @var{regex} matches text in the test's demangled assembler output,
+excluding LTO sections.
 
 @item scan-assembler-dem-not @var{regex} [@{ target/xfail @var{selector} @}]
 Passes if @var{regex} does not match text in the test's demangled assembler
-output.
+output, excluding LTO sections.
 
 @item scan-assembler-symbol-section @var{functions} @var{section} [@{ 
target/xfail @var{selector} @}]
 Passes if @var{functions} are all in @var{section}.  The caller needs to
diff --git a/gcc/testsuite/gcc.dg/pr61868.c b/gcc/testsuite/gcc.dg/pr61868.c
index 4a7e8f6ae2d..52ab7838643 100644
--- a/gcc/testsuite/gcc.dg/pr61868.c
+++ b/gcc/testsuite/gcc.dg/pr61868.c
@@ -7,4 +7,4 @@ int main ()
   foo (100);
   return 0;
 }
-/* { dg-final { scan-assembler "\.gnu\.lto.*.12345" } } */
+/* { dg-final { scan-raw-assembler "\.gnu\.lto.*.12345" } } */
diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 5df80325dff..16b5198d38b 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -79,6 +79,12 @@ proc dg-scan { name positive testcase output_file orig_args 
} {
 }
 set text [read $fd]
 close $fd
+if { [string compare -length 14 $name scan-assembler] == 0 } {
+  # Remove LTO sections.
+  # ??? Somehow, .*? is still greedy.
+  # regsub -all 
{(^|\n)[[:space:]]*\.section[[:space:]]*\.gnu\.lto_.*?\n(?=[[:space:]]*\.text\n)}
 $text {\1} text
+  regsub -all 
{(^|\n)[[:space:]]*\.section[[:space:]]*\.gnu\.lto_(?:[^\n]*\n(?![[:space:]]*\.(section|text|data|bss)))*[^\n]*\n}
 $text {\1} text
+}
 
 set match [regexp -- $pattern $text]
 if { $match == $positive } {
@@ -108,6 +114,16 @@ proc scan-assembler { args } {
 
 set_required_options_for scan-assembler
 
+proc scan-raw-assembler { args } {
+set testcase [testname-for-summary]
+# The name might include a list of options; extract the file name.
+set filename [lindex $testcase 0]
+set output_file "[file rootname [file tail $filename]].s"
+dg-scan "scan-raw-assembler" 1 $testcase $output_file $args
+}
+
+set_required_options_for scan-raw-assembler
+
 # Check that a pattern is not present in the .s file produced by the
 # compiler.  See dg-scan for details.
 
@@ -487,6 +503,7 @@ proc scan-assembler-times { args } {
 set fd [open $output_file r]
 set text [read $fd]
 close $fd
+regsub -all 
{(^|\n)[[:space:]]*\.section[[:space:]]*\.gnu\.lto_(?:[^\n]*\n(?![[:space:]]*\.(section|text|data|bss)))*[^\n]*\n}
 $text 
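As an aside, the effect of this section-stripping substitution can be sketched outside the testsuite. The following Python approximation (Python's re, not Tcl's ARE; `[ \t]` standing in for `[[:space:]]`, and the section name is made up) shows what gets removed:

```python
import re

# Toy assembler output: real code plus an LTO section (hypothetical contents).
asm = ('\t.text\n'
       '\tld.w\ta0,0(a1)\n'
       '\t.section\t.gnu.lto_.symtab,"e",@progbits\n'
       '\t.ascii "h\\227ld@w"\n'
       '\t.text\n'
       '\tret\n')

# Drop a .gnu.lto_* section up to the next section switch, mirroring
# the regsub in scanasm.exp: eat lines as long as the next line does
# not start a new .section/.text/.data/.bss directive.
stripped = re.sub(
    r'(^|\n)[ \t]*\.section[ \t]*\.gnu\.lto_'
    r'(?:[^\n]*\n(?![ \t]*\.(?:section|text|data|bss)))*[^\n]*\n',
    r'\1', asm)
```

After the substitution, only the `.text` lines survive, so a scan-assembler pattern can no longer match inside the LTO payload.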

Re: committed [RISC-V]: Harden test scan patterns

2023-10-11 Thread Joern Rennecke
On Wed, 11 Oct 2023 at 05:48, Joern Rennecke
 wrote:

> So I propose we look at the first character of the regexp, and if it's neither
> ^ nor \ (neither caret nor backslash), we consider the regexp un-anchored,
> and prepend ^[^"]* , so it won't allow a match after a double quote.

Looking at the sources for dg-scan / scan-assembler-times / scan-assembler-dem /
scan-assembler-dem-not, and the tcl regexp command and re_syntax manual
pages, I realise it won't work like that.  The backslash-escaped
characters in the
source file end up just as single characters if enclosed merely with
double quotes,
so "\t" is a single character, although {\t} and {\m} will be passed
as two characters to regexp (and "\m" is just an m).

And ^ , by default, matches only the beginning of the text, which for the
aforementioned scan-assembler* procs means the entire (demangled for *-dem)
output file.
(The manual is a bit muddled about start of string or start of line,
but a test with
tclsh shows the default is indeed start of string.)
We can make use of embedded options to make a prepended string work, i.e.
(?w)^[^"]*?

Although I'm not sure what that'd do on macOS - would the compiler output
contain lines terminated only with \r, and would these be invisible to ^ ?
I see that we have a number of scan patterns that start with \n in g++.dg,
so I would hope that we can rely on lines ending with \n .
(\n\r or \r\n are OK for this purpose.)

Incidentally, these patterns should also work with (?w)^[^"]*? prepended,
as a line that ends should also have a start, but it could get a low count for
scan-assembler-times.  There are a number of tests in gcc.target/s390
that have directives starting with: scan-assembler-times {\n\t
which are perfectly anchored, but we might depress the count if we
prepend a pattern that matches the start of the line that has the newline.

We'd also have to make an exception for regexps that start with a parenthesis
to avoid disabling REs with embedded options.

So it seems we have to except patterns starting with any of:
\\ \t \n (
Maybe we should also add [ to that list, for "[\n\r]" ?


Re: committed [RISC-V]: Harden test scan patterns

2023-10-10 Thread Joern Rennecke
On Sat, 30 Sept 2023 at 22:12, Joern Rennecke
 wrote:

> Also, we might have different directives for not scanning in LTO sections -
> or just ignoring .ascii .  Or maybe the other way round - you have to do
> something special if you want to scan inside strings, and by default we
> don't look inside strings?
> LTO information uses ascii, and ISTR sometimes also a zero-terminated
> variant (asciiz?); there might also be some string constant outputs, or stabs
> information.
> One possible rule I think might work is: if the RE doesn't mention a quote,
> don't scan what's quoted inside double quotes.  Although we might have
> to look out for backslash-escaped quotes to find the proper end of a quoted
> string.

I've thought about this some more, and we need something that's simple for
dejagnu and simple to describe.

So I propose we look at the first character of the regexp, and if it's neither
^ nor \ (neither caret nor backslash), we consider the regexp un-anchored,
and prepend ^[^"]* , so it won't allow a match after a double quote.
Then document this in sourcebuild.texi, with some mention of lto information
and stabs, and also mentioning that if you really want to match irrespective
of a leading quote, you can prepend ^.* to your regexp.
There are good reasons to be more specific with your regexps in general,
but the matches in LTO are particularly damaging because they appear
semi-random, so often escape a regression test when the test is made,
only to surface during somebody else's regression test.
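The effect of the proposed rule can be sketched quickly. This is a Python approximation (Python's re is close enough to Tcl's ARE for this point, although (?w) and \m are Tcl-specific):

```python
import re

# An LTO .ascii line of the kind that causes spurious matches.
lto_line = '\t.ascii  "h\\227\\022\\212ld@w\\251jr\\254"'

# The unanchored pattern matches: '.' also matches '@' inside the string.
assert re.search(r'ld.w', lto_line)

# With ^[^"]* prepended, everything before the match must be quote-free,
# so the .ascii payload after the double quote can no longer match.
assert not re.search(r'^[^"]*ld.w', lto_line)
```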


Re: [RISC-V]: Re: cpymem for RISCV with v extension

2023-10-04 Thread Joern Rennecke
On Wed, 4 Oct 2023 at 18:38, Patrick O'Neill  wrote:
>
> Hi Joern,
>
> I'm seeing new failures introduced by this patch
> (9464e72bcc9123b619215af8cfef491772a3ebd9).
>
> On rv64gcv:
> FAIL: gcc.dg/pr90263.c scan-assembler memcpy

My testing didn't flag this because I used elf targets.  The
expected behaviour now is to use vector instructions for rvv,
so we shouldn't expect memcpy to appear there.  I think the
rvv case is suitably covered by the new tests, so we just
have to avoid the failure here.  Does the attached patch work for you?

> FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,
> -O2 -fomit-frame-pointer -finline-functions -funroll-loops

There seems to be an issue with my test setup regarding fortran, I'll
have to investigate.
diff --git a/gcc/testsuite/gcc.dg/pr90263.c b/gcc/testsuite/gcc.dg/pr90263.c
index 3222a5331c1..09e0446f45c 100644
--- a/gcc/testsuite/gcc.dg/pr90263.c
+++ b/gcc/testsuite/gcc.dg/pr90263.c
@@ -9,4 +9,4 @@ int *f (int *p, int *q, long n)
 }
 
 /* { dg-final { scan-assembler "mempcpy" { target { i?86-*-* x86_64-*-* } } } 
} */
-/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-* x86_64-*-* } } 
} } } */
+/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-* x86_64-*-* 
riscv_v } } } } } */


[RISC-V]: Re: cpymem for RISCV with v extension

2023-10-01 Thread Joern Rennecke
On Tue, 15 Aug 2023 at 15:06, Jeff Law  wrote:
 >
> On 8/15/23 03:16, juzhe.zh...@rivai.ai wrote:
> > The new  patch looks reasonable to me now. Thanks for fixing it.
> >
> > Could you append testcase after finishing test infrastructure ?
> > I prefer this patch with testcase after infrastructure.
> So let's call this an ACK, but ask that Joern not commit until the
> testsuite bits are in place.

Beyond the addition of tests, the patch needed some changes because of the
refactoring of the emit_{vlmax,nonvlmax}_xxx functions.
Attached is the committed version.
commit 9464e72bcc9123b619215af8cfef491772a3ebd9
Author: Joern Rennecke 
Date:   Mon Oct 2 03:16:09 2023 +0100

cpymem for RISC-V with v extension

gcc/
* config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
Declare.
* config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
New function.
* config/riscv/riscv.md (cpymemsi): Use 
riscv_vector::expand_block_move.
Change to ..
(cpymem) .. this.

gcc/testsuite/
* gcc.target/riscv/rvv/base/cpymem-1.c: New test.
* gcc.target/riscv/rvv/base/cpymem-2.c: Likewise.

Co-Authored-By: Juzhe-Zhong 

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index af5baf37e6a..43426a5326b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -492,6 +492,7 @@ bool slide1_sew64_helper (int, machine_mode, machine_mode,
  machine_mode, rtx *);
 rtx gen_avl_for_scalar_move (rtx);
 void expand_tuple_move (rtx *);
+bool expand_block_move (rtx, rtx, rtx);
 machine_mode preferred_simd_mode (scalar_mode);
 machine_mode get_mask_mode (machine_mode);
 void expand_vec_series (rtx, rtx, rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 097457562bd..29e138e1da2 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@
 #include "tm-constrs.h"
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
+#include "predict.h"
 
 using namespace riscv_vector;
 
@@ -1991,6 +1992,206 @@ expand_tuple_move (rtx *ops)
 }
 }
 
+/* Used by cpymemsi in riscv.md .  */
+
+bool
+expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
+{
+  /*
+memcpy:
+   mv a3, a0   # Copy destination
+loop:
+   vsetvli t0, a2, e8, m8, ta, ma  # Vectors of 8b
+   vle8.v v0, (a1) # Load bytes
+   add a1, a1, t0  # Bump pointer
+   sub a2, a2, t0  # Decrement count
+   vse8.v v0, (a3) # Store bytes
+   add a3, a3, t0  # Bump pointer
+   bnez a2, loop   # Any more?
+   ret # Return
+  */
+  if (!TARGET_VECTOR)
+return false;
+  HOST_WIDE_INT potential_ew
+= (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
+   / BITS_PER_UNIT);
+  machine_mode vmode = VOIDmode;
+  bool need_loop = true;
+  bool size_p = optimize_function_for_size_p (cfun);
+  rtx src, dst;
+  rtx end = gen_reg_rtx (Pmode);
+  rtx vec;
+  rtx length_rtx = length_in;
+
+  if (CONST_INT_P (length_in))
+{
+  HOST_WIDE_INT length = INTVAL (length_in);
+
+/* By using LMUL=8, we can copy as many bytes in one go as there
+   are bits in a vector register.  If the entire block thus fits,
+   we don't need a loop.  */
+if (length <= TARGET_MIN_VLEN)
+  {
+   need_loop = false;
+
+   /* If a single scalar load / store pair can do the job, leave it
+  to the scalar code to do that.  */
+   /* ??? If fast unaligned access is supported, the scalar code could
+  use suitably sized scalars irrespective of alignment.  If that
+  gets fixed, we have to adjust the test here.  */
+
+   if (pow2p_hwi (length) && length <= potential_ew)
+ return false;
+  }
+
+  /* Find the vector mode to use.  Using the largest possible element
+size is likely to give smaller constants, and thus potentially
+reducing code size.  However, if we need a loop, we need to update
+the pointers, and that is more complicated with a larger element
+size, unless we use an immediate, which prevents us from dynamically
+using the target's transfer size that the hart supports.  And then,
+unless we know the *exact* vector size of the hart, we'd need
+multiple vsetvli / branch statements, so it's not even a size win.
+If, in the future, we find an RVV implementation that is slower
+for small element widths, we might allow larger element widths for
+loops too.  */
+  if (need_loop)
+   potential_ew = 1;
+  for (; potential_ew; potential_ew >>= 1)
+   {
+

Committed: Fix typo in add_options_for_riscv_v, add_options_for_riscv_zfh, add_options_for_riscv_d .

2023-10-01 Thread Joern Rennecke
Committed as obvious (RE doesn't compile without patch, and I know
what I meant when I wrote it).
commit 5f3da480e7541a9c29d655dccb2463fc5f3cf2c4
Author: Joern Rennecke 
Date:   Sun Oct 1 22:46:43 2023 +0100

Fix typo in add_options_for_riscv_v, add_options_for_riscv_zfh, 
add_options_for_riscv_d .

gcc/testsuite/
* lib/target-supports.exp (add_options_for_riscv_v):
Fix typo in first regexp.
(add_options_for_riscv_zfh): Likewise.
(add_options_for_riscv_d): Likewise.

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f3043b2af1b..64889fa6d34 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2021,7 +2021,7 @@ proc riscv_get_arch { } {
 proc add_options_for_riscv_d { flags } {
 if { [lsearch $flags -march=*] >= 0 } {
# If there are multiple -march flags, we have to adjust all of them.
-   return [regsub -all -- 
{((?^|[[:space:]])-march=rv[[:digit:]]*[a-ce-rt-wy]*)d*} $flags \\1d ]
+   return [regsub -all -- 
{((?:^|[[:space:]])-march=rv[[:digit:]]*[a-ce-rt-wy]*)d*} $flags \\1d ]
 }
 if { [check_effective_target_riscv_d] } {
return "$flags"
@@ -2032,7 +2032,7 @@ proc add_options_for_riscv_d { flags } {
 proc add_options_for_riscv_v { flags } {
 if { [lsearch $flags -march=*] >= 0 } {
# If there are multiple -march flags, we have to adjust all of them.
-   return [regsub -all -- 
{((?^|[[:space:]])-march=rv[[:digit:]]*[a-rt-uwy]*)v*} $flags \\1v ]
+   return [regsub -all -- 
{((?:^|[[:space:]])-march=rv[[:digit:]]*[a-rt-uwy]*)v*} $flags \\1v ]
 }
 if { [check_effective_target_riscv_v] } {
return "$flags"
@@ -2043,8 +2043,8 @@ proc add_options_for_riscv_v { flags } {
 proc add_options_for_riscv_zfh { flags } {
 if { [lsearch $flags -march=*] >= 0 } {
# If there are multiple -march flags, we have to adjust all of them.
-   set flags [regsub -all -- {(?^|[[:space:]])-march=[[:alnum:]_.]*} 
$flags &_zfh ]
-   return [regsub -all -- 
{((?^|[[:space:]])-march=[[:alnum:]_.]*_zfh[[:alnum:]_.]*)_zfh} $flags \\1 ]
+   set flags [regsub -all -- {(?:^|[[:space:]])-march=[[:alnum:]_.]*} 
$flags &_zfh ]
+   return [regsub -all -- 
{((?:^|[[:space:]])-march=[[:alnum:]_.]*_zfh[[:alnum:]_.]*)_zfh} $flags \\1 ]
 }
 if { [check_effective_target_riscv_zfh] } {
return "$flags"
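The nature of the typo can be demonstrated outside Tcl as well; Python's re rejects the malformed group in the same way (a sketch — the exact error text differs between engines):

```python
import re

# "(?:" opens a non-capturing group; a bare "(?" followed by an
# unexpected character is a syntax error, which is why the regexp
# did not compile before the fix.
good = r'((?:^|[\s])-march=rv[0-9]*[a-ce-rt-wy]*)d*'
bad = r'((?^|[\s])-march=rv[0-9]*[a-ce-rt-wy]*)d*'

re.compile(good)  # compiles fine
try:
    re.compile(bad)
    raised = False
except re.error:
    raised = True
assert raised
```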


Re: committed [RISC-V]: Harden test scan patterns

2023-09-30 Thread Joern Rennecke
On Fri, 29 Sept 2023 at 14:54, Jeff Law  wrote:

> So I recommend we go forward with Joern's approach (so consider that an
> ACK for the trunk).   Joern  can you post a follow-up manual twiddle so
> that other ports can follow your example and avoid this problem?

The manual... so not in the general web pages, but the stuff under gcc/doc ?
I see that we have a description of scan-assembler* directives in
sourcebuild.texi ,
so I suppose that it should go there.  I suppose the advice should also apply to
scan-assembler-dem(-not), but not to scan-symbol-section .

The more I think about this, the more it feels like we are providing the wrong
tools and then telling users they're using them incorrectly
(like "You're holding it wrong.").
Quoting dots with \. is not much of an issue, but prepending \t or \m
impairs legibility.  I like the obsoleted word-start/end markers \< / \>
much better, as they don't blend in with text.
^ as start-of-line marker is nice for legibility, but it will generally not
work with common semantics, as it'll be thrown off by white space,
and even more, by labels.

Also, we might have different directives for not scanning in LTO sections -
or just ignoring .ascii .  Or maybe the other way round - you have to do
something special if you want to scan inside strings, and by default we
don't look inside strings?
LTO information uses ascii, and ISTR sometimes also a zero-terminated
variant (asciiz?); there might also be some string constant outputs, or stabs
information.
One possible rule I think might work is: if the RE doesn't mention a quote,
don't scan what's quoted inside double quotes.  Although we might have
to look out for backslash-escaped quotes to find the proper end of a quoted
string.

Or should we instead make assembly scans specific to sections in which
assembly output goes, like text sections?  The danger is that we might miss
a text section by another name.  We can give an error if we find no text
section, but there might be a recognizable text section which is a red
herring besides the one that's hidden by some unusual name.


RFA: RISC-V: Make riscv_vector::legitimize_move adjust SRC in the caller. (Was: Remove mem-to-mem VLS move pattern[PR111566])

2023-09-30 Thread Joern Rennecke
>On 9/27/23 03:38, juzhe.zh...@rivai.ai wrote:
>>  >> Why add `can_create_pseudo_p ()` here? this will split after reload,
>>>>but we forbid that pattern between reload and split2?
>>
>> I have no ideal. Some fortran tests just need recognization of
>> mem-to-mem pattern before RA
>> I don't know the reason.
>But isn't that the key to understanding what's going on here?

Jeff law:
>There is nothing special about Fortran here.  Whatever problem this is
>working around will almost certainly show up again in other,
>non-Fortran, contexts.

I also ran into the problem of the  mov_mem_to_mem pattern
making ira combine the instructions output by my cpymem patch into
an unsplittable must-split pattern.  And just plain removing the mem-to-mem
pattern gives a newlib build failure.
The underlying problem is in the declaration of riscv_vector::legitimize_move .
The function gets passed a source and destination by value, and it either
emits (instructions for) a move and returns true, or does checks and/or
preparation statements, modifies its *copy of* src, and returns false.
IIRC, we don't want C++ pass-by-reference syntax in GCC source, so the
solution should be the tried-and-trusted method of passing an explicit pointer
to rtl that we want modified.

I have attached a patch, regression tested for:
riscv-sim

riscv-sim/-march=rv32gcv_zfh/-mabi=ilp32d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
riscv-sim/-march=rv32imac/-mabi=ilp32

riscv-sim/-march=rv64gcv_zfh_zvfh_zba_zbb_zbc_zicond_zicboz_zawrs/-mabi=lp64d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
riscv-sim/-march=rv64imac/-mabi=lp64

Incidentally, the optimization that the mov_mem_to_mem pattern made was invalid,
as it didn't check alignments, nor that the target supports unaligned
accesses with
a fast hardware implementation.  I think this optimization - with the
appropriate check
for hardware support - should be put into the non-vector path of the
cpymem expander,
simply as a relaxation of the alignment test for using scalar values
spanning multiple
addressable units.
Make riscv_vector::legitimize_move adjust SRC in the caller.

2023-09-29  Joern Rennecke  
Juzhe-Zhong  

PR target/111566

gcc/
* config/riscv/riscv-protos.h (riscv_vector::legitimize_move):
Change second parameter to rtx *.
* config/riscv/riscv-v.cc (risv_vector::legitimize_move): Likewise.
* config/riscv/vector.md: Changed callers of
riscv_vector::legitimize_move.
* config/riscv/vector.md (*mov_mem_to_mem): Remove.

gcc/testsuite/

* gcc.target/riscv/rvv/autovec/vls/mov-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/mov-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-2.c: Removed.
* gcc.target/riscv/rvv/autovec/vls/mov-4.c: Removed.
* gcc.target/riscv/rvv/autovec/vls/mov-6.c: Removed.
* gcc.target/riscv/rvv/fortran/pr111566.f90: New test.

Co-Authored-By: Juzhe-Zhong  

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 368982a447b..af5baf37e6a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -421,7 +421,7 @@ rtx expand_builtin (unsigned int, tree, rtx);
 bool check_builtin_call (location_t, vec, unsigned int,
   tree, unsigned int, tree *);
 bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
-bool legitimize_move (rtx, rtx);
+bool legitimize_move (rtx, rtx *);
 void emit_vlmax_vsetvl (machine_mode, rtx);
 void emit_hard_vlmax_vsetvl (machine_mode, rtx);
 void emit_vlmax_insn (unsigned, unsigned, rtx *);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 26700cfc732..097457562bd 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1217,10 +1217,12 @@ get_frm_mode (rtx operand)
 }
 
 /* Expand a pre-RA RVV data move from SRC to DEST.
-   It expands move for RVV fractional vector modes.  */
+   It expands move for RVV fractional vector modes.
+   Return true if the move has already been emitted.  */
 bool
-legitimize_move (rtx dest, rtx src)
+legitimize_move (rtx dest, rtx *srcp)
 {
+  rtx src = *srcp;
   machine_mode mode = GET_MODE (dest);
   if (CONST_VECTOR_P (src))
 {
@@ -1238,7 +1240,7 @@ legitimize_move (rtx dest, rtx src)
{
  /* Need to force register if mem <- !reg.  */
  if (MEM_P (dest) && !REG_P (src))
-   src = force_reg (mode, src);
+   *srcp = force_reg (mode, src);
 
  return fal

Re: committed [RISC-V]: Harden test scan patterns

2023-09-27 Thread Joern Rennecke
On Wed, 27 Sept 2023 at 18:22, Jeff Law  wrote:

> It would help to describe how these patterns were under specified so
> that folks don't continue to make the same mistake as new tests get added.

dg-final scan-assembler, scan-assembler-not, and scan-assembler-times
use a tcl regular expression (often referred to abbreviated as RE), as
described in https://www.tcl.tk/man/tcl8.4/TclCmd/re_syntax.html .

If your RE is not specific enough, it can match LTO information that the
compiler places into its assembly output when the relevant options are
provided, which is common when running tests where the test harness
iterates over a number of optimization option combinations.
Note that '.' is an atom that can match any character.  If you want to
match a dot specifically, you have to escape it with a backslash: '\.' .
When you are matching an instruction mnemonic, an effective way to
avoid matching in LTO information is to enforce matching of word start
(\m) and/or word end (\M) .
Note also that the backslash has to be quoted.  If the RE is enclosed in
'"' quotes, extra backslashes are needed.  That is not necessary when it
is enclosed in curly braces.

For example, "ld.w" will be matched in:

.ascii  "h\227\022\212ld@w\251jr\254'\320\255vwj\252\026\016\364"

If you write {\mld\.w\M} instead, you avoid this problem.
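A quick way to convince yourself (a Python sketch — Python spells the word-boundary anchor \b where Tcl's ARE has \m and \M):

```python
import re

lto_line = '\t.ascii  "h\\227\\022\\212ld@w\\251jr\\254"'
code_line = '\tld.w\ta0,0(a1)'

# The sloppy pattern matches both lines: '.' also matches '@'.
assert re.search(r'ld.w', lto_line)
assert re.search(r'ld.w', code_line)

# Escaping the dot and anchoring on word boundaries matches only
# the real mnemonic.
assert not re.search(r'\bld\.w\b', lto_line)
assert re.search(r'\bld\.w\b', code_line)
```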

#

Where should this go?  Maybe somewhere in or linked from
https://gcc.gnu.org/codingconventions.html , Testsuite conventions?


committed [RISC-V]: Harden test scan patterns

2023-09-27 Thread Joern Rennecke
I got tired of scan tests failing when they have an underspecified
pattern that matches LTO information, so I did a global replace for
the most common form of such scan patterns in the gcc.target/riscv
testsuite.

regression tested for:
riscv-sim

riscv-sim/-march=rv32gcv_zfh/-mabi=ilp32d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
riscv-sim/-march=rv32imac/-mabi=ilp32

riscv-sim/-march=rv64gcv_zfh_zvfh_zba_zbb_zbc_zicond_zicboz_zawrs/-mabi=lp64d/-ftree-vectorize/--param=riscv-autovec-preferenc
e=scalable
riscv-sim/-march=rv64imac/-mabi=lp64

Committed as obvious.
commit d326bb6d7588425d013791299272f913fb23e56d
Author: Joern Rennecke 
Date:   Wed Sep 27 10:05:13 2023 +0100

Harden scan patterns with a bit of scripting:

$ egrep -r 'scan-assembler(|-not|-times) "[[:alnum:].]{1,7}"' riscv
$ egrep -rl 'scan-assembler(|-not|-times) "[[:alnum:].]{1,7}"' riscv > files
$ cat edcmds
g/\(scan-assembler\(\|-not\|-times\) 
\+\)"\([[:alnum:]]\{1,5\}\)\.\([[:alpha:].]\{1,3\}\)"/s//\1{\\m\3\\.\4\\M}/
g/\(scan-assembler\(\|-not\|-times\) 
\+\)"\([[:alnum:]]\{1,7\}\)"/s//\1{\\m\3}/
w
q
$ sed 's/.*/ed & < edcmds/' < files > tmp
$ source tmp

gcc/testsuite/
* gcc.target/riscv/shift-shift-1.c: Avoid spurious pattern matches.
* gcc.target/riscv/shift-shift-3.c: Likewise.
* gcc.target/riscv/zba-shNadd-01.c: Likewise.
* gcc.target/riscv/zba-shNadd-02.c: Likewise.
* gcc.target/riscv/zbb-andn-orn-xnor-01.c: Likewise.
* gcc.target/riscv/zbb-andn-orn-xnor-02.c: Likewise.
* gcc.target/riscv/zbb-min-max.c: Likewise.
* gcc.target/riscv/zero-extend-1.c: Likewise.
* gcc.target/riscv/zero-extend-2.c: Likewise.
* gcc.target/riscv/zero-extend-3.c: Likewise.
* gcc.target/riscv/zero-extend-4.c: Likewise.
* gcc.target/riscv/zero-extend-5.c: Likewise.
* gcc.target/riscv/_Float16-soft-2.c: Likewise.
* gcc.target/riscv/_Float16-soft-3.c: Likewise.
* gcc.target/riscv/_Float16-zfh-1.c: Likewise.
* gcc.target/riscv/_Float16-zfh-2.c: Likewise.
* gcc.target/riscv/_Float16-zfh-3.c: Likewise.
* gcc.target/riscv/and-extend-1.c: Likewise.
* gcc.target/riscv/and-extend-2.c: Likewise.
* gcc.target/riscv/pr108987.c: Likewise.
* gcc.target/riscv/ret-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/align-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/align-2.c: Likewise.
* gcc.target/riscv/zba-shNadd-04.c: Likewise.
* gcc.target/riscv/zba-shNadd-07.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-02.c: Likewise.
* gcc.target/riscv/zbbw.c: Likewise.
* gcc.target/riscv/zbc32.c: Likewise.
* gcc.target/riscv/zbc64.c: Likewise.
* gcc.target/riscv/zbkb32.c: Likewise.
* gcc.target/riscv/zbkb64.c: Likewise.
* gcc.target/riscv/zbkc32.c: Likewise.
* gcc.target/riscv/zbkc64.c: Likewise.
* gcc.target/riscv/zbkx32.c: Likewise.
* gcc.target/riscv/zbkx64.c: Likewise.
* gcc.target/riscv/zfa-fleq-fltq.c: Likewise.
* gcc.target/riscv/zfa-fli-zfh.c: Likewise.
* gcc.target/riscv/zfa-fli.c: Likewise.
* gcc.target/riscv/zknd64.c: Likewise.
* gcc.target/riscv/zksed32.c: Likewise.
* gcc.target/riscv/zksed64.c: Likewise.
* gcc.target/riscv/zksh32.c: Likewise.
* gcc.target/riscv/zksh64.c: Likewise.
* gcc.target/riscv/_Float16-soft-1.c: Likewise.
* gcc.target/riscv/_Float16-zfhmin-1.c: Likewise.
* gcc.target/riscv/_Float16-zfhmin-2.c: Likewise.
* gcc.target/riscv/_Float16-zfhmin-3.c: Likewise.
* gcc.target/riscv/_Float16-zhinxmin-1.c: Likewise.
* gcc.target/riscv/_Float16-zhinxmin-2.c: Likewise.
* gcc.target/riscv/_Float16-zhinxmin-3.c: Likewise.
* gcc.target/riscv/fle-ieee.c: Likewise.
* gcc.target/riscv/fle-snan.c: Likewise.
* gcc.target/riscv/flef-ieee.c: Likewise.
* gcc.target/riscv/flef-snan.c: Likewise.
* gcc.target/riscv/flt-ieee.c: Likewise.
* gcc.target/riscv/flt-snan.c: Likewise.
* gcc.target/riscv/fltf-ieee.c: Likewise.
* gcc.target/riscv/fltf-snan.c: Likewise.
* gcc.target/riscv/interrupt-1.c: Likewise.
* gcc.target/riscv/interrupt-mmode.c: Likewise.
* gcc.target/riscv/interrupt-smode.c: Likewise.
* gcc.target/riscv/interrupt-umode.c: Likewise.
* gcc.target/riscv/pr106888.c: Likewise.
* gcc.target/riscv/pr89835.c: Likewise.
* gcc.target/riscv/shift-and-1.c: Likewise

Re: RISC-V: Added support for CRC.

2023-09-26 Thread Joern Rennecke
On Tue, 26 Sept 2023 at 14:18, Jeff Law  wrote:

>  But the Coremark code is what it is.  This isn't a whole lot
> different than the work in the 90s which rewrote loops and compromised
> some of the spec benchmarks, or the eqntott hack to simplify that one
> key loop in eqntott.

I think the stated purpose of the benchmark matters.  If dhrystone had been
pushed as an abstraction-penalty benchmark, it would have been fine to
present results with WPA, inlining and dead code elimination as ordinary
dhrystone results.  But since it's supposed to exercise specific hardware
features, and not have the tests for these optimized away, that's not
appropriate.

So, first, we make the compiled program perform the work that the benchmark
was supposed to include in the measurement, just more efficiently.
Second, we not only optimize the benchmark, but also make the target-optimized
code generation available for other programs.  For new programs targeted at
GNU C, that is minimally achieved by providing a built-in function,
and in general
for new code, by being able to replicate the idiom from coremark that
is recognized
by GCC.  The mere existence of a C idiom in a popular benchmark also makes this
idiom a common idiom, if it hasn't already been that before.
As we find new patterns that are used to implement CRC which would
be better replaced with a target-specific implementation, we can add these.

This is similar to rotate operations, which are supported directly by
some processors,
and even for other targets, there are generally preferred ways to
expand the code,
but there are a couple of different variants depending on the
available instruction set,
registers, and the microarchitecture (pipeline, latency etc).  We
started out with
one patterns that was recognized, and as new patterns were identified
in C code, we
improved GCC to recognize these other patterns.

> What ultimately pushed us to keep moving forward on this effort was
> discovering numerous CRC loop implementations out in the wild, including
> 4 implementations (IIRC) in the kernel itself.

I have always assumed that such must exist (CRCs are useful for a number
of problems, and besides, they wouldn't have been included in coremark as
a part of the workload if they were not relevant), but it is good to have
confirmation, and even better to have code that can detect and analyse an
entire class of idioms that is in such widespread use.
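For reference, the class of idiom being detected is the bit-at-a-time loop; a Python sketch of the common reflected CRC-32 form (the in-the-wild versions are C, but the shape is the same):

```python
def crc32_bitwise(data: bytes, crc: int = 0xFFFFFFFF) -> int:
    """Bit-at-a-time CRC-32 (reflected, polynomial 0xEDB88320) --
    the kind of loop the detection pass looks for."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift one bit out; XOR in the polynomial when the
            # shifted-out bit was set.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF
```

For the standard test vector, crc32_bitwise(b"123456789") gives the well-known check value 0xCBF43926, the same result as zlib's table-driven crc32.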

This still leaves room for further improvements, like detecting fully-unrolled
code, table lookup, or slice-by-N, and replacing them with better
target-optimized code where this is indicated by the optimization flags to
save execution time or code/rodata size.  Not something we have to tackle
now, but just because we don't do it now, doesn't mean we couldn't address
these in the future if that appears worthwhile.

> I can easily see creating a clmul RTL opcode for targets which support
> it and hoisting the clmul vs lookup table selection into generic code.
> I'm still pondering if we're likely to ever see cases where we want a
> vector clmul intrinsic or support in the autovectorizer for clmul.
> We've certainly got targets with vector clmul in the ISA, the question
> is using it.

If we aim for optimal code, I think it more likely that we want to detect a
block CRC computation, and have a target expander decide whether to do that
with inline code or a library call that uses vectorized clmul.  At the time
we add such block-CRC expansion code, it also makes sense to add a
builtin for block CRC so that new GNU C programs can directly request
that functionality without having to go through the cargo cult of matching
a supported idiom.

Now, the library might be written in GNU C, and for that it might be useful
to have a vector clmul intrinsic so that we can express this algorithm more
easily.
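For reference, what a portable fallback for such a clmul builtin would have to compute - carry-less (GF(2)) multiplication - can be sketched as follows (clmul32 is a hypothetical helper name, not an existing GCC builtin):

```c
#include <stdint.h>

/* Carry-less multiply of two 32-bit values, yielding a 64-bit product.
   Hardware implementations (e.g. RISC-V Zbc clmul/clmulh) compute this
   in one or two instructions; this shift-and-XOR loop is the generic
   fallback.  */
uint64_t
clmul32 (uint32_t a, uint32_t b)
{
  uint64_t acc = 0;
  for (int i = 0; i < 32; i++)
    if ((b >> i) & 1)
      acc ^= (uint64_t) a << i;  /* XOR instead of add: no carries.  */
  return acc;
}
```

E.g. clmul32 (3, 3) yields 5, since (x + 1)^2 = x^2 + 1 over GF(2).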

> Probably the biggest task in that space right now is to see if we can
> avoid the symbolic execution engine by re-using ranger.

I'll be interested to see what you come up with, but if we fall back to the
symbolic execution engine, the compile time cost isn't much if we only
use it for a proper match.  So whatever heuristics are used before deciding
to use the engine matter.  Can all the cases detected by the engine be
recognized as a loop with a reduction?  We might use different heuristics
for different optimization levels, i.e. allow more false negatives at -O1,
and more false positives at -O2 / -fexpensive-optimizations.

> To reiterate the real goal here is to take code as-is and make it
> significantly faster.  While the original target was Coremark, we've
> found similar bitwise implementations of CRCs all over the place.
> There's no good reason that code should have to change.
>
> The idea of exposing a CRC builtin was an intermediate step that would
> allow those willing to change their code or writing new code to write
> their CRC in a trivial way and let the compiler figure out a sensible
> implementation while we clean up the 

Re: RISC-V: Added support for CRC.

2023-09-24 Thread Joern Rennecke
On Sun, 24 Sept 2023 at 12:41, Alexander Monakov  wrote:
>
>
> On Sun, 24 Sep 2023, Joern Rennecke wrote:
>
> > It is a stated goal of coremark to test performance for CRC.
>
> I would expect a good CRC benchmark to print CRC throughput in
> bytes per cycle or megabytes per second.
>
> I don't see where Coremark states that goal. In the readme at
> https://github.com/eembc/coremark/blob/main/README.md
> it enumerates the three subcategories (linked list, matrix ops,
> state machine) and indicates that CRC is used for validation.

At https://www.eembc.org/coremark/index.php , they state under the
Details heading:

...
Replacing the antiquated Dhrystone benchmark, Coremark contains
implementations of the following algorithms: list processing (find and
sort), matrix manipulation (common matrix operations), state machine
(determine if an input stream contains valid numbers), and CRC (cyclic
redundancy check).
...
The CRC algorithm serves a dual function; it provides a workload
commonly seen in embedded applications and ensures correct operation
of the CoreMark benchmark, essentially providing a self-checking
mechanism.
...

They also point to a whitepaper there, which states:

Since CRC is also a commonly used function in embedded applications, this
calculation is included in the timed portion of the CoreMark.

> If it claims that elsewhere, the way its code employs CRC does not
> correspond to real-world use patterns, like in the Linux kernel for
> protocol and filesystem checksumming, or decompression libraries.

That may be so, but we should still strive to optimize the code to
obtain the intended purpose.

> It is, however, representative of the target CPU's ability to run
> those basic bitwise ops with good overlap with the rest of computation,
> which is far more relevant for the real-world performance of the CPU.

That depends on how much CRC calculation your application does.  You can
disable specific compiler optimizations in GCC for specialized testing.

> > thus if a compiler fails to translate this into the CRC implementation
> > that would be used for performance code, the compiler frustrates this
> > goal of coremark to give a measure of CRC calculation performance.
>
> Are you seriously saying that if a customer chooses CPU A over CPU B
> based on Coremark scores, and then discovers that actual performance
> in, say, zlib (which uses slice-by-N for CRC) is better on CPU B, that's
> entirely fair and the benchmarks scores they saw were not misleading?

Using coremark as a yardstick for any one application is always likely
to give an inaccurate assessment - unless your application is
identical to coremark.  I don't see why whatever implementation is
chosen for the short-length CRC in coremark should be closer to or
farther from slice-by-N CRC; I would expect it to be pseudo-random.
Unless CPU B has worse GCC support or fewer available hardware
instructions for short-length CRC, in which case the manufacturer might
consider improving support (particularly if it's about GCC support ;-)
Actually, if CRC optimization is implemented via table lookup, on both
CPU A and B,
it gets a bit closer to slice-by-N, since both do table lookups,
although for slice-by-N you
trade latency for register pressure.
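To make the comparison concrete: a byte-at-a-time table-lookup CRC - the likely shape of compiler-generated table code - looks like the sketch below; slice-by-N uses N such tables and XORs N lookups per iteration (illustrative code, reflected polynomial 0xA001):

```c
#include <stdint.h>

/* Byte-at-a-time table-lookup CRC-16, reflected polynomial 0xA001.
   Slice-by-N generalizes this to N tables and N bytes per iteration.  */
uint16_t crc16_table[256];

void
crc16_init (void)
{
  for (int n = 0; n < 256; n++)
    {
      uint16_t crc = n;
      for (int k = 0; k < 8; k++)
        crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
      crc16_table[n] = crc;
    }
}

uint16_t
crc16_lookup (uint16_t crc, const uint8_t *buf, unsigned len)
{
  while (len--)
    crc = (crc >> 8) ^ crc16_table[(crc ^ *buf++) & 0xff];
  return crc;
}
```

One table lookup replaces eight bit-steps per byte; for the standard check string "123456789" the result is again 0xBB3D.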

No single benchmark can be a good performance predictor for all applications.
If you care a lot about performance for a particular load, you should
benchmark that load,
or something that is known to be a close proxy.

> > > At best we might have
> > > a discussion on providing a __builtin_clmul for carry-less multiplication
> > > (which _is_ a fundamental primitive, unlike __builtin_crc), and move on.
> >
> > Some processors have specialized instructions for CRC computations.
>
> Only for one or two fixed polynomials. For that matter, some processors
> have instructions for AES and SHA, but that doesn't change that clmul is
> a more fundamental and flexible primitive than "CRC".

So it is, but when analyzing user programs that haven't been written by experts
with a focus on performance, CRC is more likely to come up than clmul.
I agree that it would make sense to have a builtin for clmul that can be used
uniformly across architectures that support this operation, but I'm
not volunteering
to write a patch for that.

> If only the "walk before you run" logic was applied in favor of
> implementing a portable clmul builtin prior to all this.

I started writing the CRC patch for an architecture that didn't have
clmul as a base instruction, so a clmul builtin would not have helped.
>
> > A library can be used to implement built-ins in gcc (we still need to
> > define one for block operations, one step at a time...).  However,
> > someone or something needs to rewrite the existing code to use the
> > lib

Re: RISC-V: Added support for CRC.

2023-09-23 Thread Joern Rennecke
Mariam Harutyunyan:
+++ b/gcc/ChangeLog
@@ -1,3 +1,45 @@
+2023-08-03  Mariam Arutunian  
+

It is common courtesy to include all authors in the list of authors
for the ChangeLog; also,
this may help people in the future understand the history of the code better.
While most of your patch is new, it still contains non-trivial parts of mine
( https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591744.html ).
And stripping out the comment why, currently, we can't use linkonce
for crc tables on the RISC-V target is
not helpful to someone who wants to understand the code.

See also the discussion to put this into loop distribution:
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591821.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591866.html

Mariam Harutyunyan:
> It adds internal
> functions and built-ins specifically designed to handle CRC computations
> efficiently.

This sounds as if this were finished work, although defining built-in
functions to calculate the CRC of single data elements, and recognizers
for some C idioms that do these calculations,
are just a starting point.

Alexander Monakov :

> Jeff, as I understand this all is happening only because Coremark contains
> use of bitwise CRC that affects benchmark scores. In another universe where
> - Coremark was careful to checksum outputs outside of timed sections, or
> - implemented CRC in a manner that is not transparent to the compiler, or
> - did not use CRC at all
> we would not be spending effort on this, correct?

It is a stated goal of coremark to test performance for CRC.  They do
not use a library call
to implement CRC, but a specific bit-banging algorithm they have
chosen.  That algorithm is,
for the vast majority of processors, not representative of the target's
performance potential in calculating CRCs, thus if a compiler fails to
translate this into the CRC implementation that
would be used for performance code, the compiler frustrates this goal
of coremark to give a measure of CRC calculation performance.

> At best we might have
> a discussion on providing a __builtin_clmul for carry-less multiplication
> (which _is_ a fundamental primitive, unlike __builtin_crc), and move on.

Some processors have specialized instructions for CRC computations.

> Instead, efficient CRC loops have the following structure:
> - they carry an unreduced remainder in the loop, performing final reduction
>  modulo polynomial only once after the loop — this halves the CLMUL count;
> - they overlap multiple CLMUL chains to make the loop throughput-bound
> rather than latency-bound. The typical unroll factor is about 4x-8x.

If you want to recognize a loop that does a CRC on a block, it makes
sense to start with recognizing the CRC computation for single array
elements first.  We have to learn to
walk before we can run.

Note that my initial patch already contained a match.pd stanza to
recognize two chained single-element CRC calculations.
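The chained case is just the composition of two single-element updates, e.g. (crc16_update here is a generic bitwise update standing in for whatever idiom the match.pd stanza recognizes):

```c
#include <stdint.h>

/* Generic bitwise CRC-16 update (reflected polynomial 0xA001), used as
   a stand-in for the recognized single-element idiom.  */
uint16_t
crc16_update (uint16_t crc, uint8_t data)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
  return crc;
}

/* Two chained single-element updates: covering this only needs the
   single-element recognizer applied twice.  */
uint16_t
crc16_two_bytes (uint16_t crc, uint8_t b0, uint8_t b1)
{
  return crc16_update (crc16_update (crc, b0), b1);
}
```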

Jeff Law:
> The intention is to provide a useful
> builtin_crc while at the same time putting one side of the
> infrastructure we need for automatic detection of CRC loops and turning
> them into table lookups or CLMULs.

Note that when optimizing for size, for a target with tiny memory, or
when using a non-constant (or constant but undiscoverable by the
compiler) polynomial, we can't use the table lookup.  But still, even if
we don't have CLMUL, we can generally speed up CRC computation over
the coremark algorithm by using something more suited to the target,
like the crcu16_1 function I
put into comments in my patch.

Alexander Monakov :
> So... just provide a library? A library code is easier to develop and audit,
> it can be released independently, people can use it with their compiler of
> choice. Not everything needs to be in libgcc.

A library can be used to implement built-ins in gcc (we still need to
define one for block operations, one step at a time...).
However, someone or something needs to rewrite the existing code to
use the library.
It is commonly accepted that an efficient way to do this is to make a
compiler do the
necessary transformations, as long as it can be made to churn out good
enough code.

Alexander Monakov:
> Useful to whom? The Linux kernel? zlib, bzip2, xz-utils? ffmpeg?
> These consumers need high-performance blockwise CRC, offering them
> a latency-bound elementwise CRC primitive is a disservice. And what
> should they use as a fallback when __builtin_crc is unavailable?

We can provide a fallback implementation for all targets with table
lookup and/or shifts.

Alexander Monakov:
> I think offering a conventional library for CRC has substantial advantages.
Are you volunteering?  It would make our work to emit code for block
CRC easier if we could
just use a library call when we recognize a block CRC (although making
that recognizer is likely still considerable work if we want to get
good coverage over different coding styles).

Although maybe Oleg Endo's library, as mentioned 

RFC: RISC-V sign extension dead code elimination

2023-08-29 Thread Joern Rennecke
In the patch call we talked about sign extension elimination, so I dug
up this patch set that I did a while ago.  It is still lacking some
documentation and testing in a more recent base version;
I only adjusted the common.opt part context for the patch to apply.
Author: Joern Rennecke 
Date:   Thu Mar 10 12:22:45 2022 +

Added ext-dce.cc pass, for deleting dead sign / zero extensions.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 31ff95500c9..6e7ad5ff966 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1374,6 +1374,7 @@ OBJS = \
explow.o \
expmed.o \
expr.o \
+   ext-dce.o \
fibonacci_heap.o \
file-prefix-map.o \
final.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index c69205f936a..27d6d15fe6d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3620,4 +3620,12 @@ Widen induction variables that have undefined overflow behaviour where convenient
 Common Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fext-dce
+Common Var(flag_ext_dce, 1) Optimization Init(0)
+Perform dead code elimination on zero and sign extensions with special dataflow analysis.
+
+fext-dce-pre
+Common Var(flag_ext_dce, 2)
+Perform dead code elimination on zero and sign extensions with special dataflow analysis.  Insert extensions on edges for partial redundancy elimination.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 0ab9ffe3c0b..d30464a913c 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -166,6 +166,20 @@
   [(set_attr "type" "bitmanip,load")
(set_attr "mode" "")])
 
+;; Combine has a different idea about canonical rtl.
+;; Example: int f (int i) { return (short)i; }
+(define_insn_and_split "*extendhidi_combine"
+  [(set (match_operand:DI 0 "register_operand")
+   (sign_extend:DI
+ (ashiftrt:SI
+   (subreg:SI (ashift:DI (match_operand:DI 1 "register_operand")
+ (const_int 16)) 0)
+   (const_int 16))))]
+  "TARGET_ZBB"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (sign_extend:DI (subreg:HI (match_dup 1) 0)))])
+
 (define_insn "*zero_extendhi<GPR:mode>2_zbb"
   [(set (match_operand:GPR 0 "register_operand" "=r,r")
(zero_extend:GPR
diff --git a/gcc/df-scan.cc b/gcc/df-scan.cc
index 9b2375d561b..59b0a82dcc9 100644
--- a/gcc/df-scan.cc
+++ b/gcc/df-scan.cc
@@ -78,7 +78,6 @@ static void df_get_eh_block_artificial_uses (bitmap);
 
 static void df_record_entry_block_defs (bitmap);
 static void df_record_exit_block_uses (bitmap);
-static void df_get_exit_block_use_set (bitmap);
 static void df_get_entry_block_def_set (bitmap);
 static void df_grow_ref_info (struct df_ref_info *, unsigned int);
 static void df_ref_chain_delete_du_chain (df_ref);
@@ -3638,7 +3637,7 @@ df_epilogue_uses_p (unsigned int regno)
 
 /* Set the bit for regs that are considered being used at the exit. */
 
-static void
+void
 df_get_exit_block_use_set (bitmap exit_block_uses)
 {
   unsigned int i;
diff --git a/gcc/df.h b/gcc/df.h
index bd329205d08..9807a3e87f9 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -1090,6 +1090,7 @@ extern bool df_epilogue_uses_p (unsigned int);
 extern void df_set_regs_ever_live (unsigned int, bool);
 extern void df_compute_regs_ever_live (bool);
 extern void df_scan_verify (void);
+extern void df_get_exit_block_use_set (bitmap);
 
 
 /*
diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
new file mode 100644
index 000..9d264972c7f
--- /dev/null
+++ b/gcc/ext-dce.cc
@@ -0,0 +1,545 @@
+/* RTL dead zero/sign extension (code) elimination.
+   Copyright (C) 2000-2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "insn-config.h"
+#include "emit-rtl.h"
+#include "recog.h"
+#include "cfganal.h"
+#include "tree-pass.h"
+#include "cfgrtl.h"
+

Re: Re: cpymem for RISCV with v extension

2023-08-15 Thread Joern Rennecke
On Sat, 5 Aug 2023 at 00:35, 钟居哲  wrote:
>
> >> Testing what specifically?  Are you asking for correctness tests,
> >> performance/code quality tests?
>
> Add memcpy test using RVV instructions, just like we are adding testcases for 
> auto-vectorization support.

I wanted to get in the test infrastructure first.

> void foo (int32_t * a, int32_t * b, int num)
> {
>   memcpy (a, b, num);
> }
>
>
> In my downstream LLVM/GCC codegen:
> foo:
> .L2:
> vsetvli a5,a2,e8,m8,ta,ma
> vle8.v  v24,(a1)
> sub a2,a2,a5
> vse8.v  v24,(a0)
> add a1,a1,a5
> add a0,a0,a5
> bne a2,zero,.L2
> ret

Yeah, it does that.

>
> Another example:
> void foo (int32_t * a, int32_t * b, int num)
> {
>   memcpy (a, b, 4);
> }
>
>
> My downstream LLVM/GCC assembly:
>
> foo:
> vsetvli zero,16,e8,m1,ta,ma
> vle8.v v24,(a1)
> vse8.v v24,(a0)
> ret

copying 16 bytes when asked to copy 4 is problematic.  Mine copies 4.

Note also for:
typedef struct { int a[31]; } s;

void foo (s *a, s *b)
{
  *a = *b;
}

You get:

vsetivlizero,31,e32,m8,ta,ma
vle32.v v8,0(a1)
vse32.v v8,0(a0)

Using memcpy, the compiler unfortunately discards the alignment.

> emit_insn (gen_pred_store...)

Thanks for pointing me in the right direction.  From the naming of the
patterns, the dearth of comments, and the default behaviour of the
compiler when optimizing with generic optimization options (i.e. no
vectorization) I had assumed that the infrastructure was still
missing.

I have attached a re-worked patch that uses pred_mov / pred_store and
is adapted to the refactored modes.
It lacks the strength reduction of the opaque pattern version for -O3,
though.  Would people also like to see that expanded into RTL?  Or
should I just drop in the opaque pattern for that?  Or not at all,
because everyone uses Superscalar Out-Of-Order execution?
commit 1f4b7a8e6798acab1f79de38e85d9d080a76eb4a
Author: Joern Rennecke 
Date:   Tue Aug 15 08:18:53 2023 +0100

cpymem using pred_mov / pred_store and adapted to mode refactoring.

2023-07-12  Ju-Zhe Zhong 
Joern Rennecke  

gcc/
* config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
Declare.
* config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
New function.
* config/riscv/riscv.md (cpymemsi): Use 
riscv_vector::expand_block_move.
Change to ..
(cpymem) .. this.

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 2fbed04ff84..70ffdcdf180 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -315,6 +315,7 @@ bool slide1_sew64_helper (int, machine_mode, machine_mode,
  machine_mode, rtx *);
 rtx gen_avl_for_scalar_move (rtx);
 void expand_tuple_move (rtx *);
+bool expand_block_move (rtx, rtx, rtx);
 machine_mode preferred_simd_mode (scalar_mode);
 machine_mode get_mask_mode (machine_mode);
 void expand_vec_series (rtx, rtx, rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5f9b296c92e..ea96a0ef84d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@
 #include "tm-constrs.h"
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
+#include "predict.h"
 
 using namespace riscv_vector;
 
@@ -2379,6 +2380,192 @@ expand_tuple_move (rtx *ops)
 }
 }
 
+/* Used by cpymemsi in riscv.md .  */
+
+bool
+expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
+{
+  /*
+memcpy:
+   mv a3, a0   # Copy destination
+loop:
+   vsetvli t0, a2, e8, m8, ta, ma  # Vectors of 8b
+   vle8.v v0, (a1) # Load bytes
+   add a1, a1, t0  # Bump pointer
+   sub a2, a2, t0  # Decrement count
+   vse8.v v0, (a3) # Store bytes
+   add a3, a3, t0  # Bump pointer
+   bnez a2, loop   # Any more?
+   ret # Return
+  */
+  if (!TARGET_VECTOR)
+return false;
+  HOST_WIDE_INT potential_ew
+= (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
+   / BITS_PER_UNIT);
+  machine_mode vmode = VOIDmode;
+  bool need_loop = true;
+  bool size_p = optimize_function_for_size_p (cfun);
+  rtx src, dst;
+  rtx end = gen_reg_rtx (Pmode);
+  rtx vec;
+  rtx length_rtx = length_in;
+
+  if (CONST_INT_P (length_in))
+{
+  HOST_WIDE_INT length = INTVAL (length_in);
+
+/* By using LMUL=8, we can copy as many bytes in one go as there
+   are bits in a vector register.  If the entire block thus fits,
+   we don't need a loop.  */
+if (length <= TARGET_MIN_VLEN)
+  {
+  

Re: cpymem for RISCV with v extension

2023-08-14 Thread Joern Rennecke
On Fri, 4 Aug 2023 at 21:52, Jeff Law  wrote:

> > diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> > index b4884a30872..e61110fa3ad 100644
> > --- a/gcc/config/riscv/riscv-v.cc
> > +++ b/gcc/config/riscv/riscv-v.cc
> > @@ -49,6 +49,7 @@
> >   #include "tm-constrs.h"
> >   #include "rtx-vector-builder.h"
> >   #include "targhooks.h"
> > +#include "predict.h"
> Not sure this is needed, but I didn't scan for it explicitly.  If it's
> not needed, then remove it.

It is needed to declare optimize_function_for_size_p .


Re: RISCV test infrastructure for d / v / zfh extensions

2023-08-14 Thread Joern Rennecke
 got to fix that.

> > +proc add_options_for_riscv_v { flags } {
> > +if { [lsearch $flags -march=*] >= 0 } {
> > + # If there are multiple -march flags, we have to adjust all of them.
> > + # ??? Is there a way to make the match specific to a full list 
> > element?
> > + # as it is, we might match something inside a string.
> > + return [regsub -all -- {(-march=rv[[:digit:]]*[a-rt-uwy]*)v*} $flags 
> > \\1v ]
>
> Is iterating over the list elements and returning a new list
> not an option?  Or would that break something else?

I was afraid that making it overly complex would increase the likelihood
of a coding error,
and was hoping there'd be some simple solution to request a list element
start/end that I wasn't aware of.

Hmm, come to think of it, even if I can't exactly match a list element, I could
match a list start or delimiter.  I just have to make sure I put the delimiter I
matched back.  And I can't match it as a regexp part at the end too, although
I could match it with the positive look ahead pattern; but I don't actually want
to match the end.
So we can make sure we are dealing with a list element
that looks like a -march option.  (You could still construe a multi-word option
that uses a string starting with -march as a pathname or similar, but I suppose
you'd deserve whatever you get then.  I don't see a bobby tables scenario
here.)

I also found one comment pasto.

I have attached the amended patch - not tested yet.  I hope to get some opinions
today on the patch call regarding the test naming and the behaviour of
dg-require-effective-target riscv_v_ok when testing an architecture variant with
vector support, so the idea is to test after any input from that is taken into
account, and also maybe in the context of the cpymem patch.
2023-08-15  Joern Rennecke  

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_rv_float_abi_soft):
New proc.
(check_effective_target_riscv_d): Likewise.
(check_effective_target_riscv_v): Likewise.
(check_effective_target_riscv_zfh): Likewise.
(check_effective_target_riscv_v_ok): likewise.
(check_effective_target_riscv_zfh_ok): Likewise.
(riscv_get_arch, add_options_for_riscv_v): Likewise.
(add_options_for_riscv_zfh): Likewise.
(add_options_for_riscv_d): Likewise.

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 92b6f69730e..cdd00b4a064 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1887,6 +1887,167 @@ proc check_effective_target_rv64 { } {
 }]
 }
 
+# Return 1 if the target abi is __riscv_float_abi_soft, 0 otherwise.
+# Cache the result.
+
+proc check_effective_target_rv_float_abi_soft { } {
+# Check that we are compiling for RV64 by checking the xlen size.
+return [check_no_compiler_messages riscv_riscv_float_abi_soft assembly {
+   #ifndef __riscv_float_abi_soft
+   #error "Not __riscv_float_abi_soft"
+   #endif
+}]
+}
+
+# Return 1 if the target arch supports the double precision floating point
+# extension, 0 otherwise.  Cache the result.
+
+proc check_effective_target_riscv_d { } {
+return [check_no_compiler_messages riscv_ext_d assembly {
+   #ifndef __riscv_d
+   #error "Not __riscv_d"
+   #endif
+}]
+}
+
+# Return 1 if the target arch supports the vector extension, 0 otherwise.
+# Cache the result.
+
+proc check_effective_target_riscv_v { } {
+return [check_no_compiler_messages riscv_ext_v assembly {
+   #ifndef __riscv_v
+   #error "Not __riscv_v"
+   #endif
+}]
+}
+
+# Return 1 if the target arch supports half float, 0 otherwise.
+# Note, this differs from the test performed by
+# /* dg-skip-if "" { *-*-* } { "*" } { "-march=rv*zfh*" } */
+# in that it takes default behaviour into account.
+# Cache the result.
+
+proc check_effective_target_riscv_zfh { } {
+return [check_no_compiler_messages riscv_ext_zfh assembly {
+   #ifndef __riscv_zfh
+   #error "Not __riscv_zfh"
+   #endif
+}]
+}
+
+# Return 1 if we can execute code when using dg-add-options riscv_v
+
+proc check_effective_target_riscv_v_ok { } {
+# If the target already supports v without any added options,
+# we may assume we can execute just fine.
+if { [check_effective_target_riscv_v] } {
+   return 1
+}
+
+# check if we can execute vector insns with the given hardware or
+# simulator
+set gcc_march [regsub {[[:alnum:]]*} [riscv_get_arch] ]
+if { [check_runtime ${gcc_march}_exec {
+ int main() {  asm("vsetivli t0, 9, e8, m1, tu, ma"); return 0; } } 
"-march=${gcc_march}"] } {
+   return 1
+}
+
+# Possible future extensions: If the target is a simulator, dg-add-options
+# might change its config 

RISCV test infrastructure for d / v / zfh extensions

2023-07-18 Thread Joern Rennecke
This makes it easier to write tests that safely test features needing
d, v and/or zfh extensions.

check_effective_target_riscv_v checks if the current target allows the
use of vector instructions.
add_options_for_riscv_v asks to add a -march option to change the
target to one like the current one, but with the 'v' extension
enabled, if it is not already.  That is generally safe for
compile-only tests, e.g. using scan-assembler* stanzas.
If you have an execution test that you want to force using the
extension if the actual execution
target supports it, you can use check_effective_target_riscv_v_ok to
check if that's ok, and then
add_options_for_riscv_v to add the appropriate -march option.

Examples how this can be used can be found
at https://github.com/embecosm/rvv-gcc/tree/rvv-12/gcc/testsuite
2023-04-17  Joern Rennecke  

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_rv_float_abi_soft):
New proc.
(check_effective_target_riscv_d): Likewise.
(check_effective_target_riscv_v): Likewise.
(check_effective_target_riscv_zfh): Likewise.
(check_effective_target_riscv_v_ok): likewise.
(check_effective_target_riscv_zfh_ok): Likewise.
(riscv_get_arch, add_options_for_riscv_v): Likewise.
(add_options_for_riscv_zfh): Likewise.
(add_options_for_riscv_d): Likewise.

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8ea0d9feb1c..deeb0ef8865 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1884,6 +1884,173 @@ proc check_effective_target_rv64 { } {
 }]
 }
 
+# Return 1 if the target abi is __riscv_float_abi_soft, 0 otherwise.
+# Cache the result.
+
+proc check_effective_target_rv_float_abi_soft { } {
+# Check that we are compiling for RV64 by checking the xlen size.
+return [check_no_compiler_messages riscv_riscv_float_abi_soft assembly {
+   #ifndef __riscv_float_abi_soft
+   #error "Not __riscv_float_abi_soft"
+   #endif
+}]
+}
+
+# Return 1 if the target arch supports the double precision floating point
+# extension, 0 otherwise.  Cache the result.
+
+proc check_effective_target_riscv_d { } {
+return [check_no_compiler_messages riscv_ext_d assembly {
+   #ifndef __riscv_d
+   #error "Not __riscv_d"
+   #endif
+}]
+}
+
+# Return 1 if the target arch supports the vector extension, 0 otherwise.
+# Cache the result.
+
+proc check_effective_target_riscv_v { } {
+return [check_no_compiler_messages riscv_ext_v assembly {
+   #ifndef __riscv_v
+   #error "Not __riscv_v"
+   #endif
+}]
+}
+
+# Return 1 if the target arch supports half float, 0 otherwise.
+# Note, this differs from the test performed by
+# /* dg-skip-if "" { *-*-* } { "*" } { "-march=rv*zfh*" } */
+# in that it takes default behaviour into account.
+# Cache the result.
+
+proc check_effective_target_riscv_zfh { } {
+return [check_no_compiler_messages riscv_ext_zfh assembly {
+   #ifndef __riscv_zfh
+   #error "Not __riscv_zfh"
+   #endif
+}]
+}
+
+# Return 1 if we can execute code when using dg-add-options riscv_v
+
+proc check_effective_target_riscv_v_ok { } {
+# If the target already supports v without any added options,
+# we may assume we can execute just fine.
+if { [check_effective_target_riscv_v] } {
+   return 1
+}
+
+# check if we can execute vector insns with the given hardware or
+# simulator
+set gcc_march [regsub {[[:alnum:]]*} [riscv_get_arch] ]
+if { [check_runtime ${gcc_march}_exec {
+ int main() {  asm("vsetivli t0, 9, e8, m1, tu, ma"); return 0; } } 
"-march=${gcc_march}"] } {
+   return 1
+}
+
+# Possible future extensions: If the target is a simulator, dg-add-options
+# might change its config to make it allow vector insns, or we might use
+# options to set special elf flags / sections to effect that.
+
+return 0
+}
+
+# Return 1 if we can execute code when using dg-add-options riscv_zfh
+
+proc check_effective_target_riscv_zfh_ok { } {
+# If the target already supports zfh without any added options,
+# we may assume we can execute just fine.
+# ??? Other cases we should consider: 
+# - target / simulator already supports zfh extension - test for that.
+# - target is a simulator, and dg-add-options knows how to enable zfh 
support in that simulator
+if { [check_effective_target_riscv_zfh] } {
+   return 1
+}
+
+# check if we can execute vector insns with the given hardware or
+# simulator
+set gcc_march [riscv_get_arch]
+if { [check_runtime ${gcc_march}_zfh_exec {
+ int main() {  asm("feq.h a3,fa5,fa4"); return 0; } } 
"-march=${gcc_march}_zfh"] } {
+   return 1
+}
+
+# Possible future extensions: If the target is a simu

cpymem for RISCV with v extension

2023-07-17 Thread Joern Rennecke
As discussed on last week's patch call, this patch uses either a
straight copy or an opaque pattern that emits the loop as assembly to
optimize cpymem for the 'v' extension.
I used Ju-Zhe Zhong's patch - starting in git with:

Author: zhongjuzhe <66454988+zhongju...@users.noreply.github.com>
Date:   Mon Mar 21 14:20:42 2022 +0800

  PR for RVV support using splitted small chunks (#334)

as a starting point, even though not all that much of the original code remains.

Regression tested on x86_64-pc-linux-gnu X
riscv-sim

riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f

riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32

riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f

riscv-sim/-march=rv32imfdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32

riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d

riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d

riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
2023-07-12  Ju-Zhe Zhong 
    Joern Rennecke  

* config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
Declare.
* config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
New function.
* config/riscv/riscv.md (cpymemsi): Use riscv_vector::expand_block_move.
* config/riscv/vector.md (@cpymem_straight):
New define_insn patterns.
(@cpymem_loop): Likewise.
(@cpymem_loop_fast): Likewise.
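
The strip-mined copy loop the patch emits (shown as assembly in a comment in riscv-v.cc below) can be modeled in portable C.  This is only an illustrative sketch: the fixed CHUNK stands in for the vector length t0 that vsetvli computes dynamically each iteration, and cpymem_model is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Portable model of the strip-mined RVV copy loop: CHUNK stands in for
   the vector length t0 that vsetvli would compute each iteration.  */
#define CHUNK 16

void *cpymem_model (void *dest, const void *src, size_t n)
{
  unsigned char *d = dest;                /* mv a3, a0   # copy destination */
  const unsigned char *s = src;
  while (n)                               /* loop: */
    {
      size_t vl = n < CHUNK ? n : CHUNK;  /* vsetvli t0, a2, e8, m8, ta, ma */
      memcpy (d, s, vl);                  /* vle8.v v0, (a1) / vse8.v v0, (a3) */
      s += vl;                            /* add a1, a1, t0 */
      d += vl;                            /* add a3, a3, t0 */
      n -= vl;                            /* sub a2, a2, t0 */
    }
  return dest;                            /* bnez a2, loop; ret */
}
```

With a constant length no larger than one LMUL=8 vector group, the loop collapses to a single load/store pair, which is the no-loop case the patch special-cases.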

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 16fb8dabca0..40965a00681 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -301,6 +301,7 @@ bool slide1_sew64_helper (int, machine_mode, machine_mode,
  machine_mode, rtx *);
 rtx gen_avl_for_scalar_move (rtx);
 void expand_tuple_move (rtx *);
+bool expand_block_move (rtx, rtx, rtx);
 machine_mode preferred_simd_mode (scalar_mode);
 opt_machine_mode get_mask_mode (machine_mode);
 void expand_vec_series (rtx, rtx, rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b4884a30872..e61110fa3ad 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@
 #include "tm-constrs.h"
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
+#include "predict.h"
 
 using namespace riscv_vector;
 
@@ -2164,6 +2165,191 @@ expand_tuple_move (rtx *ops)
 }
 }
 
+/* Used by cpymemsi in riscv.md .  */
+
+bool
+expand_block_move (rtx dest_in, rtx src_in, rtx length_in)
+{
+  /*
+memcpy:
+   mv a3, a0   # Copy destination
+loop:
+   vsetvli t0, a2, e8, m8, ta, ma  # Vectors of 8b
+   vle8.v v0, (a1) # Load bytes
+   add a1, a1, t0  # Bump pointer
+   sub a2, a2, t0  # Decrement count
+   vse8.v v0, (a3) # Store bytes
+   add a3, a3, t0  # Bump pointer
+   bnez a2, loop   # Any more?
+   ret # Return
+  */
+  if (!TARGET_VECTOR)
+return false;
+  HOST_WIDE_INT potential_ew
+= (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dest_in)), BITS_PER_WORD)
+   / BITS_PER_UNIT);
+  machine_mode vmode = VOIDmode;
+  bool need_loop = true;
+  bool size_p = optimize_function_for_size_p (cfun);
+  rtx src, dst;
+  rtx end = gen_reg_rtx (Pmode);
+  rtx vec;
+  rtx length_rtx = length_in;
+
+  if (CONST_INT_P (length_in))
+{
+  HOST_WIDE_INT length = INTVAL (length_in);
+
+/* By using LMUL=8, we can copy as many bytes in one go as there
+   are bits in a vector register.  If the entire block thus fits,
+   we don't need a loop.  */
+if (length <= TARGET_MIN_VLEN)
+  {
+   need_loop = false;
+
+   /* If a single scalar load / store pair can do the job, leave it
+  to the scalar code to do that.  */
+
+   if (pow2p_hwi (length) && length <= potential_ew)
+ return false;
+  }
+
+  /* Find the vector mode to use.  Using the largest possible element
+size is likely to give smaller constants, and thus potentially
+reduce code size.  However, if we need a loop, we need to update
+the pointers, and that is more complicated with a larger element
+size, unless we use an immediate, which prevents us from dynamically
+using the largest transfer size that the hart supports.  And then,
+unless we know the 

Committed: Tighten regexps in gcc.target/riscv/_Float16-zhinx-1.c .

2023-07-17 Thread Joern Rennecke
Committed as obvious.
commit 6bab2772dbc42ce7a1b29b03ae84e6e434e23c4e
Author: Joern Rennecke 
Date:   Tue Jul 18 04:28:55 2023 +0100

Tighten regexps in gcc.target/riscv/_Float16-zhinx-1.c .

The original "mv" regexp would match
.ascii  "\254\254\375\002e2N6\013\231,\354NDmvVP0]\304\312F!biZ\025\211"
in the .gnu.lto_foo1.0.32528183c9deec41 section.

gcc/testsuite/
* gcc.target/riscv/_Float16-zhinx-1.c: Tighten regexps.

diff --git a/gcc/testsuite/gcc.target/riscv/_Float16-zhinx-1.c b/gcc/testsuite/gcc.target/riscv/_Float16-zhinx-1.c
index 90172b57e05..67826171bfb 100644
--- a/gcc/testsuite/gcc.target/riscv/_Float16-zhinx-1.c
+++ b/gcc/testsuite/gcc.target/riscv/_Float16-zhinx-1.c
@@ -6,5 +6,5 @@ _Float16 foo1 (_Float16 a, _Float16 b)
 return b;
 }
 
-/* { dg-final { scan-assembler-not "fmv.h" } } */
-/* { dg-final { scan-assembler-times "mv" 1 } } */
+/* { dg-final { scan-assembler-not {\mfmv\.h\M} } } */
+/* { dg-final { scan-assembler-times {\mmv\M} 1 } } */


Re: [v2] RISC-V: Remove masking third operand of rotate instructions

2023-05-18 Thread Joern Rennecke
On Thu, 18 May 2023 at 16:37, Joern Rennecke  wrote
in https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618928.html :
>
> This breaks building libstdc++-v3 for
> -march=rv32imafdcv_zicsr_zifencei_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b
> -mabi=ilp32f .

Sorry, I forgot the ChangeLog entry for my patch and missed the [v2]
part of the subject.

2023-05-18  Joern Rennecke  

gcc/ChangeLog:
* config/riscv/constraints.md (DsS, DsD): Restore agreement
with shiftm1 mode attribute.
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index c448e6b37e9..44525b2da49 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -65,13 +65,13 @@
   "@internal
31 immediate"
   (and (match_code "const_int")
-   (match_test "ival == 31")))
+   (match_test "(ival & 31) == 31")))
 
 (define_constraint "DsD"
   "@internal
63 immediate"
   (and (match_code "const_int")
-   (match_test "ival == 63")))
+   (match_test "(ival & 63) == 63")))
 
 (define_constraint "DbS"
   "@internal"


Re: [PATCH v4 02/34] RISC-V: Add vlex_2.c

2023-01-05 Thread Joern Rennecke
On Wed, Jun 1, 2022 at 02:28:45 GMT 2022, zhongjuzhe
 wrote:
> gcc/testsuite/ChangeLog:
>
>* gcc.target/riscv/rvv/intrinsic/vlex_2.c: New test.

These intrinsic test cases look like they have been machine generated.  And
if they aren't, they probably should have been.  I've been working on
stabilizing a tree with the rvv patches merged, and found a number of tests
had diverged in intrinsic function naming, arguments taken, and/or return
type.  Fixing this all with global replaces in dozens of files is quite
messy.  It would be preferable if such issues could be fixed by adjusting a
generator file, and just re-generating the generated files.  That's one of
the reasons why the GPL makes a point of asking to include source code.
Even if that is not strictly required for the testsuite for license reasons,
it makes good sense to do that for maintenance reasons.  The generator
should then also add a note to each generated file saying where in the
source tree to find the generator file, and, where appropriate, which
part(s) of the generator file is/are responsible for generating the test
case.


Re: RFA: crc builtin functions & optimizations

2022-03-16 Thread Joern Rennecke
> and there needs to be code to actually expand the builtin using optabs.
> And something needs to be done to make match.pd work on the output.

Never mind that bit, that was just due to a function argument type mismatch
on the last argument of the crc built-in functions.


Re: RFA: crc builtin functions & optimizations

2022-03-16 Thread Joern Rennecke
On Wed, 16 Mar 2022 at 08:15, Richard Biener  wrote:

> The canonical place to transform loops into builtins would be loop 
> distribution.
> Another place would be final value replacement since you basically replace
> the reduction result with a call to the builtin, but I think
> loop-distribution is
> the better overall place.  See how we match strlen() there.
>
> Richard.

So I suppose that would be something along the line of the below patch?
Except it'd need a lot more checks to make sure it's actually processing a
CRC computation,
and there needs to be code to actually expand the builtin using optabs.
And something needs to be done to make match.pd work on the output.
I'm seeing crcu16 being unrolled in the *.cunroll dump and the remains
of the loops deleted in the *.dse4 dump, but the match.pd clause to simplify
the two __builtin_crc8s calls never matches, so we end up with this in
*.optimized:

 ee_u16 crcu16 (ee_u16 newval, ee_u16 crc)
{
  unsigned char _1;
  short unsigned int _2;
  unsigned char _3;
  short unsigned int _24;
  short unsigned int _26;

   [local count: 1073741824]:
  _1 = (unsigned char) newval_4(D);
  _24 = __builtin_crc8s (crc_6(D), _1, 40961);
  _2 = newval_4(D) >> 8;
  _3 = (unsigned char) _2;
  _26 = __builtin_crc8s (_24, _3, 40961); [tail call]
  return _26;

}
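
For reference, here is a bitwise C model of what `__builtin_crc8s (crc, data, 40961)` computes as I read the patch (40961 = 0xA001, the reflected CRC-16 polynomial); this is a sketch of the semantics, not GCC's implementation, and the `_model` names are mine:

```c
#include <stdint.h>

/* Semantics of __builtin_crc8s as matched from the coremark-style loop:
   feed 8 data bits LSB-first into a 16-bit CRC with a reflected poly.  */
uint16_t crc8s_model (uint16_t crc, uint8_t data, uint16_t poly)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (crc >> 1) ^ poly : crc >> 1;
  return crc;
}

/* The two chained calls in the *.optimized dump above implement crcu16:
   low byte first, then the high byte of newval.  */
uint16_t crcu16_model (uint16_t newval, uint16_t crc)
{
  crc = crc8s_model (crc, (uint8_t) newval, 0xA001);
  return crc8s_model (crc, (uint8_t) (newval >> 8), 0xA001);
}
```

Chaining crc8s_model over the bytes of a message with init 0 gives the standard reflected CRC-16 (CRC-16/ARC) of that message.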
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index db6e9096a86..b74e8569b94 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -659,6 +659,9 @@ class loop_distribution
  replace them accordingly.  */
   bool transform_reduction_loop (loop_p loop);
 
+  /* Transform some loops which calculate a CRC.  */
+  bool transform_reduction_loop (loop_p loop, tree niters);
+
   /* Compute topological order for basic blocks.  Topological order is
  needed because data dependence is computed for data references in
  lexicographical order.  */
@@ -3432,6 +3435,26 @@ generate_strlen_builtin_using_rawmemchr (loop_p loop, tree reduction_var,
 start_len);
 }
 
+static void
+generate_crc_builtin (loop_p loop, tree reduction_var,
+ tree crc_in, tree data_in, tree xor_val,
+ location_t loc)
+{
+  gimple_seq seq = NULL;
+  tree reduction_var_new = make_ssa_name (TREE_TYPE (reduction_var));
+
+  crc_in = force_gimple_operand (crc_in, , true, NULL_TREE);
+  data_in = force_gimple_operand (data_in, , true, NULL_TREE);
+  tree fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_CRC8S));
+  gimple *fn_call = gimple_build_call (fn, 3, crc_in, data_in, xor_val);
+  gimple_call_set_lhs (fn_call, reduction_var_new);
+  gimple_set_location (fn_call, loc);
+  gimple_seq_add_stmt (, fn_call);
+
+  generate_reduction_builtin_1 (loop, seq, reduction_var, reduction_var_new,
+   "generated crc%s", E_QImode);
+}
+
 /* Return true if we can count at least as many characters by taking pointer
difference as we can count via reduction_var without an overflow.  Thus
compute 2^n < (2^(m-1) / s) where n = TYPE_PRECISION (reduction_var_type),
@@ -3713,6 +3736,128 @@ loop_distribution::transform_reduction_loop (loop_p loop)
   return false;
 }
 
+/* Match loops like:
+
+   [local count: 954449105]:
+  # data_16 = PHI 
+  # crc_22 = PHI 
+  # i_3 = PHI 
+  # ivtmp_7 = PHI 
+  _8 = (unsigned char) crc_22;
+  _1 = _8 ^ data_16;
+  x16_12 = _1 & 1;
+  data_13 = data_16 >> 1;
+  _19 = crc_22 >> 1;
+  if (x16_12 != 0)
+goto ; [50.00%]
+  else
+goto ; [50.00%]
+
+   [local count: 477224553]:
+  goto ; [100.00%]
+
+   [local count: 477224552]:
+  crc_17 = _19 ^ 40961;
+
+   [local count: 954449105]:
+  # crc_4 = PHI 
+  i_18 = i_3 + 1;
+  ivtmp_6 = ivtmp_7 - 1;
+  if (ivtmp_6 != 0)
+goto ; [88.89%]
+  else
+goto ; [11.11%]
+
+   [local count: 848409806]:
+  goto ; [100.00%] */
+
+bool
+loop_distribution::transform_reduction_loop (loop_p loop, tree niters)
+{
+  gimple *reduction_stmt;
+
+  if (!wi::eq_p (wi::to_widest (niters), 7))
+return false;
+
+  if (loop->num_nodes != 5)
+return false;
+
+  reduction_stmt = determine_reduction_stmt (loop);
+  if (reduction_stmt == NULL)
+return false;
+
+  /* Reduction variables are guaranteed to be SSA names.  */
+  tree reduction_var;
+  switch (gimple_code (reduction_stmt))
+{
+case GIMPLE_ASSIGN:
+case GIMPLE_PHI:
+  reduction_var = gimple_get_lhs (reduction_stmt);
+  break;
+default:
+  /* Bail out e.g. for GIMPLE_CALL.  */
+  return false;
+}
+
+  if (EDGE_COUNT (loop->header->preds) != 2)
+return false;
+
+  edge e, entry_edge = NULL, backedge = NULL;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+if (e->src->loop_father != loop)
+  entry_edge = e;
+else
+  backedge = e;
+
+  if (!entry_edge || !backedge)
+return false;
+
+  tree crc_in = NULL_TREE, data_in = NULL_TREE;
+
+  for (gphi_iterator gsi = gsi_start_phis (loop->header); 
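
The GIMPLE pattern quoted in the comment of the patch above corresponds to a source loop of roughly this shape (a coremark-style loop, reconstructed here for illustration; 0xA001 is the constant 40961 that appears in the GIMPLE):

```c
typedef unsigned char ee_u8;
typedef unsigned short ee_u16;

/* The 8-iteration bitwise CRC loop the matcher looks for: each step
   XORs one data bit with the CRC's LSB, then shifts and conditionally
   applies the polynomial, exactly as in the GIMPLE basic blocks.  */
ee_u16 crcu8 (ee_u8 data, ee_u16 crc)
{
  for (ee_u8 i = 0; i < 8; i++)
    {
      ee_u8 x16 = (ee_u8) ((data ^ (ee_u8) crc) & 1);  /* _1 = _8 ^ data; x16 = _1 & 1 */
      data >>= 1;                                      /* data_13 = data_16 >> 1 */
      crc >>= 1;                                       /* _19 = crc_22 >> 1 */
      if (x16)
        crc ^= 0xA001;                                 /* crc_17 = _19 ^ 40961 */
    }
  return crc;
}
```

The `niters == 7` check in the matcher corresponds to this loop's eight iterations, one per data bit.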

Re: RFA: crc builtin functions & optimizations

2022-03-15 Thread Joern Rennecke
On 15/03/2022, Richard Biener  wrote:

> Why's this a new pass?  Every walk over all insns costs time.  The pass
> lacks any comments as to what CFG / stmt structure is matched.  From
> a quick look it seems like it first(?) statically matches a stmt sequence
> without considering intermediate stmts, so matching should be quite
> fragile.  Why not match (sub-)expressions with the help of match.pd?

Thinking about this a bit more, I suppose I could change the match.pd
framework to allow setting a bit or adding a list element for a basic block
where an expression match is found.  That wouldn't make it any simpler - on
the contrary, much more complicated, since there needs to be another check
for the same expression that makes sure all the inputs and outputs line up
with the other basic blocks constituting the loop - but it could avoid
scanning functions that don't have anything that looks like a match in a
separate pass.

The proper check and actual transformation would still have to be in its own
pass, but that could return immediately if no expression match for a starting
block was found.
It'd have to be early enough, though, to happen before all inlining and
unrolling, since both operations would hinder recognition, and we also want
them applied to outer loops / inlining functions after the transformation
of the crc-computing loop into a built-in function.
I suppose if no gimple pass is early enough, we could resort to using a
generic match.


Re: RFA: crc builtin functions & optimizations

2022-03-15 Thread Joern Rennecke
On 15/03/2022, Richard Biener  wrote:

> Why's this a new pass?  Every walk over all insns costs time.

It should typically scan considerably fewer than all the insns.

>  The pass
> lacks any comments as to what CFG / stmt structure is matched.

I've put a file in:
config/riscv/tree-crc-doc.txt

Would this text be suitable to put in a comment block in tree-crc.cc ?

>  From
> a quick look it seems like it first(?) statically matches a stmt sequence
> without considering intermediate stmts, so matching should be quite
> fragile.

It might be fragile inasmuch as it won't match when things change, but
the matching has remained effective for seven years and across two
architecture families with varying word sizes.
And with regards to matching only what it's supposed to match, I believe
I have checked all the data dependencies and phis so that it's definitely
calculating a CRC.

>  Why not match (sub-)expressions with the help of match.pd?

Can you match a loop with match.pd ?

> Any reason why you match CRC before early inlinig and thus even when
> not optimizing?  Matching at least after early FRE/DCE/DSE would help
> to get rid of abstraction and/or memory temporary uses.

I hadn't originally placed it there, but I believe benefits include:
- Getting rid of the loop without having to actively delete it in the
  crc pass (this also might be safer, as we just have to make sure we
  are computing the CRC, and DCE will determine if there is any
  ancillary result that is left, and only delete the loop if it's
  really dead).
- The optimized function is available for inlining.


semi-finished patch: dead zero/sign extension elimination

2022-03-15 Thread Joern Rennecke
This misses some documentation and testing, but it appears to work
well with 64 bit RISC-V.

-fext-dce is best used with aggressive unrolling and/or inlining.  It deletes
zero/sign extensions where the part of the register that the zero/sign
extension pertains to is dead.

This is not about multi-word registers (although there might be some overlap
on targets with somewhat narrow words), but mainly about parts of a register
within a word.  So, using BITS_LITTLE_ENDIAN nomenclature, we consider
liveness of the lowest 8 bits, i.e. bits 0..7, the next more significant
8 bits, i.e. bits 8..15, then bits 16..31, and finally
bits 32..BITS_PER_WORD-1 .
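
A sketch of that grouping for a 64-bit word (my own illustration of the bookkeeping, not code from the patch):

```c
#include <stdint.h>

/* Map a mask of live bits to the four liveness groups ext-dce tracks:
   bit 0 of the result = bits 0..7 live, bit 1 = bits 8..15 live,
   bit 2 = bits 16..31 live, bit 3 = bits 32..63 live.  */
unsigned live_groups (uint64_t live_bits)
{
  unsigned g = 0;
  if (live_bits & 0xffu)                 g |= 1;
  if (live_bits & 0xff00u)               g |= 2;
  if (live_bits & 0xffff0000u)           g |= 4;
  if (live_bits & 0xffffffff00000000ull) g |= 8;
  return g;
}
```

If, at every use of a zero- or sign-extended value, only groups covered by the pre-extension value are live, the extension writes nothing that is ever read and can be deleted.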

-fext-dce-pre works better for less aggressive optimization, like a plain
-O3.  It inserts extensions for return values on edges leading to
predecessors of the exit block where a highpart might be live, before
performing the same dead extension elimination as -fext-dce .
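
A tiny example of the kind of extension the pass can delete on a 64-bit target (a hand-written illustration, not from the patch's testsuite):

```c
#include <stdint.h>

/* Only bits 0..7 of the sum ever reach the return value, so any
   explicit zero- or sign-extension of the 32-bit sum to the full
   register is dead once the liveness analysis proves the high
   groups unused.  */
uint8_t low_byte_sum (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;   /* high bits of sum are dead ...          */
  return (uint8_t) sum;   /* ... because only bits 0..7 are used.   */
}
```

After unrolling or inlining, chains of such operations expose many more of these dead extensions, which is why -fext-dce pays off most with aggressive unrolling.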
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 31ff95500c9..6e7ad5ff966 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1374,6 +1374,7 @@ OBJS = \
explow.o \
expmed.o \
expr.o \
+   ext-dce.o \
fibonacci_heap.o \
file-prefix-map.o \
final.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 8b6513de47c..80833bea285 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3607,4 +3607,12 @@ fipa-ra
 Common Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fext-dce
+Common Var(flag_ext_dce, 1) Optimization Init(0)
+Perform dead code elimination on zero and sign extensions with special dataflow analysis.
+
+fext-dce-pre
+Common Var(flag_ext_dce, 2)
+Perform dead code elimination on zero and sign extensions with special dataflow analysis.  Insert extensions on edges for partial redundancy elimination.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/df-scan.cc b/gcc/df-scan.cc
index 9b2375d561b..59b0a82dcc9 100644
--- a/gcc/df-scan.cc
+++ b/gcc/df-scan.cc
@@ -78,7 +78,6 @@ static void df_get_eh_block_artificial_uses (bitmap);
 
 static void df_record_entry_block_defs (bitmap);
 static void df_record_exit_block_uses (bitmap);
-static void df_get_exit_block_use_set (bitmap);
 static void df_get_entry_block_def_set (bitmap);
 static void df_grow_ref_info (struct df_ref_info *, unsigned int);
 static void df_ref_chain_delete_du_chain (df_ref);
@@ -3638,7 +3637,7 @@ df_epilogue_uses_p (unsigned int regno)
 
 /* Set the bit for regs that are considered being used at the exit. */
 
-static void
+void
 df_get_exit_block_use_set (bitmap exit_block_uses)
 {
   unsigned int i;
diff --git a/gcc/df.h b/gcc/df.h
index bd329205d08..9807a3e87f9 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -1090,6 +1090,7 @@ extern bool df_epilogue_uses_p (unsigned int);
 extern void df_set_regs_ever_live (unsigned int, bool);
 extern void df_compute_regs_ever_live (bool);
 extern void df_scan_verify (void);
+extern void df_get_exit_block_use_set (bitmap);
 
 
 /*
diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
new file mode 100644
index 000..9d264972c7f
--- /dev/null
+++ b/gcc/ext-dce.cc
@@ -0,0 +1,545 @@
+/* RTL dead zero/sign extension (code) elimination.
+   Copyright (C) 2000-2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "insn-config.h"
+#include "emit-rtl.h"
+#include "recog.h"
+#include "cfganal.h"
+#include "tree-pass.h"
+#include "cfgrtl.h"
+#include "rtl-iter.h"
+#include "df.h"
+
+/* We consider four bit groups for liveness:
+   bit 0..7   (least significant byte)
+   bit 8..15  (second least significant byte)
+   bit 16..31
+   bit 32..BITS_PER_WORD-1  */
+
+bitmap
+ext_dce_process_bb (basic_block bb, bitmap livenow, bool modify)
+{
+  rtx_insn *insn;
+
+  FOR_BB_INSNS_REVERSE (bb, insn)
+{
+  subrtx_iterator::array_type array;
+
+  if (!INSN_P (insn))
+   continue;
+
+  bitmap live_tmp = BITMAP_ALLOC (NULL);
+  int seen_fusage = 0;
+
+  /* First, process the sets.  */
+  for (rtx pat = PATTERN (insn);;)
+   {
+ FOR_EACH_SUBRTX (iter, array, pat, NONCONST)
+   {
+ const_rtx x = 

Re: RFA: crc builtin functions & optimizations

2022-03-15 Thread Joern Rennecke
On Tue, 15 Mar 2022 at 02:17, Oleg Endo  wrote:
> > In my own CRC library I've got ~30 'commonly used' CRC types, based on
> the following generic definition:
> > This being a library makes it relatively easy to tune and customize for
> various systems.

...

> How would that work together with your proposal?

With optabs, you can put whatever you like into the
machine-specific expansion.

Or if we could put your library-using code into a default expansion that is used
if there's no optab expansion for the modes given, then the target can override
this for machine-specific methods using the optabs, and otherwise use
your library
method in the default expansion.


Fwd: RFA: crc builtin functions & optimizations

2022-03-15 Thread Joern Rennecke
Oops, that was meant to go to the list too.


On Tue, 15 Mar 2022 at 01:04, Andrew Pinski  wrote:
>
> On Mon, Mar 14, 2022 at 5:33 PM Joern Rennecke
>  wrote:
> >
> > Most microprocessors have efficient ways to perform CRC operations, be
> > that with lookup tables, rotates, or even special instructions.
> > However, because we lack a representation for CRC in the compiler, we
> > can't do proper instruction selection.  With this patch I seek out to
> > rectify this,
> > I've avoided using a mode name for the built-in functions because that
> > would tie the semantics to the size of the addressable unit.  We
> > generally use abbreviations like s/l/ll for type names, which is all
> > right when the type can be widened without changing semantics.  For
> > the data input, however, we also have to consider the shift count that
> > is tied to it.  That is why I used a number to designate the width of
> > the data input and shift.
> >
> > For machine support, I made a start with 8 and 16 bit little-endian
> > CRC for RISCV using a
> > lookup table.  I am sure once we have the basic infrastructure in the
> > tree, we'll get more
> > contributions of suitable named patterns for various ports.
>
>
> A few points.
> There are at least 9 different polynomials for the CRC-8 in common use today.
> For CRC-32 there are 5 different polynomials used.
> You don't have a patch to invoke.texi adding the descriptions of the builtins.

You are correct that the documentation could use some work, but that part
would go into extend.texi .

> How is your polynom 3rd argument described? Is it similar to how it is
> done on the wiki for the CRC?

It's a constant integer.
I haven't found a CRC in https://gcc.gnu.org/wiki .
If you mean wikipedia.org, they focus mainly on big endian CRC.  I've added
a function code IFN_CRC_BE for it because for completeness it should be
there, but haven't fleshed out anything further around that.  IFN_CRC and
its associated built-in functions are little-endian.  If you look at the start
of the config/riscv/crc.md patch, there is a comment with a simple C
implementation of crchihi4.

> Does it make sense to have to list the most common polynomials in the
> documentation?

Maybe.  You could give advice on what makes cryptographic sense for
people who want to use CRCs in their code for integrity checks.
Or once some ports with special-purpose instructions are supported,
there could be comments on which polynomials will result in faster operation
because of the specialized expansion for the respective targets.

> Also I am sorry but micro-optimizing coremarks is just wrong.

The claim for that benchmark is that it tests a set of common
operations, including CRC calculations.  Without compiler support,
what we test instead is how well this particular implementation of CRC is
compiled for the target CPU, which can be very different from the actual
CRC computation performance.  So recognizing the CRC computation
helps the benchmark achieve the stated goal of gauging CRC computation
performance.

Moreover, since the benchmark is commonly used, this also makes
it a commonly used idiom, and the license allows to copy the code
into your own programs to a large extent.

> Maybe it
> is better to pick the CRC32 that is inside zip instead for a testcase
> and benchmarking against?
> Or even the CRC32C for iSCSI/ext4.

I'm not sure what's inside there, but in principle, the more the merrier.
I had a look at the bzip2 CRC computation, but that's just a table
lookup.  We can recognize table lookups that compute a CRC if the
array is a constant, but there is no point unless you either have a faster
implementation or want further optimization to be enabled.  Going
there was beyond the scope of my work at this time.

In principle, it would be interesting to do reduction / vectorization of
block CRC computations.  But you have to start with having a
representation for the CRC computations first.

> I see you also don't optimize the case where you have three other
> variants of polynomials that are reversed, reciprocal and reversed
> reciocal.

Do you want to contribute that?

> Also a huge problem, you don't check to make sure the third argument
> to the crc builtin function is constant in the rsicv backend.
Why is that a huge problem?  I see it as a further refinement not yet
added.  Strictly speaking, there is a check, but it's an assert; OTOH
it shouldn't be triggered with the infrastructure as it is now, because the
optimizer only looks for a computation with a constant polynomial, and
the third argument of the builtin crc functions is BT_CONST_SIZE for
now.  Variable polynomials are interesting, but before we introduce them,
we must make sure that constants remain inside the builtin function,
lest we get severe performance degradation if table loo

RFA: crc builtin functions & optimizations

2022-03-14 Thread Joern Rennecke
Most microprocessors have efficient ways to perform CRC operations, be
that with lookup tables, rotates, or even special instructions.
However, because we lack a representation for CRC in the compiler, we
can't do proper instruction selection.  With this patch I set out to
rectify this.
I've avoided using a mode name for the built-in functions because that
would tie the semantics to the size of the addressable unit.  We
generally use abbreviations like s/l/ll for type names, which is all
right when the type can be widened without changing semantics.  For
the data input, however, we also have to consider the shift count that
is tied to it.  That is why I used a number to designate the width of
the data input and shift.

For machine support, I made a start with 8 and 16 bit little-endian CRC
for RISC-V using a lookup table.  I am sure once we have the basic
infrastructure in the tree, we'll get more contributions of suitable
named patterns for various ports.

bootstrapped on x86_64-pc-linux-gnu .
2022-03-14  Jon Beniston  
Joern Rennecke  

* Makefile.in (OBJS): Add tree-crc.o .
* builtin-types.def (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Define.
(BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
(BT_FN_UINT16_UINT16_UINT32_CONST_SIZE): Likewise.
* builtins.cc (associated_internal_fn):
Handle BUILT_IN_CRC8S, BUILT_IN_CRC16S, BUILT_IN_CRC32S.
* builtins.def (BUILT_IN_CRC8S, BUILT_IN_CRC16S, BUILT_IN_CRC32S):
New builtin functions.
* cgraph.cc (cgraph_node::verify_node):
Allow const calls without a callgraph edge.
* common.opt (fcrc): New option.
* doc/invoke.texi (-fcrc): Document.
* gimple-match-head.cc: #include predict.h .
* internal-fn.cc (crc_direct): Define.
(expand_crc_optab_fn): New function.
(direct_crc_optab_supported_p): Define.
* internal-fn.def (CRC, CRC_BE): New internal optab functions.
* match.pd: Match a pair of crc operations.
* optabs.def (crc_optab, crc_be_optab): New optabs.
* passes.def (pass_crc): Add new pass.
* tree-crc.cc: New file.
* tree-pass.h (make_pass_crc): Declare.

testsuite:
* gcc.c-torture/compile/crc.c: New test.
* gcc.dg/tree-ssa/crc.c: Likewise.
* gcc.dg/tree-ssa/crc-2.c: likewise.
* gcc.dg/tree-ssa/pr59597.c: Add flag -fno-crc .

config/riscv:
* crc.md: New file.
* riscv-protos.h (expand_crc_lookup, print_crc_table): Declare.
* riscv.cc (compute_crc): New function.
(print_crc_table, expand_crc_lookup): Likewise.
* riscv.md: Include crc.md.
* riscv.opt (msmall-memory): New option.
* tree-crc-doc.txt: New file.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 31ff95500c9..a901925511b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1612,6 +1612,7 @@ OBJS = \
tree-cfgcleanup.o \
tree-chrec.o \
tree-complex.o \
+   tree-crc.o \
tree-data-ref.o \
tree-dfa.o \
tree-diagnostic.o \
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 3a7cecdf087..aa7751a6d5a 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -872,3 +872,9 @@ DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VPTR_LDOUBLE, BT_VOID,
 BT_VOLATILE_PTR, BT_LONGDOUBLE)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VPTR_SIZE, BT_VOID,
 BT_VOLATILE_PTR, BT_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE, BT_UINT16,
+BT_UINT16, BT_UINT8, BT_CONST_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_UINT16_UINT16_UINT16_CONST_SIZE, BT_UINT16,
+BT_UINT16, BT_UINT16, BT_CONST_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_UINT16_UINT16_UINT32_CONST_SIZE, BT_UINT16,
+BT_UINT16, BT_UINT32, BT_CONST_SIZE)
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4c6c2939053..37c28c930ac 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2175,6 +2175,9 @@ associated_internal_fn (built_in_function fn, tree return_type)
return IFN_LDEXP;
   return IFN_LAST;
 
+case BUILT_IN_CRC8S: case BUILT_IN_CRC16S: case BUILT_IN_CRC32S:
+  return IFN_CRC;
+
 default:
   return IFN_LAST;
 }
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 005976f34e9..24aaca34406 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -850,6 +850,9 @@ DEF_GCC_BUILTIN(BUILT_IN_CLRSB, "clrsb", BT_FN_INT_INT, ATTR_CONST_NOTHR
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBIMAX, "clrsbimax", BT_FN_INT_INTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBL, "clrsbl", BT_FN_INT_LONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBLL, "clrsbll", BT_FN_INT_LONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_CRC8S, "crc8s", BT_FN_UINT16_UINT16_UINT8_CONST_SIZE, ATTR_CONST_NOTHROW_LEAF_LIST)

Call for testers: shrink wrapping without a prologue

2022-03-14 Thread Joern Rennecke
I noticed that when there are registers to save (that can vary with ABI),
shrink-wrapping would arrange for a more expeditious early return than when
there were no registers to save, but there were still some dull argument
copies to make for the main body of the function, even if they are not
needed for the early return path.  Most of the logic to do shrink-wrapping
also in the absence of register saves is already there, and the generated
code indeed looks better when this is used.  However, I couldn't find a
difference in the execution time of the benchmarks I was looking at,
presumably because the function didn't actually return early (doing things
with an array of N elements where N might be zero... but it isn't for the
actual data).

Does someone have a benchmark / computing load where the early return
is beneficial?  Or conversely, harmful?
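
An example of the shape of function this targets (illustrative only, not from the patch):

```c
/* With -fearly-return, the argument copies and loop setup for the
   main body can be sunk past the n == 0 test, so the early-return
   path executes no unnecessary instructions even though the function
   has no prologue (no registers to save).  */
long sum_array (const int *a, int n)
{
  if (n == 0)     /* early return: needs none of the setup below */
    return 0;
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}
```

Whether the sunk copies actually pay off depends on how often the early path is taken at run time, which is exactly the question posed above.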
2022-03-14  Joern Rennecke  

* common.opt (fearly-return): New option.
* shrink-wrap.cc (try_early_return): New function.
(try_shrink_wrapping): Call try_early_return.

diff --git a/gcc/common.opt b/gcc/common.opt
index 8b6513de47c..901287fcad6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3607,4 +3607,8 @@ fipa-ra
 Common Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fearly-return
+Common Var(flag_early_return) Optimization Init(1)
+Extend shrink-wrapping to prologue-free functions.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index 30166bd20eb..31ab0ecff10 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -586,6 +586,42 @@ handle_simple_exit (edge e)
 INSN_UID (ret), e->src->index);
 }
 
+/* Even if there is no prologue, we might have a number of argument
+   copy and initialization statements in the first basic block that
+   might be unnecessary if we return early.  */
+/* ??? This might be overly aggressive for super-scalar processors without
+   speculative execution in that we might want to keep enough instructions
+   in front of the branch to fill all issue slots.
+
+   If the branch depends on a register copied from another register
+   immediately before, later passes already take care of propagating the
+   copy into the branch.  */
+void
+try_early_return (edge *entry_edge)
+{
+  basic_block entry = (*entry_edge)->dest;
+  if (EDGE_COUNT (entry->succs) != 2 || !single_pred_p (entry))
+return;
+  edge e;
+  edge_iterator ei;
+  const int max_depth = 20;
+
+  FOR_EACH_EDGE (e, ei, entry->succs)
+{
+  basic_block dst = e->dest;
+  for (int i = max_depth; --i; dst = single_succ (dst))
+   {
+ if (dst == EXIT_BLOCK_PTR_FOR_FN (cfun))
+   {
+ prepare_shrink_wrap (entry);
+ return;
+   }
+ if (!single_succ_p (dst))
+   break;
+   }
+}
+}
+
 /* Try to perform a kind of shrink-wrapping, making sure the
prologue/epilogue is emitted only around those parts of the
function that require it.
@@ -666,7 +702,11 @@ try_shrink_wrapping (edge *entry_edge, rtx_insn 
*prologue_seq)
break;
   }
   if (empty_prologue)
-return;
+{
+  if (flag_early_return)
+   try_early_return (entry_edge);
+  return;
+}
 
   /* Move some code down to expose more shrink-wrapping opportunities.  */
 


RFA: avoid infinite lra loop for constant addresses

2021-05-18 Thread Joern Rennecke
I find that when compiling some files, lra goes into an infinite loop
reloading constant
addresses.  This patch allows them to just be recognized as matching addresses
immediately, which also saves a bit of space for a few other files.
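A minimal sketch of the situation (my illustration, not one of the failing files): after LRA substitutes a register's equivalence, the memory address is a constant such as a symbol_ref, and on targets whose addressing modes accept that directly there is nothing left to reload.

```c
#include <assert.h>

static int counter;

/* Once LRA substitutes the equivalence for the pseudo holding &counter,
   the address of this memory access is the constant
   (symbol_ref "counter") itself.  If strict_memory_address_addr_space_p
   accepts it, forcing it back into a register is pointless -- and, in
   the failing cases, looped forever.  */
int
read_counter (void)
{
  return counter;
}

void
bump_counter (void)
{
  counter++;
}
```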

Bootstrapped and regression tested on x86_64-pc-linux-gnu.
gcc/
* lra-constraints.c (process_addr_reg): New arguments mem_mode and as.
Changed caller.
If equivalence search has yielded a constant that is valid as an
address, use it.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index a766f1fd7e8..8451da58164 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1454,10 +1454,14 @@ static int curr_swapped;
the RTL was changed.
 
if CHECK_ONLY_P is true, check that the *LOC is a correct address
-   register.  Return false if the address register is correct.  */
+   register.  Return false if the address register is correct.
+
+   if MEM_MODE is not VOIDmode, then *LOC is the entire address for a
+   memory access of MODE in address space AS, and *LOC may be replaced
+   with a constant if that is a valid address.  */
 static bool
 process_addr_reg (rtx *loc, bool check_only_p, rtx_insn **before, rtx_insn 
**after,
- enum reg_class cl)
+ enum reg_class cl, machine_mode mem_mode, addr_space_t as)
 {
   int regno;
   enum reg_class rclass, new_class;
@@ -1502,6 +1506,13 @@ process_addr_reg (rtx *loc, bool check_only_p, rtx_insn 
**before, rtx_insn **aft
   if (! check_only_p
  && (*loc = get_equiv_with_elimination (reg, curr_insn)) != reg)
{
+ /* If the elimination has yielded a constant that is fine as an
+address, don't try to reload that.  */
+ if (CONSTANT_P (*loc) && mem_mode != VOIDmode
+ && strict_memory_address_addr_space_p
+  (mem_mode, *loc, as))
+   return true;
+
  if (lra_dump_file != NULL)
{
  fprintf (lra_dump_file,
@@ -3523,7 +3534,12 @@ process_address_1 (int nop, bool check_only_p,
 REGNO (*ad.base_term)) != NULL_RTX)
? after : NULL),
   base_reg_class (ad.mode, ad.as, ad.base_outer_code,
-  get_index_code ()
+  get_index_code ()),
+  ((MEM_P (mem) && &XEXP (mem, 0) == ad.base_term)
+   || (SUBREG_P (op) && MEM_P (SUBREG_REG (op))
+   && &XEXP (SUBREG_REG (op), 0) == ad.base_term)
+   ? ad.mode : VOIDmode),
+  ad.as)))
 {
   change_p = true;
   if (ad.base_term2 != NULL)
@@ -3531,7 +3547,7 @@ process_address_1 (int nop, bool check_only_p,
 }
   if (ad.index_term != NULL
   && process_addr_reg (ad.index_term, check_only_p,
-  before, NULL, INDEX_REG_CLASS))
+  before, NULL, INDEX_REG_CLASS, VOIDmode, ad.as))
 change_p = true;
 
   /* Target hooks sometimes don't treat extra-constraint addresses as


Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Joern Rennecke
On Mon, 17 May 2021 at 11:59, Richard Biener  wrote:

> The plan for reload is to axe it similar to CC0 support.  Sooner than later, 
> but
> give it's still used exclusively by a lot of target means it might
> take some time.

> So for you it's always just -fretry-compilation -m[no-]lra?  Given -m[no-]lra
> is a thing cycling between the two directly in RA lra/reload should be 
> possible?

Even if that were possible, it wouldn't solve the problem.  When I try compiling
newlib without -fretry-compilation, it's falling over first for
libc/time/strftime.c .
With lra, lra finishes, but it ignores an earlyclobber constraint, so
reload_cse_simplify_operands ICEs.  With reload, you get a spill failure.
I've tried various options, but only -O0 seems to work.  Compiling strftime with
-O0 is not really an issue because the target is too deeply embedded to hope
to link something that uses strftime.  But identifying all the files
that can't be
compiled with optimization and treating them differently is a problem if it has
to be done by hand.

> Or are reload/LRA too greedy in that they ICE when having transformed half
> of the code already?

Both of them do a lot of transformations before they ICE.  Or they don't even
ICE themselves, but leave behind invalid rtl that a later pass catches.

Even if we fixed both passes so that they could roll back everything
(which I think would be a lot harder for lra; reload can already roll
back a lot),
what's the point if you axe reload soon after?

> I see.  It's of course difficult for the FSF tree to cater for
> extremes that are not
> represented in its tree.  I wonder what prevents you from contributing the 
> port?

I can neither confirm nor deny that I can't contribute the port.

> Still if that solves a lot of the issues this seems like the way to go.

It has merit in its own right, but it can't fix all the ICEs, and thus doesn't
make building libraries manageable.


Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Joern Rennecke
On Mon, 17 May 2021 at 08:36, Richard Biener  wrote:
>
> On Sun, May 16, 2021 at 8:53 PM Joern Rennecke
>  wrote:
> >
> > For architectures with likely spilled register classes, neither
> > register allocator is guaranteed
> > to succeed when using optimization.  If you have just a few files to
> > compile, you can try
> > by hand which compiler options will succeed and still give reasonable
> > code, but for large projects,
> > hand-tweaking library / program build rules on a file-by-file basis is
> > time intensive and does not
> > scale well across different build environments and compiler versions.
> >
> > The attached patch adds a new option -fretry-compilation that allows
> > you to specify a list - or
> > lists - of options to use for a compilation retry, which is
> > implemented in the compiler driver.
> >
> > Bootstrapped on x86_64-pc-linux-gnu.
>
> Eh, no ;)  But funny idea, nevertheless.

Why no?

lra just throws a ton of transformations at the code with no theoretical
concept that I can discern that it should - modulo bugs - succeed for
all well-formed code.  It works well most of the time so I'd like to use it as
a default, but how are you supposed to compile libgcc and newlib with
a register allocator that only works most of the time?

reload is more robust in the basic design, but it's so complex that it's
rather time-consuming to debug.  The failures I had left with reload
were not spill-failures per se, but code that was considered mal-formed by
the postreload passes and it's hard to decide which one was actually wrong.
And if I debug the failures seen with reload, will this do any good in the
long run, or will it just be changed beyond all recognition (with works for
the top five most popular processor architectures but not quite for anything
else) or plain ripped out a few years down the line?

I had a proof-of-concept for the option in the target code first, but that used
fork(2) and thus left non-POSIX hosts (even if they have a pretend POSIX
subsystem) high and dry.  The logical place to implement the option to
make it portable is in the compiler driver.
I've called the option originally -mretry-regalloc / -fretry-regalloc, but when
I got around to writing the invoke.texi patch, I realized that the option can be
used more generally to work around glitches, so it's more apt to name it
-fretry-compilation .

> Do you run into the issues
> with the first scheduling pass disabled?

The target doesn't have anything that needs scheduling, and hence no scheduling
description.  But it also has more severe register pressures for
memory access than
ports in the FSF tree.

The bane of lra are memory-memory moves.  Instead of using an intermediate
register, it starts by reloading the well-formed addresses and thus jacking up
the base register pressure.
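The problematic shape is as simple as this sketch (illustrative, not taken from the failing code): both source and destination of the move are in memory, so on a target without memory-to-memory move instructions the natural fix is one intermediate register rather than reloading both addresses first.

```c
#include <assert.h>

int src = 42, dst;

/* At the RTL level, dst = src is a memory-to-memory move.  On a target
   without mem-mem move instructions the cheap lowering is a single
   intermediate register; reloading both well-formed addresses first,
   as described above, only raises base-register pressure.  */
void
copy (void)
{
  dst = src;
}
```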

I had a patch for that, but I found it needs a bit more work.


Re: RFA: Improve message for wrong number of alternatives

2021-05-17 Thread Joern Rennecke
On Sun, 16 May 2021 at 22:01, Martin Sebor  wrote:
 > I think it's very helpful to provide this sort of detail.  Just as
> a matter of readability, the new error message
>
>"wrong number of alternatives in operand %d, %d, expected %d"
>
> would be improved by avoiding the two consecutive %d's,

We could also do that by phrasing it:

"wrong number of alternatives in operand %d, seen: %d, expected: %d"

so that the change is just about adding extra information.

> e.g., by
> rephrasing it like so:
>
>"%d alternatives provided to operand %d where %d are expected"

This has an additional change in that we no longer jump to the conclusion
that the operand where we notice the discrepancy is the point that's wrong.
I suppose that conclusion is more often right than wrong (assuming more than
two operands on average for patterns that have alternatives and at least two
operands), but when it's wrong, it's particularly confusing and/or jarring,
so it's an improvement to just stick to the known facts.
But if we go that way, I suppose we should also spell out where the
expectation comes from: we have a loop over the operands, and we look at
operand 0 first.  We could do that by using the diagnostic:

  error_at (d->loc,
	    "alternative number mismatch: operand %d has %d, operand %d had %d",
	    start, d->operand[start].n_alternatives, 0, n);


I notice in passing here that printf is actually awkward for rephrasings
and hence also for translations, because we can't interchange the order of
the data in the message string.
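GCC's diagnostics go through their own formatter, but in plain C terms, POSIX printf's numbered argument conversions show what an order-independent format looks like (a sketch using the message from this thread; the %n$ form is a POSIX extension, not ISO C, and the function name here is mine):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render the mismatch message with explicitly numbered arguments: a
   translated format string may permute %1$d/%2$d/%3$d freely without
   any change to the call site.  (POSIX positional parameters; GCC's
   own diagnostic machinery is a separate mechanism.)  */
int
format_mismatch (char *buf, size_t size, int operand, int seen, int expected)
{
  return snprintf (buf, size,
                   "operand %1$d has %2$d alternatives, expected %3$d",
                   operand, seen, expected);
}
```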

But for multi-alternative patterns, we also have the awkwardness of
repeating the abstract of the error message and the recap of the number
of alternatives of operand 0.

So I propose the attached patch now.

Bootstrapped on x86_64-pc-linux-gnu.
2021-05-17  Joern Rennecke  

Make "wrong number of alternatives" message more specific, and
remove assumption on where the problem is.

diff --git a/gcc/genoutput.c b/gcc/genoutput.c
index 8e911cce2f5..6313b722cf7 100644
--- a/gcc/genoutput.c
+++ b/gcc/genoutput.c
@@ -757,6 +757,7 @@ validate_insn_alternatives (class data *d)
int which_alternative = 0;
int alternative_count_unsure = 0;
bool seen_write = false;
+   bool alt_mismatch = false;
 
for (p = d->operand[start].constraint; (c = *p); p += len)
  {
@@ -813,8 +814,19 @@ validate_insn_alternatives (class data *d)
if (n == 0)
  n = d->operand[start].n_alternatives;
else if (n != d->operand[start].n_alternatives)
- error_at (d->loc, "wrong number of alternatives in operand %d",
-   start);
+ {
+   if (!alt_mismatch)
+ {
+   alt_mismatch = true;
+   error_at (d->loc,
+ "alternative number mismatch: "
+ "operand %d had %d, operand %d has %d",
+ 0, n, start, d->operand[start].n_alternatives);
+ }
+   else
+ error_at (d->loc, "operand %d has %d alternatives",
+   start, d->operand[start].n_alternatives);
+ }
  }
   }
 


RFA: Don't squash target character arrays into a narrower host string

2021-05-16 Thread Joern Rennecke
braced_list_to_string creates a host string, so it's not suitable when
e.g. the host
has 8 bit chars, but the target has 16 bit chars.

The attached patch checks if host and target char sizes are different
and in that case
falls back to leaving the array as an array.
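A tiny example of the construct in question (host and target char sizes match here, so the optimization is fine; the patch only disables it when they differ):

```c
#include <assert.h>

/* A braced initializer for a char array: braced_lists_to_strings may
   turn this into the STRING_CST "hi\0" internally.  That representation
   packs one target char per host char, which is only valid when the
   target's char precision equals the host's CHAR_BIT -- hence the
   TYPE_PRECISION (char_type_node) == CHAR_BIT guard in the patch.  */
const char greeting[] = { 'h', 'i', '\0' };

int
greeting_len (void)
{
  int n = 0;
  while (greeting[n] != '\0')
    n++;
  return n;
}
```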

Bootstrapped on x86_64-pc-linux-gnu.

FWIW, we also have patches for cpplib / lexer / parser char and string
handling to make 8 -> 16 bit char cross-compiling work, but they can't
be ported forward easily because the parser has changed since gcc9.
2021-04-16  Joern Rennecke  

* c-family/c-common.c (braced_lists_to_strings): Don't call
braced_list_to_string if host and target character sizes don't match.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 7bd799d1825..5e87d8ba4df 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -9169,7 +9169,7 @@ braced_lists_to_strings (tree type, tree ctor, bool 
member)
 return ctor;
 
   if ((TREE_CODE (ttp) == ARRAY_TYPE || TREE_CODE (ttp) == INTEGER_TYPE)
-  && TYPE_STRING_FLAG (ttp))
+  && TYPE_STRING_FLAG (ttp) && TYPE_PRECISION (char_type_node) == CHAR_BIT)
 return braced_list_to_string (type, ctor, member);
 
   code = TREE_CODE (ttp);


RFA: Add option -fretry-compilation

2021-05-16 Thread Joern Rennecke
For architectures with likely spilled register classes, neither
register allocator is guaranteed
to succeed when using optimization.  If you have just a few files to
compile, you can try
by hand which compiler options will succeed and still give reasonable
code, but for large projects,
hand-tweaking library / program build rules on a file-by-file basis is
time intensive and does not
scale well across different build environments and compiler versions.

The attached patch adds a new option -fretry-compilation that allows
you to specify a list - or
lists - of options to use for a compilation retry, which is
implemented in the compiler driver.

Bootstrapped on x86_64-pc-linux-gnu.
2021-05-16  Joern Rennecke  

* common.opt: New option -fretry-compilation=.
* gcc.c (execute): Implement -fretry-compilation.
* doc/invoke.texi: Document -fretry-compilation.

diff --git a/gcc/common.opt b/gcc/common.opt
index a75b44ee47e..d4db372572f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1446,6 +1446,10 @@ Common Driver Var(flag_report_bug)
 Collect and dump debug information into temporary file if ICE in C/C++
 compiler occurred.
 
+fretry-compilation=
+Common Driver RejectNegative Joined Var(retry_compilation_str)
+If the compiler fails, retry with named options appended.  Separate multiple
options with ',', and multiple alternatives with ':'.
+
 fdump-internal-locations
 Common Var(flag_dump_locations) Init(0)
 Dump detailed information on GCC's internal representation of source code 
locations.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 519881509a6..8f94fd1aa42 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -541,6 +541,7 @@ Objective-C and Objective-C++ Dialects}.
 -freorder-blocks-algorithm=@var{algorithm} @gol
 -freorder-blocks-and-partition  -freorder-functions @gol
 -frerun-cse-after-loop  -freschedule-modulo-scheduled-loops @gol
+-fretry-compilation=@var{option-list} @gol
 -frounding-math  -fsave-optimization-record @gol
 -fsched2-use-superblocks  -fsched-pressure @gol
 -fsched-spec-load  -fsched-spec-load-dangerous @gol
@@ -10796,6 +10797,44 @@ Perform a number of minor optimizations that are 
relatively expensive.
 
 Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
 
+@item -fretry-compilation=@var{option-list}
+@opindex -fretry-compilation
+If the compilation fails, retry with additional options as specified in
+@var{option-list}.  This is actually implemented in the compiler driver,
+but the purpose is that you can use options that sporadically fail, and
+in that case, fall back to another option combination.  This is useful
+e.g. when you compile a large program or library and don't want to tweak
+the rules for each object file.
+@var{option-list} can specify one or more options in a comma-separated list that are
+added at the end of the option list in a retry.  Multiple retries can be
+separated with a colon.  For example,
+@smallexample
+@option{-O3} @option{-std=c90}
@option{-fretry-compilation=-mno-lra:-fno-tree-partial-pre,-fno-web:-O0}
+@end smallexample
+
+This will first run the compiler with the options
+@smallexample
+@option{-O3} @option{-std=c90}
+@end smallexample
+
+If that fails, it will re-try the compilation with:
+
+@smallexample
+@option{-O3} @option{-std=c90} @option{-mno-lra}
+@end smallexample
+
+If that too fails, it will re-try compilation with:
+
+@smallexample
+@option{-O3} @option{-std=c90} @option{-fno-tree-partial-pre}
@option{-fno-web}
+@end smallexample
+
+And finally, if that too fails, it will re-try compilation with:
+
+@smallexample
+@option{-O3} @option{-std=c90} @option{-O0}
+@end smallexample
+
 @item -free
 @opindex free
 Attempt to remove redundant extension instructions.  This is especially
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 4c1a659d5e8..7b056ac2840 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -3286,6 +3286,9 @@ execute (void)
n_commands++;
   }
 
+ retry:
+  bool retry_compilation_p = false;
+
   /* If -v, print what we are about to do, and maybe query.  */
 
   if (verbose_flag)
@@ -3506,6 +3509,12 @@ execute (void)
  try_generate_repro (commands[0].argv);
if (WEXITSTATUS (status) > greatest_status)
  greatest_status = WEXITSTATUS (status);
+   if (retry_compilation_str
+   && WEXITSTATUS (status) == ICE_EXIT_CODE
+   && i == 0
+   && (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
+   && ! strncmp (p + 1, "cc1", 3))
+ retry_compilation_p = true;
ret_code = -1;
  }
 
@@ -3561,6 +3570,40 @@ execute (void)
  }
   }
 
+commands[0].argv = argbuf.address ();
+while (retry_compilation_p)
+  {
+   int nargs, n_extra;
+   const char *p, *q, **new_argv;
+   for (nargs = 0; commands[0].argv[nargs] != NULL; ++nargs)
+ /* Only retry compiler ICEs, not preprocessor ones.  */
+  

RFA: reduce lra spill failures by splitting likely-spilled-reg hogging pseudo

2021-05-16 Thread Joern Rennecke
Bootstrapped regtested and on x86_64-pc-linux-gnu.
2021-02-22  Joern Rennecke  

lra fix to reduce fatal spill failures.

* lra-constraints.c (split_reg): No longer static.
* lra-int.h (split_reg): Declare.
* lra-assigns.c (lra_split_hard_reg_for): Add strategy to split a
longer range pseudo to accommodate a short range pseudo in a
likely-spilled reg.

diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index c6a941fe663..4f765bbd8de 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -1799,6 +1799,35 @@ lra_split_hard_reg_for (void)
bitmap_clear (&failed_reload_pseudos);
return true;
  }
+   /* For a likely spilled class, a pseudo hogging a hard register
+  and a hard reg use are pretty much interchangeable.
+  If the use is for adjacent insns, we can win by splitting
+  a conflicting pseudo that has a larger range.  */
+   if (next_nonnote_insn (first) == last
+   && targetm.class_likely_spilled_p (rclass))
+ {
+   int j;
+   rtx_insn *j_first, *j_last;
+   for (j = lra_constraint_new_regno_start; j < max_regno; j++)
+ if (reg_renumber[j] >= 0
+ && REGNO_REG_CLASS (reg_renumber[j]) == rclass
+ && (hard_regno_nregs (reg_renumber[j],
+   GET_MODE (regno_reg_rtx[j]))
+ >= hard_regno_nregs (reg_renumber[j],
+  GET_MODE (regno_reg_rtx[i])))
+ && find_reload_regno_insns (j, j_first, j_last)
+ && j_first != j_last && j_last != last)
+   {
+ for (insn = NEXT_INSN (j_first); insn != j_last;
+  insn = NEXT_INSN (insn))
+   if (insn == first
+   && split_reg (TRUE, j, first, NULL, last))
+ {
+   bitmap_clear (&failed_reload_pseudos);
+   return true;
+ }
+   }
+ }
bitmap_set_bit (&failed_reload_pseudos, i);
   }
bitmap_clear (&failed_reload_pseudos);
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index a766f1fd7e8..a13d02a9028 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5774,7 +5774,7 @@ lra_copy_reg_equiv (unsigned int new_regno, unsigned int 
original_regno)
register and s is a new split pseudo.  The save is put before INSN
if BEFORE_P is true. Return true if we succeed in such
transformation.  */
-static bool
+bool
 split_reg (bool before_p, int original_regno, rtx_insn *insn,
   rtx next_usage_insns, rtx_insn *to)
 {
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 4dadccc79f4..eece250eafb 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -346,6 +346,9 @@ extern void lra_constraints_finish (void);
 extern bool spill_hard_reg_in_range (int, enum reg_class, rtx_insn *, rtx_insn 
*);
 extern void lra_inheritance (void);
 extern bool lra_undo_inheritance (void);
+extern bool split_reg (bool before_p, int original_regno, rtx_insn *insn,
+  rtx next_usage_insns, rtx_insn *to);
+
 
 /* lra-lives.c: */
 


RFA: Support cobbers in define_cond_exec

2021-05-16 Thread Joern Rennecke
At the moment, define_cond_exec allows only a single substitution
pattern.  That is
rather limiting if the target needs to clobber a scratch register in
order to compute the
required condition.
The attached patch allows adding clobber patterns after the main
pattern, and also adds
support for MATCH_SCRATCH in alter_predicate_for_insn.
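With the patch in place, a port could write something along these lines (a hypothetical machine-description sketch, not from any real port): the cond_exec substitution pattern proper comes first, followed by the clobber of the scratch register the condition computation needs.

```
(define_cond_exec
  [(match_operator 0 "comparison_operator"
     [(match_operand 1 "register_operand" "r")
      (const_int 0)])
   ;; extra pattern allowed by the patch: a scratch clobbered while
   ;; computing the condition
   (clobber (match_scratch 2 "=&r"))]
  ""
  "...")
```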

This makes most sense together with the previous patch for MATCH_DUP support,
although the latter can also be used stand-alone, so have posted and
tested these
patches separately.

Bootstrapped on x86_64-pc-linux-gnu.
2020-12-12  Joern Rennecke  

Fix define_cond_exec flaw of not accepting clobbers.
* gensupport.c (alter_predicate_for_insn): Handle MATCH_SCRATCH.
(process_one_cond_exec): Allow extra patterns for clobbers.

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index e1ca06dbc1e..b472dc115b5 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -1198,8 +1198,11 @@ alter_predicate_for_insn (rtx pattern, int alt, int 
max_op,
   switch (code)
 {
 case MATCH_OPERAND:
+case MATCH_SCRATCH:
   {
-   const char *c = XSTR (pattern, 2);
+   const char **altstr_loc
+ = &XSTR (pattern, code == MATCH_SCRATCH ? 1 : 2);
+   const char *c = *altstr_loc;
 
if (n_alternatives (c) != 1)
  {
@@ -1216,19 +1219,22 @@ alter_predicate_for_insn (rtx pattern, int alt, int 
max_op,
char *new_c = XNEWVEC (char, len);
 
memcpy (new_c, c, c_len);
+   char *wp = new_c + c_len;
+   if (*c == '=')
+ c++, c_len--;
for (i = 1; i < alt; ++i)
  {
-   new_c[i * (c_len + 1) - 1] = ',';
-   memcpy (&new_c[i * (c_len + 1)], c, c_len);
+   *wp++ = ',';
+   memcpy (wp, c, c_len);
+   wp += c_len;
  }
-   new_c[len - 1] = '\0';
-   XSTR (pattern, 2) = new_c;
+   *wp = '\0';
+   *altstr_loc = new_c;
  }
   }
   /* Fall through.  */
 
 case MATCH_OPERATOR:
-case MATCH_SCRATCH:
 case MATCH_PARALLEL:
   XINT (pattern, 0) += max_op;
   break;
@@ -1754,13 +1760,18 @@ process_one_cond_exec (class queue_elem *ce_elem)
  collect_insn_data (insn_elem->data, &alternatives, &max_operand);
   max_operand += 1;
 
-  if (XVECLEN (ce_elem->data, 0) != 1)
+  for (i = XVECLEN (ce_elem->data, 0) - 1; i > 0; i--)
{
- error_at (ce_elem->loc, "too many patterns in predicate");
- return;
+ rtx part = XVECEXP (ce_elem->data, 0, i);
+ if (GET_CODE (part) != CLOBBER)
+   {
+ error_at (ce_elem->loc, "too many patterns in predicate");
+ return;
+   }
}
 
   pred = copy_rtx (XVECEXP (ce_elem->data, 0, 0));
+  int n_clobbers = XVECLEN (ce_elem->data, 0) - 1;
   pred = alter_predicate_for_insn (pred, alternatives, max_operand,
   ce_elem->loc);
   if (pred == NULL)
@@ -1774,8 +1785,15 @@ process_one_cond_exec (class queue_elem *ce_elem)
   pattern = rtx_alloc (COND_EXEC);
   XEXP (pattern, 0) = pred;
   XEXP (pattern, 1) = add_implicit_parallel (XVEC (insn, 1));
-  XVEC (insn, 1) = rtvec_alloc (1);
+  XVEC (insn, 1) = rtvec_alloc (1 + n_clobbers);
   XVECEXP (insn, 1, 0) = pattern;
+  for (int i = n_clobbers; i > 0; i--)
+   {
+ rtx clobber = copy_rtx (XVECEXP (ce_elem->data, 0, i));
+ clobber = alter_predicate_for_insn (clobber, alternatives,
+ max_operand, ce_elem->loc);
+ XVECEXP (insn, 1, i) = clobber;
+   }
 
if (XVEC (ce_elem->data, 3) != NULL)
{


RFA: Fix match_dup numbering bug in define_cond_exec

2021-05-16 Thread Joern Rennecke
(Sorry about re-sending - I accidentally forgot to add a subject in
the last post, which would make it hard to have a meaningful thread.)

At the moment, for a match_dup in a define_cond_exec, you'd have to
give the number in the
resulting pattern(s) rather than in the substitute pattern.  That's
not only wrong, but can also
be impossible when the pattern should apply to multiple patterns with
different operand numbers.

The attached patch fixes this.
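To make the renumbering concrete (again a hypothetical sketch): a match_dup inside the define_cond_exec template refers to one of the template's own operands, so when the template is applied to an insn with N operands, its index must be offset by N, exactly as alter_predicate_for_insn already does for match_operand.

```
(define_cond_exec
  [(match_operator 0 "comparison_operator"
     [(match_operand 1 "register_operand" "r")
      ;; refers to operand 1 of this template; before the fix this
      ;; index was not offset by the target insn's operand count
      (match_dup 1)])]
  ""
  "...")
```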

Bootstrapped on x86_64-pc-linux-gnu.
2020-12-12  Joern Rennecke  

Fix match_dup bug of define_cond_exec.
* gensupport.c (alter_predicate_for_insn): Handle MATCH_DUP.

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index e1ca06dbc1e..92275358078 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -1230,6 +1230,7 @@ alter_predicate_for_insn (rtx pattern, int alt, int 
max_op,
 case MATCH_OPERATOR:
 case MATCH_SCRATCH:
 case MATCH_PARALLEL:
+case MATCH_DUP:
   XINT (pattern, 0) += max_op;
   break;
 


[no subject]

2021-05-16 Thread Joern Rennecke
At the moment, for a match_dup in a define_cond_exec, you'd have to
give the number in the
resulting pattern(s) rather than in the substitute pattern.  That's
not only wrong, but can also
be impossible when the pattern should apply to multiple patterns with
different operand numbers.

The attached patch fixes this.

Bootstrapped on x86_64-pc-linux-gnu.
2020-12-12  Joern Rennecke  

Fix match_dup bug of define_cond_exec.
* gensupport.c (alter_predicate_for_insn): Handle MATCH_DUP.

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index e1ca06dbc1e..92275358078 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -1230,6 +1230,7 @@ alter_predicate_for_insn (rtx pattern, int alt, int 
max_op,
 case MATCH_OPERATOR:
 case MATCH_SCRATCH:
 case MATCH_PARALLEL:
+case MATCH_DUP:
   XINT (pattern, 0) += max_op;
   break;
 


RFA: Improve message for wrong number of alternatives

2021-05-16 Thread Joern Rennecke
When you have lots of operands and lots of alternatives in a pattern,
it is often not immediately apparent if the problem is in the
indicated alternative or in the one that genoutput uses as a reference
for the 'correct' number of alternatives, and/or if you dropped a
comma or had one too many.  By making genoutput tell you what the
argument counts are, this gets a little bit easier.
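A distilled example of the situation (a hypothetical pattern with assumed names): operands 0 and 1 have three alternatives, operand 2 only two. With long constraint strings it is not obvious whether operand 2 dropped a comma or operand 0 gained one, so printing both counts narrows it down.

```
(define_insn "addsi3_sketch"
  [(set (match_operand:SI 0 "register_operand" "=r,r,m")
        (plus:SI (match_operand:SI 1 "register_operand" "r,r,r")
                 ;; two alternatives where operands 0 and 1 have three
                 (match_operand:SI 2 "nonmemory_operand" "rI,M")))]
  ""
  "...")
```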

Bootstrapped on x86_64-pc-linux-gnu.
2021-01-13  Joern Rennecke  

Make "wrong number of alternatives" message a bit more specific.

diff --git a/gcc/genoutput.c b/gcc/genoutput.c
index 8e911cce2f5..d2836f85bbf 100644
--- a/gcc/genoutput.c
+++ b/gcc/genoutput.c
@@ -813,8 +813,8 @@ validate_insn_alternatives (class data *d)
if (n == 0)
  n = d->operand[start].n_alternatives;
else if (n != d->operand[start].n_alternatives)
- error_at (d->loc, "wrong number of alternatives in operand %d",
-   start);
+ error_at (d->loc,
+	   "wrong number of alternatives in operand %d, %d, expected %d",
+	   start, d->operand[start].n_alternatives, n);
  }
   }
 


RFA: Fix match_scratch bug in define_subst

2021-05-16 Thread Joern Rennecke
Bootstrapped on x86_64-pc-linux-gnu.
2020-12-10  Joern Rennecke  

Fix bug in the define_subst handling that made match_scratch unusable for
multi-alternative patterns.

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index e1ca06dbc1e..4022c661adb 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -1291,6 +1291,9 @@ alter_constraints (rtx pattern, int n_dup, 
constraints_handler_t alter)
 case MATCH_OPERAND:
   XSTR (pattern, 2) = alter (XSTR (pattern, 2), n_dup);
   break;
+case MATCH_SCRATCH:
+  XSTR (pattern, 1) = alter (XSTR (pattern, 1), n_dup);
+  break;
 
 default:
   break;


Re: RFA: Fix uninitialized memory use in sched_macro_fuse_insns

2019-04-05 Thread Joern Rennecke
On Fri, 5 Apr 2019 at 11:07, Richard Sandiford
 wrote:


> > 2019-04-04  Joern Rennecke  
> >
> >   * sched-deps.c (sched_macro_fuse_insns): Check return value of
> >   targetm.fixed_condition_code_regs.
>
> OK, thanks.

Thanks for the review.

Is that OK restricted to delayed applying once the gcc 9 branch has
been cut and gcc 10 stage 1 opened (because the bug is not a
regression unless going back to 2013)
or also OK to apply to the current 9.0.0 trunk (since this should be a
safe patch and leaving the bug in might hinder debugging to find
actual regressions) ?


RFA: Fix uninitialized memory use in sched_macro_fuse_insns

2019-04-04 Thread Joern Rennecke
sched_macro_fuse_insns uses the value in condreg1 without
checking the return value of targetm.fixed_condition_code_regs.  As
this variable
is not initialized anywhere, this leads to constructing cc_reg_1 with
an undefined value,
and then using that in reg_referenced_p, if TARGET_FIXED_CONDITION_CODE_REGS
has the default value as defined in target.def (hook_bool_uintp_uintp_false).

The attached patch fixes this by checking the return value of
targetm.fixed_condition_code_regs.  Bootstrapped & regtested on
x86_64-pc-linux-gnu .
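Stripped of the scheduler context, this is the classic out-parameter bug (my own sketch, with hypothetical names): a value is read even when the callee's boolean return says it was never written. The default hook hook_bool_uintp_uintp_false behaves like fixed_cc_regs_default below.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors hook_bool_uintp_uintp_false: reports "no fixed condition code
   registers" and leaves the out-parameters untouched.  */
static bool
fixed_cc_regs_default (unsigned int *r1, unsigned int *r2)
{
  (void) r1;
  (void) r2;
  return false;
}

/* Correct use: only read the out-parameters when the callee returned
   true.  The pre-patch code read condreg1 unconditionally, i.e. read
   uninitialized memory whenever the default hook was in effect.  */
unsigned int
pick_cc_reg (void)
{
  unsigned int condreg1, condreg2;
  if (fixed_cc_regs_default (&condreg1, &condreg2))
    return condreg1;
  return 0xffffffffu;  /* sentinel: no fixed condition-code register */
}
```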
2019-04-04  Joern Rennecke  

* sched-deps.c (sched_macro_fuse_insns): Check return value of
targetm.fixed_condition_code_regs.

Index: sched-deps.c
===
--- sched-deps.c(revision 270146)
+++ sched-deps.c(working copy)
@@ -2857,14 +2857,16 @@ sched_macro_fuse_insns (rtx_insn *insn)
 {
   unsigned int condreg1, condreg2;
   rtx cc_reg_1;
-  targetm.fixed_condition_code_regs (&condreg1, &condreg2);
-  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
-  if (reg_referenced_p (cc_reg_1, PATTERN (insn))
- && modified_in_p (cc_reg_1, prev))
+  if (targetm.fixed_condition_code_regs (&condreg1, &condreg2))
{
- if (targetm.sched.macro_fusion_pair_p (prev, insn))
-   SCHED_GROUP_P (insn) = 1;
- return;
+ cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
+ if (reg_referenced_p (cc_reg_1, PATTERN (insn))
+ && modified_in_p (cc_reg_1, prev))
+   {
+ if (targetm.sched.macro_fusion_pair_p (prev, insn))
+   SCHED_GROUP_P (insn) = 1;
+ return;
+   }
}
 }
 


RFA: fix avr C++ preprocessing to pick up device defines

2014-11-09 Thread Joern Rennecke
The defaults.h definition of
#define CPLUSPLUS_CPP_SPEC CPP_SPEC
does not do the right thing with the cpp spec picked up from a spec file,
which is now needed for -mmcu processing.
Also, a spec file can't override CPLUSPLUS_CPP_SPEC as such, since
that string is hard-coded into the compiler.
By setting CPLUSPLUS_CPP_SPEC to %(cpp), we let the cc1plus
preprocessor look up the actual value of the preprocessor specs.

OK to apply?
2014-11-09  Joern Rennecke  joern.renne...@embecosm.com

* config/avr/avr.h (CPLUSPLUS_CPP_SPEC): Define.

diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
index 0b48423..46ed0a4 100644
--- a/gcc/config/avr/avr.h
+++ b/gcc/config/avr/avr.h
@@ -505,6 +505,10 @@ typedef struct avr_args
#define DRIVER_SELF_SPECS " %{mmcu=*:-specs=device-specs/specs-%*%s %<mmcu=*}"
#define CPP_SPEC ""
 
+/* We want cc1plus used as a preprocessor to pick up the cpp spec from the
+   per-device spec files.  */
+#define CPLUSPLUS_CPP_SPEC "%(cpp)"
+
#define CC1_SPEC ""
 
#define CC1PLUS_SPEC "%{!frtti:-fno-rtti} \


Committed: Fix typo in low_io_address_operand

2014-11-09 Thread Joern Rennecke
I forgot the 'x' number base specifier in r216034.
Committed as obvious.
2014-11-09  Joern Rennecke  joern.renne...@embecosm.com

	* config/avr/predicates.md (low_io_address_operand): Fix typo.

Index: config/avr/predicates.md
===
--- config/avr/predicates.md	(revision 217265)
+++ config/avr/predicates.md	(working copy)
@@ -46,7 +46,7 @@ (define_predicate "stack_register_operand"
 (define_special_predicate "low_io_address_operand"
   (ior (and (match_code "const_int")
 	    (match_test "IN_RANGE (INTVAL (op) - avr_current_arch->sfr_offset,
-				   0, 020 - GET_MODE_SIZE (mode))"))
+				   0, 0x20 - GET_MODE_SIZE (mode))"))
        (and (match_code "symbol_ref")
 	    (match_test "SYMBOL_REF_FLAGS (op) & SYMBOL_FLAG_IO_LOW"))))
 


Re: [Patch 3/7 arc] Deprecate *_BY_PIECES_P, move to hookized version

2014-11-04 Thread Joern Rennecke
On 31 October 2014 15:10, James Greenhalgh james.greenha...@arm.com wrote:

 While I am there, arc defines a macro CAN_MOVE_BY_PIECES, which is
 unused, so clean that up too.

That's not a clean-up.  This pertains to PR 39350.
Which, incidentally, this hookization completely ignores, entrenching
the conflation of the move expander with move cost estimates.
Thus, can_move_by_pieces gives the wrong result for purposes of rtl
optimizations when a target-specific movmem etc. expander emits
target-specific code.
The patch at https://gcc.gnu.org/ml/gcc-patches/2009-03/txt00018.txt
shows a number of call sites that are affected.

 arc only implements MOVE_BY_PIECES_P, wiring it to false. Mirror that
 behaviour, and use the default hook for other by_pieces operations.

 I tried building a compiler but no amount of fiddling with target
 strings got me to a sensible result, so this patch is completely
 untested.

You could just pick one of the configs in contrib/config-list.mk


Re: [Patch 3/7 arc] Deprecate *_BY_PIECES_P, move to hookized version

2014-11-04 Thread Joern Rennecke
On 4 November 2014 14:24, James Greenhalgh james.greenha...@arm.com wrote:
 On Tue, Nov 04, 2014 at 12:07:56PM +, Joern Rennecke wrote:
 On 31 October 2014 15:10, James Greenhalgh james.greenha...@arm.com wrote:

  While I am there, arc defines a macro CAN_MOVE_BY_PIECES, which is
  unused, so clean that up too.

 That's not a clean-up.  This pertains to PR 39350.

 Well, it is a clean-up in the sense that this macro is completely unused
 in the compiler and has no effect, but please revert this hunk if that
 is your preference.

 Which, incidentally, this hookization completely ignores, entrenching
 the conflation of move expander and move cost estimates.

 No, I have to disagree. The use_by_pieces_infrastructure_p hook doesn't
 conflate anything - it gives a response to the question Should the
 by_pieces infrastructure be used?. A target specific movmem pattern
 - though it might itself choose to move things by pieces, is
 categorically not using the move_by_pieces infrastructure.

 If we want to keep a clean separation of concerns here, we would
 want a similar target hook asking the single question will your
 movmem/setmem expander succeed?.

That would not be helpful.  What the rtl optimizers actually want to know is
"will this block copy / memset be cheap?".
A movmem expander might succeed (or not) for various reasons.  The one that's
interesting for the above question is whether the operation has been inlined
with a fast sequence of instructions.

 Thus, can_move_by_pieces gives the wrong result for purposes of rtl
 optimizations
 when a target-specific movmem etc expander emits target-specific code.
 The patch at https://gcc.gnu.org/ml/gcc-patches/2009-03/txt00018.txt
 shows a number of call sites that are affected.

 can_move_by_pieces (likewise can_store_by_pieces) gives the right
 result, the RTL expanders are using it wrong.

I could agree with that view if there was a good strategy agreed what the rtl
expanders should do instead.

 I disagree with the approach taken in your patch as it overloads the
 purpose of can_move_by_pieces. However, I would support a patch pulling
 this out in to two hooks, so the call in
 value-prof.c:gimple_stringops_transform would change from:

   if (!can_move_by_pieces (val, MIN (dest_align, src_align)))
 return false;

 to something like:

   if (!can_move_by_pieces (val, MIN (dest_align, src_align))
       && !targetm.can_expand_mem_op_p (val, MIN (dest_align, src_align),
					MOVE_BY_PIECES))
     return false;

But this goes back to the problem that it's not about whether we can expand
the mem op at all, but whether we have a fast expansion.  We can always
expand via libcall (the middle end does this as a fall-back).  Also, the
target might do some target-specific slow expansion, e.g. call a function
with another name and maybe a modified ABI, which still works but is
relatively slow.

So, either the new hook would answer the wrong question, or it would be
misnamed, in which case it's likely that the semantics will sooner or later
follow the name, and it will gravitate to answering the wrong question again.

 But let's not confuse the use of what should be a simple hook!

What would that be?  TARGET_RTX_COST is unsuitable because the RTL
for the call hasn't been made yet, and if it was, it would tend to be
multiple instructions, maybe even a loop.
Should we have an analogous TARGET_TREE_COST hook, so that you can ask the
target what it thinks the cost of a tree will be once it's expanded?
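
The question argued for above can be sketched as an interface (this is a hypothetical illustration, not an existing GCC hook; all names below are invented):

```c
#include <stddef.h>

/* Sketch only: no such hook exists in GCC.  It illustrates the question
   the RTL optimizers actually need answered -- not "can this block
   operation be expanded?" but "will the expansion be cheap?".  */

enum mem_op_kind { MEM_OP_MOVE, MEM_OP_SET };

struct mem_op_query
{
  unsigned long size;    /* Number of bytes moved or set.  */
  unsigned int align;    /* Guaranteed alignment, in bits.  */
  enum mem_op_kind kind;
};

/* Hypothetical target policy: expansion counts as "fast" only for
   small, word-aligned blocks; everything else falls back to a libcall
   or a slow target-specific helper, which the optimizers should not
   treat as cheap even though the expander "succeeds".  */
int
example_fast_mem_op_p (const struct mem_op_query *q)
{
  return q->size <= 64 && q->align >= 32;
}
```

The point of the sketch is that the predicate is about cost, not capability: a target that can always expand (via libcall) would still answer 0 for most queries.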


Re: [patch,avr] correct incorrect spec string for device specs

2014-11-03 Thread Joern Rennecke
On 3 November 2014 14:33, Sivanupandi, Pitchumani
pitchumani.sivanupa...@atmel.com wrote:
 Hi,

 Unrecognized option error is issued by avr-gcc for devices with AVR_ISA_RMW.
 This is because of an incorrect spec string in device spec generation.

 Below patch corrects the incorrect spec string in gen-avr-mmcu-specs.c.
 If OK, could someone commit please?

 diff --git a/gcc/config/avr/gen-avr-mmcu-specs.c 
 b/gcc/config/avr/gen-avr-mmcu-specs.c
 index 73bacf4..772e862 100644
 --- a/gcc/config/avr/gen-avr-mmcu-specs.c
 +++ b/gcc/config/avr/gen-avr-mmcu-specs.c
 @@ -53,7 +53,7 @@ print_mcu (const avr_mcu_t *mcu)
 			 ? " -msp8" : " %<msp8");
 
    errata_skip = (mcu->dev_attribute & AVR_ERRATA_SKIP) ? " -mskip-bug" : "";
 -  rmw = (mcu->dev_attribute & AVR_ISA_RMW) ? "%%{!mno-rmw: -mrmw}" : "";
 +  rmw = (mcu->dev_attribute & AVR_ISA_RMW) ? "%{!mno-rmw: -mrmw}" : "";

    const char *arch_name = avr_arch_types[mcu->arch].arch_name;

 Regards,
 Pitchumani

 Gcc/ChangeLog

 2014-11-03  Pitchumani Sivanupandi pitchuman...@atmel.com

 * config/avr/gen-avr-mmcu-specs.c: Remove unnecessary format specifier.

Oops, indeed.  Although the way I'd put it is that what you're removing is
an extraneous printf-style '%' quoting - extraneous because the variable
rmw is not actually a format string, but a string that is emitted under the
control of format strings.
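
The distinction can be shown with a short C sketch (the helper functions and the `*self_spec:` section name are hypothetical, written only to mirror how the generated spec string flows through printf-family functions):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring what gen-avr-mmcu-specs.c does: the spec
   fragment is passed as a %s argument, so its '%' characters are copied
   through literally and need no '%%' quoting.  */
void
emit_rmw_spec (char *buf, size_t size, const char *rmw)
{
  snprintf (buf, size, "*self_spec:\n%s\n", rmw);
}

/* By contrast, a string used *as* the format itself must quote each
   literal '%' as '%%'.  */
void
emit_rmw_as_format (char *buf, size_t size)
{
  snprintf (buf, size, "%%{!mno-rmw: -mrmw}");
}
```

The bug fixed by the patch was exactly this mix-up: `%%` quoting applied to a string that is only ever an argument, never a format.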


Re: [ARC] RFA: Use new rtl iterators in small_data_pattern

2014-11-02 Thread Joern Rennecke
On 25 October 2014 10:54, Richard Sandiford rdsandif...@googlemail.com wrote:
 This is part of a series to remove uses of for_each_rtx from the ports.

 Tested by making sure there were no code changes for gcc.dg, gcc.c-torture
 and g++.dg for arc-elf.  OK to install?

OK.


Re: [ARC] RFA: Use new rtl iterators in arc_rewrite_small_data

2014-11-02 Thread Joern Rennecke
On 25 October 2014 10:53, Richard Sandiford rdsandif...@googlemail.com wrote:
 ...
 Tested by making sure there were no code changes for gcc.dg, gcc.c-torture
 and g++.dg for arc-elf.  OK to install?

 Thanks,
 Richard


 gcc/
 * config/arc/arc.c: Include rtl-iter.h.
 (arc_rewrite_small_data_1): Delete.
 (arc_rewrite_small_data): Use FOR_EACH_SUBRTX_PTR.

OK.


Re: [ARC] RFA: Use new rtl iterators in arc600_corereg_hazard

2014-11-02 Thread Joern Rennecke
On 25 October 2014 10:56, Richard Sandiford rdsandif...@googlemail.com wrote:
 This is part of a series to remove uses of for_each_rtx from the ports.

 Tested by making sure there were no code changes for gcc.dg, gcc.c-torture
 and g++.dg for arc-elf.  OK to install?

 Thanks,
 Richard


 gcc/
 * config/arc/arc.c (arc600_corereg_hazard_1): Delete.
 (arc600_corereg_hazard): Use FOR_EACH_SUBRTX.

OK.

 +  FOR_EACH_SUBRTX (iter, array, PATTERN (pred), NONCONST)

I was wondering for a while what kind of (NON)CONST this was about...
but as I glean
from the source, as long as nobody packs an address with a side effect
into a (CONST (MEM (...)) -
which should never happen in the first place, no matter if the MEM
itself is really const - we
should be fine.


Re: [ARC] RFA: Use new rtl iterators in arc_write_ext_corereg

2014-11-02 Thread Joern Rennecke
On 25 October 2014 10:58, Richard Sandiford rdsandif...@googlemail.com wrote:
 This is part of a series to remove uses of for_each_rtx from the ports.

 Tested by making sure there were no code changes for gcc.dg, gcc.c-torture
 and g++.dg for arc-elf.  OK to install?

 Thanks,
 Richard


 gcc/
 * config/arc/arc.c (write_ext_corereg_1): Delete.
 (arc_write_ext_corereg): Use FOR_EACH_SUBRTX.

OK.


RFA: Add libstdc++-v3 support for avr 1/7: toplevel Makefile check-target-*

2014-10-21 Thread Joern Rennecke
Make can't 'build' check-c++ without rules for check-target-libgomp-c++ /
check-target-libitm-c++

This patch makes sure that there's at least a dummy rule available.


Re: RFA: Add libstdc++-v3 support for avr 1/7: toplevel Makefile check-target-*

2014-10-21 Thread Joern Rennecke
On 21 October 2014 16:35, Joern Rennecke joern.renne...@embecosm.com wrote:
 Make can't 'build' check-c++ without rules for check-target-libgomp-c++ /
 check-target-libitm-c++

 This patch makes sure that there's at least a dummy rule available.

Sorry - forgot to attach the patch - here it is:
toplevel:

2014-09-15  Joern Rennecke  joern.renne...@embecosm.com

Allow check-c++ to work on avr:
* Makefile.tpl (check-target-libgomp-c++): Always provide some rule.
(check-target-libitm-c++): Likewise.
* Makefile.in: Regenerate.

Index: Makefile.tpl
===
--- Makefile.tpl(revision 216243)
+++ Makefile.tpl(working copy)
@@ -1411,16 +1411,20 @@ TARGET-stage[+id+]-[+prefix+][+module+]
 [+ ENDFOR recursive_targets +]
 [+ ENDFOR target_modules +]
 
-@if target-libgomp
 .PHONY: check-target-libgomp-c++
 check-target-libgomp-c++:
+
+@if target-libgomp
+check-target-libgomp-c++:
	$(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) c++.exp" check-target-libgomp
 
 @endif target-libgomp
 
-@if target-libitm
 .PHONY: check-target-libitm-c++
 check-target-libitm-c++:
+
+@if target-libitm
+check-target-libitm-c++:
	$(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) c++.exp" check-target-libitm
 
 @endif target-libitm
Index: Makefile.in
===
--- Makefile.in (revision 216243)
+++ Makefile.in (working copy)
@@ -45679,16 +45679,20 @@ TARGET-stagefeedback-target-libgomp = $(
 
 
 
-@if target-libgomp
 .PHONY: check-target-libgomp-c++
 check-target-libgomp-c++:
+
+@if target-libgomp
+check-target-libgomp-c++:
	$(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) c++.exp" check-target-libgomp
 
 @endif target-libgomp
 
-@if target-libitm
 .PHONY: check-target-libitm-c++
 check-target-libitm-c++:
+
+@if target-libitm
+check-target-libitm-c++:
	$(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) c++.exp" check-target-libitm
 
 @endif target-libitm


RFA: Add libstdc++-v3 support for avr 2/7: config/avr

2014-10-21 Thread Joern Rennecke

gcc:

2014-09-23  Joern Rennecke  joern.renne...@embecosm.com

* config/avr/avr.h (LIBSTDCXX): Don't define.

* config/avr/avr.c (TARGET_UNWIND_WORD_MODE): Define.
(avr_unwind_word_mode): New function.

* config/avr/avr.c (avr_asm_function_rodata_section):
When merging something into a .gnu.linkonce.t.* function-specific
section, always add the SECTION_CODE flag.

Index: config/avr/avr.c
===
--- config/avr/avr.c(revision 216243)
+++ config/avr/avr.c(working copy)
@@ -8667,8 +8667,20 @@ avr_asm_function_rodata_section (tree de
 {
   const char *rname = ACONCAT ((new_prefix,
 name + strlen (old_prefix), NULL));
-      flags &= ~SECTION_CODE;
-      flags |= AVR_HAVE_JMP_CALL ? 0 : SECTION_CODE;
+      if (i == 0)
+	{
+	  flags &= ~SECTION_CODE;
+	  flags |= AVR_HAVE_JMP_CALL ? 0 : SECTION_CODE;
+	}
+ else
+   {
+ /* The flags have to match the existing section where the
+function proper went, lest varasm.c:get_section will
+complain: ...include/bits/locale_facets_nonio.tcc:
+In member function '447 chars of c++ name':
+ 447 chars of c++ name causes a section type conflict  */
+ flags |= SECTION_CODE;
+   }
 
   return get_section (rname, flags, frodata-named.decl);
 }
@@ -12721,6 +12733,16 @@ #define TARGET_ADDR_SPACE_LEGITIMATE_ADD
 #undef  TARGET_PRINT_OPERAND_PUNCT_VALID_P
 #define TARGET_PRINT_OPERAND_PUNCT_VALID_P avr_print_operand_punct_valid_p
 
+#undef  TARGET_UNWIND_WORD_MODE
+#define TARGET_UNWIND_WORD_MODE avr_unwind_word_mode
+
+static enum machine_mode
+avr_unwind_word_mode (void)
+{
+  return Pmode;
+}
+
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 
Index: config/avr/avr.h
===
--- config/avr/avr.h(revision 216243)
+++ config/avr/avr.h(working copy)
@@ -516,9 +516,6 @@ #define LINK_SPEC \
 #define LIB_SPEC \
   "%{!mmcu=at90s1*:%{!mmcu=attiny11:%{!mmcu=attiny12:%{!mmcu=attiny15:%{!mmcu=attiny28: -lc }}}}}"
 
-#define LIBSTDCXX "gcc"
-/* No libstdc++ for now.  Empty string doesn't work.  */
-
 #define LIBGCC_SPEC \
   "%{!mmcu=at90s1*:%{!mmcu=attiny11:%{!mmcu=attiny12:%{!mmcu=attiny15:%{!mmcu=attiny28: -lgcc }}}}}"
 


RFA: Add libstdc++-v3 support for avr 3/7: libstdc+-v3 avr configuration

2014-10-21 Thread Joern Rennecke

libstdc++-v3:

2013-06-14  Joern Rennecke joern.renne...@embecosm.com

* configure.ac [avr-*-*]: Don't use AC_LIBTOOL_DLOPEN.
* crossconfig.m4: Add avr-*-* settings.
* configure: Regenerate.

Index: configure.ac
===
--- configure.ac(revision 216243)
+++ configure.ac(working copy)
@@ -90,7 +90,13 @@ AH_TEMPLATE(VERSION, [Version number of
 
 # Libtool setup.
 if test "x${with_newlib}" != "xyes"; then
-  AC_LIBTOOL_DLOPEN
+  case ${host} in
+avr-*-*)
+  ;;
+*)
+  AC_LIBTOOL_DLOPEN
+  ;;
+  esac
 fi
 AM_PROG_LIBTOOL
 ACX_LT_HOST_FLAGS
Index: crossconfig.m4
===
--- crossconfig.m4  (revision 216243)
+++ crossconfig.m4  (working copy)
@@ -9,6 +9,37 @@ AC_DEFUN([GLIBCXX_CROSSCONFIG],[
 # This is a freestanding configuration; there is nothing to do here.
 ;;
 
+  avr-*-*)
+AC_DEFINE(HAVE_ISINF)
+AC_DEFINE(HAVE_ISNAN)
+AC_DEFINE(HAVE_HYPOT)
+
+# ??? avr-libc/include/math.h used crude defines for float functions.
+# Should we rather disable these for c++ than acknowledge them?
+AC_DEFINE(HAVE_FABSF)
+AC_DEFINE(HAVE_ACOSF)
+AC_DEFINE(HAVE_ASINF)
+AC_DEFINE(HAVE_ATANF)
+AC_DEFINE(HAVE_ATAN2F)
+AC_DEFINE(HAVE_CEILF)
+AC_DEFINE(HAVE_COSF)
+AC_DEFINE(HAVE_COSHF)
+AC_DEFINE(HAVE_EXPF)
+AC_DEFINE(HAVE_FLOORF)
+AC_DEFINE(HAVE_FMODF)
+AC_DEFINE(HAVE_FREXPF)
+AC_DEFINE(HAVE_SQRTF)
+AC_DEFINE(HAVE_HYPOTF)
+AC_DEFINE(HAVE_LDEXPF)
+AC_DEFINE(HAVE_LOGF)
+AC_DEFINE(HAVE_LOG10F)
+AC_DEFINE(HAVE_POWF)
+AC_DEFINE(HAVE_SINF)
+AC_DEFINE(HAVE_SINHF)
+AC_DEFINE(HAVE_TANF)
+AC_DEFINE(HAVE_TANHF)
+;;
+
   mips*-sde-elf*)
 # These definitions are for the SDE C library rather than newlib.
 SECTION_FLAGS='-ffunction-sections -fdata-sections'
Index: configure
===
--- configure   (revision 216243)
+++ configure   (working copy)
@@ -5301,10 +5301,16 @@ if (eval $ac_cpp conftest.$ac_ext) 2
 
 # Libtool setup.
 if test "x${with_newlib}" != "xyes"; then
-  enable_dlopen=yes
+  case ${host} in
+avr-*-*)
+  ;;
+*)
+  enable_dlopen=yes
 
 
 
+  ;;
+  esac
 fi
 case `pwd` in
   *\ * | *\*)
@@ -11531,7 +11537,7 @@ return dld_link ();
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
  cat > conftest.$ac_ext <<_LT_EOF
-#line 11534 "configure"
+#line 11540 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11637,7 +11643,7 @@ int main ()
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
  cat > conftest.$ac_ext <<_LT_EOF
-#line 11640 "configure"
+#line 11646 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15057,7 +15063,7 @@ main ()
 #
 # Fake what AC_TRY_COMPILE does.  XXX Look at redoing this new-style.
cat > conftest.$ac_ext << EOF
-#line 15060 "configure"
+#line 15066 "configure"
 struct S { ~S(); };
 void bar();
 void foo()
@@ -15409,7 +15415,7 @@ main ()
   # Fake what AC_TRY_COMPILE does.
 
cat > conftest.$ac_ext << EOF
-#line 15412 "configure"
+#line 15418 "configure"
 int main()
 {
   typedef bool atomic_type;
@@ -15444,7 +15450,7 @@ int main()
 rm -f conftest*
 
cat > conftest.$ac_ext << EOF
-#line 15447 "configure"
+#line 15453 "configure"
 int main()
 {
   typedef short atomic_type;
@@ -15479,7 +15485,7 @@ int main()
 rm -f conftest*
 
cat > conftest.$ac_ext << EOF
-#line 15482 "configure"
+#line 15488 "configure"
 int main()
 {
   // NB: _Atomic_word not necessarily int.
@@ -15515,7 +15521,7 @@ int main()
 rm -f conftest*
 
cat > conftest.$ac_ext << EOF
-#line 15518 "configure"
+#line 15524 "configure"
 int main()
 {
   typedef long long atomic_type;
@@ -15594,7 +15600,7 @@ int main()
   # unnecessary for this test.
 
cat > conftest.$ac_ext << EOF
-#line 15597 "configure"
+#line 15603 "configure"
 int main()
 {
   _Decimal32 d1;
@@ -15636,7 +15642,7 @@ int main()
   # unnecessary for this test.
 
cat > conftest.$ac_ext << EOF
-#line 15639 "configure"
+#line 15645 "configure"
 template<typename T1, typename T2>
   struct same
   { typedef T2 type; };
@@ -15670,7 +15676,7 @@ int main()
 rm -f conftest*
 
cat > conftest.$ac_ext << EOF
-#line 15673 "configure"
+#line 15679 "configure"
 template<typename T1, typename T2>
   struct same
   { typedef T2 type; };
@@ -27853,6 +27859,62 @@ main ()
 # This is a freestanding configuration; there is nothing to do here.
 ;;
 
+  avr-*-*)
+$as_echo "#define HAVE_ISINF 1" >>confdefs.h
+
+$as_echo "#define HAVE_ISNAN 1" >>confdefs.h
+
+$as_echo "#define HAVE_HYPOT 1" >>confdefs.h
+
+
+# ??? avr-libc/include/math.h used crude defines for float functions.
+# Should we rather disable these for c++ than acknowledge them?
+$as_echo "#define HAVE_FABSF 1" >>confdefs.h
+
+$as_echo "#define HAVE_ACOSF 1" >>confdefs.h

RFA: Add libstdc++-v3 support for avr 4/7: fix locale_facets_nonio overloading on struct tm using template

2014-10-21 Thread Joern Rennecke

libstdc++-v3:

2013-06-14  Joern Rennecke joern.renne...@embecosm.com

* include/bits/locale_facets_nonio.h (__tm_small_int): typedef/define.
(_M_extract_num): Templatize base type of __member argument.
(_M_extract_name): Change type of __member argument to __tm_small_int.
* include/bits/locale_facets_nonio.tcc (_M_extract_via_format) Z:
Change type of __tmp to __tm_small_int.
(_M_extract_num): Templatize base type of __member argument.
(_M_extract_name): Change type of __member argument to __tm_small_int.

Index: include/bits/locale_facets_nonio.h
===
--- include/bits/locale_facets_nonio.h  (revision 216243)
+++ include/bits/locale_facets_nonio.h  (working copy)
@@ -42,6 +42,15 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+#ifdef __AVR__
+/* The struct tm defined in avr-libc/include/time.h uses int8_t for a number
+   of fields.  To allow overload resolution to succeed, we need to adjust
+   some data structures and functions to match.  */
+typedef typeof (((tm*)0)->tm_sec) __tm_small_int;
+#else /* For 100% mangling compatibility, use int directly.  */
+#define __tm_small_int int
+#endif
+
   /**
*  @brief  Time format ordering data.
*  @ingroup locales
@@ -654,14 +663,16 @@ namespace std _GLIBCXX_VISIBILITY(defaul
  ios_base::iostate __err, tm* __tm) const;
 
   // Extract numeric component of length __len.
+      template<typename _Member_t>
       iter_type
-      _M_extract_num(iter_type __beg, iter_type __end, int __member,
+      _M_extract_num(iter_type __beg, iter_type __end, _Member_t __member,
 		     int __min, int __max, size_t __len,
 		     ios_base& __io, ios_base::iostate& __err) const;
 
       // Extract any unique array of string literals in a const _CharT* array.
       iter_type
-      _M_extract_name(iter_type __beg, iter_type __end, int __member,
+      _M_extract_name(iter_type __beg, iter_type __end,
+		      __tm_small_int __member,
 		      const _CharT** __names, size_t __indexlen,
 		      ios_base& __io, ios_base::iostate& __err) const;
 
Index: include/bits/locale_facets_nonio.tcc
===
--- include/bits/locale_facets_nonio.tcc(revision 216243)
+++ include/bits/locale_facets_nonio.tcc(working copy)
@@ -796,7 +796,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 	      // Timezone info.
 	      if (__ctype.is(ctype_base::upper, *__beg))
 		{
-		  int __tmp;
+		  __tm_small_int __tmp;
 		  __beg = _M_extract_name(__beg, __end, __tmp,
 					  __timepunct_cache<_CharT>::_S_timezones,
 					  14, __io, __tmperr);
@@ -837,9 +837,10 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 }
 
   template<typename _CharT, typename _InIter>
+    template<typename _Member_t>
     _InIter
     time_get<_CharT, _InIter>::
-    _M_extract_num(iter_type __beg, iter_type __end, int __member,
+    _M_extract_num(iter_type __beg, iter_type __end, _Member_t __member,
 		   int __min, int __max, size_t __len,
 		   ios_base& __io, ios_base::iostate& __err) const
     {
@@ -882,7 +883,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   template<typename _CharT, typename _InIter>
     _InIter
     time_get<_CharT, _InIter>::
-    _M_extract_name(iter_type __beg, iter_type __end, int __member,
+    _M_extract_name(iter_type __beg, iter_type __end, __tm_small_int __member,
 		    const _CharT** __names, size_t __indexlen,
 		    ios_base& __io, ios_base::iostate& __err) const
     {


RFA: Add libstdc++-v3 support for avr 5/7: libstdc++-v3 fix cross testing

2014-10-21 Thread Joern Rennecke
The gdb version check ends up trying to invoke gdb on the target -
not so nice if your target is too small to accommodate gdb in the first place.

I've added a check similar to the one in gdb-test to punt on non-native
targets.
libstdc++-v3:

2013-09-17  Joern Rennecke joern.renne...@embecosm.com

* testsuite/lib/gdb-test.exp (gdb_batch_check): Don't invoke
gdb on cross targets.

Index: lib/gdb-test.exp
===
--- lib/gdb-test.exp(revision 216243)
+++ lib/gdb-test.exp(working copy)
@@ -229,6 +229,8 @@
 
 # Invoke gdb with a command and pattern-match the output.
 proc gdb_batch_check {command pattern} {
+if { ![isnative] || [is_remote target] } { return 0 }
+
 set gdb_name $::env(GUALITY_GDB_NAME)
    set cmd "$gdb_name -nw -nx -quiet -batch -ex \"$command\""
    send_log "Spawning: $cmd\n"


RFA: Add libstdc++-v3 support for avr 6/7: Run -frtti tests with -frtti

2014-10-21 Thread Joern Rennecke
We got a couple of tests that assume -frtti; this is the default for
most targets, but not for avr.
libstdc++-v3:

2013-09-24  Joern Rennecke joern.renne...@embecosm.com

* testsuite/18_support/type_info/hash_code.cc (dg-options): Add -frtti.
* testsuite/20_util/shared_ptr/cons/unique_ptr_deleter_ref_2.cc
(dg-options): Likewise.
* testsuite/20_util/shared_ptr/creation/private.cc (dg-options):
Likewise.
* testsuite/20_util/typeindex/hash.cc (dg-options): Likewise.
* testsuite/20_util/typeindex/hash_code.cc (dg-options): Likewise.
* testsuite/experimental/any/observers/type.cc (dg-options): Likewise.

* testsuite/20_util/shared_ptr/creation/58594.cc (dg-options):
Add -frtti.
* testsuite/20_util/typeindex/comparison_operators.cc (dg-options):
Likewise.
* testsuite/20_util/typeindex/name.cc (dg-options): Likewise.
* 
testsuite/23_containers/array/requirements/non_default_constructible.cc 
(dg-options):
Likewise.

Index: testsuite/18_support/type_info/hash_code.cc
===
--- testsuite/18_support/type_info/hash_code.cc (revision 216243)
+++ testsuite/18_support/type_info/hash_code.cc (working copy)
@@ -1,4 +1,4 @@
-// { dg-options -std=gnu++0x }
+// { dg-options -std=gnu++0x -frtti }
 
 // 2010-09-21  Paolo Carlini  paolo.carl...@oracle.com
 //
Index: testsuite/20_util/shared_ptr/cons/unique_ptr_deleter_ref_2.cc
===
--- testsuite/20_util/shared_ptr/cons/unique_ptr_deleter_ref_2.cc   
(revision 216243)
+++ testsuite/20_util/shared_ptr/cons/unique_ptr_deleter_ref_2.cc   
(working copy)
@@ -1,4 +1,4 @@
-// { dg-options -std=gnu++0x }
+// { dg-options -std=gnu++0x -frtti }
 
 // Copyright (C) 2008-2014 Free Software Foundation, Inc.
 //
Index: testsuite/20_util/shared_ptr/creation/58594.cc
===
--- testsuite/20_util/shared_ptr/creation/58594.cc  (revision 216243)
+++ testsuite/20_util/shared_ptr/creation/58594.cc  (working copy)
@@ -1,4 +1,4 @@
-// { dg-options -std=gnu++11 }
+// { dg-options -std=gnu++11 -frtti }
 // { dg-do compile }
 
 // Copyright (C) 2013-2014 Free Software Foundation, Inc.
Index: testsuite/20_util/shared_ptr/creation/private.cc
===
--- testsuite/20_util/shared_ptr/creation/private.cc(revision 216243)
+++ testsuite/20_util/shared_ptr/creation/private.cc(working copy)
@@ -1,4 +1,4 @@
-// { dg-options -std=gnu++0x }
+// { dg-options -std=gnu++0x -frtti }
 
 // Copyright (C) 2011-2014 Free Software Foundation, Inc.
 //
Index: testsuite/20_util/typeindex/comparison_operators.cc
===
--- testsuite/20_util/typeindex/comparison_operators.cc (revision 216243)
+++ testsuite/20_util/typeindex/comparison_operators.cc (working copy)
@@ -1,4 +1,4 @@
-// { dg-options -std=gnu++0x }
+// { dg-options -std=gnu++0x -frtti }
 
 // 2010-09-22  Paolo Carlini  paolo.carl...@oracle.com
 //
Index: testsuite/20_util/typeindex/hash.cc
===
--- testsuite/20_util/typeindex/hash.cc (revision 216243)
+++ testsuite/20_util/typeindex/hash.cc (working copy)
@@ -1,4 +1,4 @@
-// { dg-options -std=gnu++0x }
+// { dg-options -std=gnu++0x -frtti }
 
 // 2010-09-22  Paolo Carlini  paolo.carl...@oracle.com
 //
Index: testsuite/20_util/typeindex/hash_code.cc
===
--- testsuite/20_util/typeindex/hash_code.cc(revision 216243)
+++ testsuite/20_util/typeindex/hash_code.cc(working copy)
@@ -1,4 +1,4 @@
-// { dg-options -std=gnu++0x }
+// { dg-options -std=gnu++0x -frtti }
 
 // 2010-09-22  Paolo Carlini  paolo.carl...@oracle.com
 //
Index: testsuite/20_util/typeindex/name.cc
===
--- testsuite/20_util/typeindex/name.cc (revision 216243)
+++ testsuite/20_util/typeindex/name.cc (working copy)
@@ -1,4 +1,4 @@
-// { dg-options -std=gnu++0x }
+// { dg-options -std=gnu++0x -frtti }
 
 // 2010-09-22  Paolo Carlini  paolo.carl...@oracle.com
 //
Index: testsuite/23_containers/array/requirements/non_default_constructible.cc
===
--- testsuite/23_containers/array/requirements/non_default_constructible.cc 
(revision 216243)
+++ testsuite/23_containers/array/requirements/non_default_constructible.cc 
(working copy)
@@ -1,4 +1,4 @@
-// { dg-options -std=gnu++11 }
+// { dg-options -std=gnu++11 -frtti }
 // { dg-do compile }
 
 // Copyright (C) 2012-2014 Free Software Foundation, Inc.
Index: testsuite/experimental/any/observers/type.cc
===
--- testsuite/experimental/any

RFA: Add libstdc++-v3 support for avr 7/7: Add missing qualifier for size_t in a couple of libstdc++-v3 tests

2014-10-21 Thread Joern Rennecke
A couple of tests fail because an unqualified size_t is used.
2014-09-15  Joern Rennecke  joern.renne...@embecosm.com

* libstdc++-v3/testsuite/util/io/prog_bar.cc: Qualify size_t.
* libstdc++-v3/testsuite/util/io/prog_bar.hpp: Likewise.
* libstdc++-v3/testsuite/util/io/verified_cmd_line_input.hpp: Likewise.

Index: testsuite/util/io/prog_bar.cc
===
--- testsuite/util/io/prog_bar.cc   (revision 216243)
+++ testsuite/util/io/prog_bar.cc   (working copy)
@@ -41,7 +41,7 @@
   namespace test
   {
 prog_bar::
-    prog_bar(size_t max, std::ostream& r_os, bool display/*= true*/) :
+    prog_bar(std::size_t max, std::ostream& r_os, bool display/*= true*/) :
   m_cur(0),
   m_max(max),
   m_cur_disp(0),
Index: testsuite/util/io/prog_bar.hpp
===
--- testsuite/util/io/prog_bar.hpp  (revision 216243)
+++ testsuite/util/io/prog_bar.hpp  (working copy)
@@ -57,7 +57,7 @@
   enum{num_disp = 40};
 
 public:
-  prog_bar(size_t max, std::ostream& r_os, bool display = true);
+  prog_bar(std::size_t max, std::ostream& r_os, bool display = true);
 
   void
   inc();
@@ -69,10 +69,10 @@
   operator=(const prog_bar&);
 
 private:
-  size_t m_cur;
-  const size_t m_max;
+  std::size_t m_cur;
+  const std::size_t m_max;
 
-  size_t m_cur_disp;
+  std::size_t m_cur_disp;
 
   std::ostream& m_r_os;
 
Index: testsuite/util/io/verified_cmd_line_input.hpp
===
--- testsuite/util/io/verified_cmd_line_input.hpp   (revision 216243)
+++ testsuite/util/io/verified_cmd_line_input.hpp   (working copy)
@@ -45,7 +45,7 @@
   namespace test
   {
 void
-verify_argc(size_t given, size_t required);
+verify_argc(std::size_t given, std::size_t required);
 
 void
 verify_prob(double prob);
@@ -56,7 +56,7 @@
 double
 get_cmd_line_prob(int argc, char* a_p_argv[], int argn);
 
-size_t
+std::size_t
 get_cmd_line_size(int argc, char* a_p_argv[], int argn);
 
 bool


Re: RFA: Add libstdc++-v3 support for avr 4/7: fix locale_facets_nonio overloading on struct tm using template

2014-10-21 Thread Joern Rennecke
On 21 October 2014 17:29, Jonathan Wakely jwak...@redhat.com wrote:
  +typedef typeof (((tm*)0)->tm_sec) __tm_small_int;


 I think this should probably use __typeof__ to work with
 -Wpedantic-errors

Ok, makes sense, and it's a straightforward change.
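
Taken out of its libstdc++ context, the technique being agreed on looks like this (a minimal sketch; the typedef and helper names are invented for illustration):

```c
#include <time.h>

/* Derive the type of a struct tm field without naming that type.
   __typeof__ (rather than plain typeof) is the form accepted even
   under -pedantic-errors.  On a hosted glibc system tm_sec is plain
   int, while on avr-libc it is int8_t; the typedef adapts to
   whichever struct tm is in scope.  */
typedef __typeof__ (((struct tm *) 0)->tm_sec) tm_field_int;

/* Hypothetical accessor using the derived type, so overload
   resolution against the field type can succeed on either libc.  */
tm_field_int
get_tm_sec (const struct tm *t)
{
  return t->tm_sec;
}
```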


 +#else /* For 100% mangling compatibility, use int directly.  */
 +#define __tm_small_int int
 +#endif


 I'd prefer to always use a typedef, which can be a private member of
 std::time_get, instead of defining a macro (even a macro using a
 reserved name).

Is the typedef mangling compatible with the original int type?

   /**
*  @brief  Time format ordering data.
*  @ingroup locales
 @@ -654,14 +663,16 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   ios_base::iostate __err, tm* __tm) const;

    // Extract numeric component of length __len.
 +  template<typename _Member_t>
    iter_type
 -  _M_extract_num(iter_type __beg, iter_type __end, int __member,
 +  _M_extract_num(iter_type __beg, iter_type __end, _Member_t __member,
 		 int __min, int __max, size_t __len,
 		 ios_base& __io, ios_base::iostate& __err) const;


 I think this function is exported from the library, so turning it into
 a template would be an ABI change.

The avr needs both an int and an int8_t __member variant of _M_extract_num.
So do template instantiations mangle differently from directly defined
functions?

In that case, what is the preferred solution?  Duplicate the code (with all the
maintenance ugliness that entails)?
Or convert the function definition into an uber-ugly macro that is invoked twice
to get what the template implementation denies us - metaprogramming with
mangling compatibility?

Define a templated function with a different name, and then define two
_M_extract_num overloads as wrappers?
Does that even work in the case where __tm_small_int is a typedef for int?


Re: RFA: fix mode confusion in caller-save.c:replace_reg_with_saved_mem

2014-10-13 Thread Joern Rennecke
On 13 October 2014 20:43, Jeff Law l...@redhat.com wrote:
...
 I think you want smode in the mode_for_size call rather than mode, right
 (both instances)?

No, nregs is the number of hard registers of regno in mode.  Hence we must
use the size of mode.
How to choose the mode class is not so clear-cut.  For the code that went
wrong with the old code, mode and smode are both of MODE_INT.
To get some case where there's a difference, I was thinking of an
architecture that has partial integer mode registers that can be grouped
together as integral integer mode registers (e.g. one reg is HImode or
PSImode, save_mode would be PSImode, two regs form SImode).  In that case,
you'd want something so that you can piece together mode, i.e. either
GET_MODE_CLASS (mode) or MODE_INT (which happen to be again the same), but
not GET_MODE_CLASS (smode), which would be MODE_PARTIAL_INT.


Re: RFA: fix mode confusion in caller-save.c:replace_reg_with_saved_mem

2014-10-11 Thread Joern Rennecke
On 10 October 2014 21:13, Jeff Law l...@redhat.com wrote:
...
 ISTM it would be better to find the mode of the same class that corresponds
 to GET_MODE_SIZE (mode) / nregs.  In your case that's obviously QImode :-)

Like this?
Or did you mean to remove the save_mode[regno] use altogether?  I can
think of arguments for or against, but have no concrete examples for either.
2014-10-11  Joern Rennecke  joern.renne...@embecosm.com
Jeff Law  l...@redhat.com

* caller-save.c (replace_reg_with_saved_mem): If saved_mode covers
multiple hard registers, use word_mode.

diff --git a/gcc/caller-save.c b/gcc/caller-save.c
index e28facb..31b1a36 100644
--- a/gcc/caller-save.c
+++ b/gcc/caller-save.c
@@ -1158,9 +1158,12 @@ replace_reg_with_saved_mem (rtx *loc,
  }
else
  {
-   gcc_assert (save_mode[regno] != VOIDmode);
-   XVECEXP (mem, 0, i) = gen_rtx_REG (save_mode [regno],
-  regno + i);
+   enum machine_mode smode = save_mode[regno];
+   gcc_assert (smode != VOIDmode);
+	   if (hard_regno_nregs [regno][smode] > 1)
+ smode = mode_for_size (GET_MODE_SIZE (mode) / nregs,
+GET_MODE_CLASS (mode), 0);
+   XVECEXP (mem, 0, i) = gen_rtx_REG (smode, regno + i);
  }
 }
 


RFA: Fix debug address mode for TARGET_MEM_REF

2014-10-08 Thread Joern Rennecke
Trying to build avr2 strftime of avr-libc ICEs as we are trying to
convert a PSImode address to HImode.  There is no reason to
do this conversion in the first place - it is a case of failing to recognize
the proper address space.

The attached patch fixes this.

Bootstrapped on i686-pc-linux-gnu.

OK to apply?


debug-tgt-mem-patch
Description: Binary data


Re: RFA: Fix debug address mode for TARGET_MEM_REF

2014-10-08 Thread Joern Rennecke
On 8 October 2014 12:02, Richard Biener richard.guent...@gmail.com wrote:
...
 -  if (POINTER_TYPE_P (TREE_TYPE (exp)))
 -   as = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (exp)));
 -  else
 -   as = ADDR_SPACE_GENERIC;
 -
 +  as = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))));
op0 = convert_debug_memory_address (targetm.addr_space.address_mode (as),
   op0, as);
if (op0 == NULL_RTX)

 is pre-approved.

Thanks.  Manually applied, avr2 strftime build confirmed,
i386-pc-linux-gnu bootstrapped, and checked in.


RFA: AVR: add infrastructure for device packages

2014-10-08 Thread Joern Rennecke
As the steering committee still hasn't spoken on the maintainership issue,
apparently this still has to go the write-after-approval route.

The purpose of this patch is to make it possible to add support for new
devices (MCUs) to the AVR toolchain, without having to re-build the
entire toolchain.  This capability is desirable because new MCUs are added
fairly frequently.

There are multiple parts of the toolchain involved.
gcc changes multilibbing to key off the new -march option; the -mmcu option
is translated via DRIVER_SELF_SPECS into a -specs option, and the
individual spec files contain the required settings like -march, and various
more detailed settings (some of which are for new options).
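The driver-side translation can be sketched roughly like this (a hypothetical shape only: the exact spec strings and the spec-file naming are in the attached patch, not reproduced here):

```c
/* In avr.h (illustrative): turn -mmcu=<name> into the use of a
   per-device spec file installed under device-specs/.  */
#define DRIVER_SELF_SPECS \
  "%{mmcu=*:-specs=device-specs/specs-%*}"
```

Each generated spec file would then supply the detailed per-device settings (-march, flash size, errata workarounds) that previously required rebuilding the compiler.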

binutils provides new relocation and relaxation facilities to allow referring
symbolically to symbol differences and/or I/O addresses.
avr-libc puts the device-specific header settings in avr/io*.h, and a few
small device-specific linkable functions into a device-specific library.

The other toolchain parts are staged here:
g...@github.com:embecosm/avr-binutils-gdb.git avr-mainline
g...@github.com:embecosm/avr-libc.git avr-libc-embecosm-mainline


Attached is the GCC patch for the basic device package infrastructure.
OK to apply?


I intend to send the patch sets for avrtiny support (modified for device
package support) and for libstdc++-v3 support next, each building on the
previous patch set.  The avrtiny support is also dependent on the
caller-save patch https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00420.html ,
as avr-libc won't build otherwise.
2014-10-08  Joern Rennecke  joern.renne...@embecosm.com

* config/avr/avr.opt (mmcu=): Change to have a string value.
(mn-flash=, mskip-bug, march=, mrmw): New options.
(HeaderInclude): New.
(mmcu=): Remove Var / Init clauses.
* config/avr/avr.h (DRIVER_SELF_SPECS): Translate -mmcu into a
-specs option.
(SYMBOL_FLAG_IO, SYMBOL_FLAG_ADDRESS): Define.
(ASM_OUTPUT_ALIGNED_BSS): Use avr_asm_asm_output_aligned_bss.
(SYMBOL_FLAG_IO_LOW): Define.
(avr_device_to_as, avr_device_to_ld): Don't declare.
(avr_device_to_data_start, avr_device_to_startfiles): Likewise.
(avr_device_to_devicelib, avr_device_to_sp8): Likewise.
(EXTRA_SPEC_FUNCTIONS): Don't define.
(ASM_SPEC): Translate -arch= option to -mmcu= option.
(LINK_SPEC): Translate -arch= option to -m= option.
Don't use device_to_ld / device_to_data_start.
(STARTFILE_SPEC): Now empty.
(ASM_SPEC): Add -%{mrelax: --mlink-relax}.
* config/avr/gen-avr-mmcu-specs.c: New file.
* config/avr/t-avr (gen-avr-mmcu-specs$(build_exeext)): New rule.
(s-device-specs): Likewise.
(GCC_PASSES): Add s-device-specs.
(install-driver): Depend on install-device-specs.
(install-device-specs): New rule.
* config/avr/avr.c (avr_option_override): Look up mcu arch by
avr_arch_index and provide fallback initialization for avr_n_flash.
(varasm.h): #include.
(avr_print_operand) <i>: Allow SYMBOL_REF with SYMBOL_FLAG_IO.
(avr_handle_addr_attribute, avr_eval_addr_attrib): New functions.
(avr_attribute_table): Add io, address and io_low.
(avr_asm_output_aligned_decl_common): Change type of decl to tree.
Add special handling for symbols with io and/or address attributes.
(avr_asm_asm_output_aligned_bss): New function.
(avr_encode_section_info): Set SYMBOL_FLAG_IO and SYMBOL_FLAG_ADDRESS
as appropriate.  Handle io_low attribute.
(avr_out_sbxx_branch): Handle symbolic io addresses.
(avr_xload_libgcc_p, avr_nonconst_pointer_addrspace): Use
avr_n_flash instead of avr_current_device->n_flash.
(avr_pgm_check_var_decl, avr_insert_attributes): Likewise.
(avr_emit_movmemhi): Likewise.
* config/avr/avr-c.c (avr_cpu_cpp_builtins): Likewise.
Use TARGET_RMW instead of avr_current_device->dev_attributes.
Don't define avr_current_device->macro (that's the specfile's job).
Use TARGET_SKIP_BUG instead of avr_current_device->errata_skip.
* config/avr/avr.c (avr_2word_insn_p): Likewise.
* config/avr/avr.md (*cpse.ne): Likewise.
(movmode): Use avr_eval_addr_attrib.
(cbi): Change constraint for low_io_address_operand operand to i.
(sbi, sbix_branch, sbix_branch_bit7, insv.io, insv.not.io): Likewise.
* config/avr/predicates.md (io_address_operand):
Allow SYMBOL_REF with SYMBOL_FLAG_IO.
(low_io_address_operand): Allow SYMBOL_REF with SYMBOL_FLAG_IO_LOW.
* config/avr/avr-protos.h (avr_asm_output_aligned_decl_common):
Update prototype.
(avr_eval_addr_attrib, avr_asm_asm_output_aligned_bss): Prototype.
* config/avr/genmultilib.awk: Use -march=.
Remove Multilib matches processing.
* config/avr/t-multilib, config/avr/avr-tables.opt

Re: RFA: fix mode confusion in caller-save.c:replace_reg_with_saved_mem

2014-10-07 Thread Joern Rennecke
On 7 October 2014 18:38, Jeff Law l...@redhat.com wrote:
 On 10/06/14 20:57, Joern Rennecke wrote:

 On 6 October 2014 19:58, Jeff Law l...@redhat.com wrote:

 What makes word_mode special here?  ie, why is special casing for
 word_mode
 the right thing to do?


 The patch does not special-case word mode.  The if condition tests if
 smode would
 cover multiple hard registers.
 If that would be the case, smode is replaced with word_mode.

 SO I'll ask another way.  Why do you want to change smode to word_mode?

Because SImode covers four hard registers, whereas the intention is to
have a single one.

(concatn:SI [
(reg:SI 18 r18)
(reg:SI 19 r19)
(mem/c:QI (plus:HI (reg/f:HI 28 r28)
(const_int 43 [0x2b])) [6  S1 A8])
(mem/c:QI (plus:HI (reg/f:HI 28 r28)
(const_int 44 [0x2c])) [6  S1 A8])
])

(see original post) is invalid RTL, and thus the cause of the later ICE.


RFA: fix mode confusion in caller-save.c:replace_reg_with_saved_mem

2014-10-06 Thread Joern Rennecke
Investigating an ICE while trying to compile libgcc2.c:__udivmoddi4 for
a new avr variant with different register set/allocation order, I found
replace_reg_with_saved_mem falling over its own nonsense.  The instruction:

(debug_insn 97 96 98 2 (var_location:SI __x2 (mult:SI (lshiftrt:SI
(reg/v:SI 18 r18 [orig:58 q0 ] [58])
(const_int 16 [0x10]))
(reg/v:SI 61 [ __vl ])))
../../gcc/gcc/testsuite/gcc.target/avr/tiny-caller-save.c:67 -1
 (nil))

would be transformed into:

(debug_insn 97 96 98 2 (var_location:SI __x2 (mult:SI (lshiftrt:SI (concatn:SI [
(reg:SI 18 r18)
(reg:SI 19 r19)
(mem/c:QI (plus:HI (reg/f:HI 28 r28)
(const_int 43 [0x2b])) [6  S1 A8])
(mem/c:QI (plus:HI (reg/f:HI 28 r28)
(const_int 44 [0x2c])) [6  S1 A8])
])
(const_int 16 [0x10]))
(reg/v:SI 61 [ __vl ])))
../../gcc/gcc/testsuite/gcc.target/avr/tiny-caller-save.c:67 -1

Note that r18 and r19 inside the concatn are supposed to be single hard
registers, but as the word size of this processor is 8 bit, SImode actually
extends over four hard registers.  save_mode is SImode because four
registers can be saved/restored at once.

The attached patch fixes the failure by using word_mode if smode would cover
multiple hard registers.
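The intended narrowing can be modelled with plain size arithmetic (a standalone sketch, not GCC code; mode sizes are represented as byte counts and the function name is made up):

```c
#include <assert.h>

/* Model of the replace_reg_with_saved_mem fix: if the save mode spans
   more than one hard register (regs_per_smode > 1), each concatn
   element should instead use the mode covering exactly
   GET_MODE_SIZE (mode) / nregs bytes -- QImode on an 8-bit target.  */
static int
piece_mode_bytes (int smode_bytes, int word_bytes,
                  int group_bytes, int nregs)
{
  /* How many hard registers the save mode occupies.  */
  int regs_per_smode = (smode_bytes + word_bytes - 1) / word_bytes;
  if (regs_per_smode > 1)
    return group_bytes / nregs;
  return smode_bytes;
}
```

With an 8-bit word (avr-tiny-like), an SImode (4-byte) save mode for a 4-register group is narrowed to 1-byte pieces; on a 32-bit-word target the SImode save mode is kept as-is.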
Bootstrapped & regtested on i686-pc-linux-gnu.

OK to apply?


caller-save-patch
Description: Binary data


Re: RFA: fix mode confusion in caller-save.c:replace_reg_with_saved_mem

2014-10-06 Thread Joern Rennecke
On 6 October 2014 19:58, Jeff Law l...@redhat.com wrote:
 What makes word_mode special here?  ie, why is special casing for word_mode
 the right thing to do?

The patch does not special-case word mode.  The if condition tests if
smode would
cover multiple hard registers.
If that would be the case, smode is replaced with word_mode.


[wwwdocs]: Mention ARC port contribution in changes.html

2014-07-07 Thread Joern Rennecke
As ARC maintainer, I have applied the appended patch to changes.html.


tmp
Description: Binary data


[wwwdocs]: simplify heading for ARC port contribution item in changes.html

2014-07-07 Thread Joern Rennecke
I've changed the heading to ARC and put the item in its
proper alpha-sorted position to make it easier to find.


tmp
Description: Binary data


Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-06-10 Thread Joern Rennecke
On 13 May 2014 22:41, Oleg Endo oleg.e...@t-online.de wrote:

 Right.  I was thinking to add FPSCR.SZ mode switching to SH, in order to
 do float vector moves.  For that SZ and PR need to be switched both at
 the same time (only SH4A has both, fpchg and fschg).  So basically I'd
 add another mode entity, which would emit SZ mode changes in addition to
 the PR mode changes.  But then adjacent FPSCR-changing insns could be
 combined ... any idea/suggestion how to accomplish that?

If they are sufficiently adjacent, you can use a peephole2 pattern for this.

I see Cristian's patch addresses this in a different way - keeping size and
precision in the same entity, and emitting toggles as appropriate.

The problem gets a bit more interesting if you have some instruction
patterns that care about one setting but not the other.
Describing this exactly allows lazy code motion to be a bit more lazy,
but OTOH it can make it harder to combine mode switching instructions
if you still want to do that.


Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-06-10 Thread Joern Rennecke
On 2 June 2014 13:34, Christian Bruel christian.br...@st.com wrote:
 Hello,

 Any feedback for this ? I'd like to commit only when OK for Epiphany.

 Joern, is this new target macro interface OK with you ?

Yes, this interface should allow me to do switches between rounding and
truncating floating-point modes with an add/subtract immediate.

However, the implementation, as posted, doesn't work - it causes memory
corruption.

It appears to work with the attached amendment patch.

=== gcc Summary ===

# of expected passes82184
# of unexpected failures41
# of unexpected successes   1
# of expected failures  90
# of unresolved testcases   2
# of unsupported tests  1585
/ssd/adapteva/bld-epiphany/gcc/xgcc  version 4.10.0 20140608
(experimental) (Epiphany toolchain (built 20140610))

This is the same as before applying the patch(es).


tmp
Description: Binary data


Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-05-13 Thread Joern Rennecke
On 12 May 2014 23:39, Oleg Endo oleg.e...@t-online.de wrote:

 This is the same as changing/setting the FP modes (PR, SZ) on SH while
 preserving the other FPSCR bits, or did I miss something?

It's more like: you may have to control multiple bits at once to get a
specific mode.
Say you have to turn SZ off and PR on.  If you knew that only one bit
needs changing, you could do with one less arithmetic operation.
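A minimal sketch of the bit arithmetic involved (bit positions for SZ and PR are illustrative assumptions; nothing here depends on the real FPSCR layout):

```c
#include <assert.h>

#define BIT_SZ (1u << 20)   /* illustrative bit positions */
#define BIT_PR (1u << 19)

/* Without knowledge of the previous mode: clear SZ and set PR,
   i.e. one AND plus one OR.  */
static unsigned
set_mode_blind (unsigned ctrl)
{
  return (ctrl & ~BIT_SZ) | BIT_PR;
}

/* With the previous mode known: toggle exactly the bits that differ,
   a single XOR -- one less arithmetic operation.  */
static unsigned
set_mode_known (unsigned ctrl, unsigned diff)
{
  return ctrl ^ diff;
}
```

Both paths reach the same control word; the XOR variant is only available when the mode-switching pass can tell the emitter what the previous mode was.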


Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-05-12 Thread Joern Rennecke
On 12 May 2014 10:06, Christian Bruel christian.br...@st.com wrote:
 Just saw the Jeff's approval for the RTL part. Sorry for the crossed answers

 remains the target maintainers.  Joern, Kaz ?

 Many thanks.

 Christian

 On 05/12/2014 10:44 AM, Christian Bruel wrote:
 Hello,

 I'd still wish to ping for the following set of patches. Those changes
 does not impact other targets than SH4 but, as suggested by Joern, I
 have hooked the macros and moved the SH4A specific support to the target
 parts (so a different target can eventually implement other models than
 dual mode).

 Patch2 only does very little restructuring  but if is not interesting
 enough for all targets, patch 1 should not be that intrusive.

 For RTL middle end and (X86, SH, Epiphany) target reviewers,

 Many thanks,

 Christian

 On 04/28/2014 10:08 AM, Christian Bruel wrote:
 Hello,

 I'd like to ping the following patches

 [Hookize mode-switching]
 http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01003.html

 [Add new hooks to support toggle and SH4A fpchg instruction]
 http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01005.html

Sorry, I only saw the first part and thought I'd need to wait till I
see the second part - and I somehow missed that.

I think the previously known mode should be passed to the
TARGET_MODE_EMIT hook - no need to have extra hooks for toggling, and,
as I mentioned earlier, fixating on the toggle is actually an SH
artifact - other ports have multi-way mode settings that can benefit
from knowing the previous mode.


Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-05-12 Thread Joern Rennecke
On 12 May 2014 13:16, Christian Bruel christian.br...@st.com wrote:

 Just for my curiosity, which other targets have multi-way toggling
 support ?

The epiphany has, sort of: you read a control register, AND and/or OR
some mask(s) to the value, and write it back.
If we knew the previous mode, we might elide an AND or an OR.

I think this is actually quite a common issue.


Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-05-12 Thread Joern Rennecke
On 12 May 2014 13:51, Joern Rennecke joern.renne...@embecosm.com wrote:
 On 12 May 2014 13:16, Christian Bruel christian.br...@st.com wrote:

 Just for my curiosity, which other targets have multi-way toggling
 support ?

 The epiphany has, sort of: you read a control register, AND and/or OR
 some mask(s) to the value, and write it back.
 If we knew the previous mode, we might elide an AND or an OR.

 I think this is actually quite a common issue.

P.S.: In some cases, multiple-mode input could still be handled if we
knew that certain other modes don't appear in the input, so a more
powerful interface than providing the previous mode - if known - is to
provide a set of potential predecessor modes.
The case where we don't know anything is then obviously represented as
the full base set.
In the mode switching infrastructure, you can just calculate the union
of the incoming (potential) mode(s) from each incoming edge.
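The proposed predecessor-mode set is cheap to compute if each set is a bitmask, one bit per mode (a standalone sketch with made-up names; N_MODES is arbitrary):

```c
#include <assert.h>

enum { N_MODES = 4 };
#define ALL_MODES ((1u << N_MODES) - 1)

/* Union of the potential-mode sets flowing in over each edge; with no
   information at all (e.g. an abnormal entry) the result is the full
   base set.  */
static unsigned
incoming_modes (const unsigned *edge_sets, int n_edges)
{
  if (n_edges == 0)
    return ALL_MODES;
  unsigned u = 0;
  for (int i = 0; i < n_edges; i++)
    u |= edge_sets[i];
  return u;
}
```

A target hook receiving such a set could, for instance, skip the AND when no predecessor mode has the relevant bit set.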


Re: RFA: Fix PR rtl-optimization/60651

2014-04-02 Thread Joern Rennecke
On 28 March 2014 10:20, Eric Botcazou ebotca...@adacore.com wrote:
 However, the first call is for blocks with incoming abnormal edges.
 If these are empty, the change as I wrote it yesterday is fine, but not
 when they are non-empty; in that case, we should indeed insert before the
 first instruction in that block.

 OK, so the issue is specific to empty basic blocks and boils down to inserting
 instructions in a FIFO manner into them.

Actually, the issue also applies to abnormal edges where lcm did leave a set -
but these are rare, and my last patch should handle these properly in any event,
by no longer using the NOTE_INSN_BASIC_BLOCK itself unless the block is
empty.

 This can be achieved by finding an insert-before position using NEXT_INSN
 on the basic block head; this amounts to the very same insertion place
 as inserting after the basic block head.  Also, we will continue to set no
 location, and use the same bb, because both add_insn_before and
 add_insn_after (in contradiction to its block comment) will infer the basic
 block from the insn given.  (In the case of add_insn_before, I assume
 that the basic block doesn't start with a BARRIER - that would be invalid -
 and that the insn it starts with has a valid BLOCK_FOR_INSN setting the
 same way the basic block head has.)

 This looks reasonable, but I think that we need more commentary because it's
 not straightforward to understand, so I would:

   1. explicitly state that we enforce an order on the entities in addition to
 the order on priority, both in the code (for example create a 4th paragraph in
 the comment at the top of the file, before More details ...) and in the doc
 as you already did, but ordering the two orders for the sake of clarity:
 first the order on priority then, for the same priority, the order to the
 entities.

Actually, all the patch provides is a partial order, just as I stated.
Providing the strict order you describe would require adding another
loop nesting to the entity/basic block/seginfo loop, and it wouldn't
really be useful for targets.
To order by entity first, then by priority, could be useful for some targets,
so that they can express a dependency chain of mode switching events
to be computed in a single lcm pass without inflating the mode count
(which determines how often we have to invoke the lcm machinery).
However, that would require having separate buckets for each entity at
each insert_insn_on_edge point.
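The guaranteed partial order can be stated as a checker over an emitted sequence (a standalone model; the struct is made up, and treating a smaller priority value as "same or lower priority" is an assumption of this sketch):

```c
#include <assert.h>

struct mode_set { int entity; int priority; };

/* Check the documented constraint: a set of a higher-numbered entity
   to a mode of the same or lower priority (here: priority value <=)
   must not precede a set of a lower-numbered entity.  */
static int
order_ok (const struct mode_set *s, int n)
{
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      if (s[i].entity > s[j].entity && s[i].priority <= s[j].priority)
        return 0;
  return 1;
}
```

Note this is deliberately only partial: pairs where the later, lower-numbered entity's mode has strictly higher priority are unconstrained, which is what makes the epiphany dependency chains above expressible without an extra loop nesting.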

For epiphany,  EPIPHANY_MSW_ENTITY_FPU_OMNIBUS (for -O0) and
EPIPHANY_MSW_ENTITY_ROUND_KNOWN (used when optimizing)
depend on EPIPHANY_MSW_ENTITY_AND,  EPIPHANY_MSW_ENTITY_OR and
EPIPHANY_MSW_ENTITY_CONFIG.
The latter three only have two modes, and the former two use the
enum attr_fp_mode values, the first of which is FP_MODE_ROUND_UNKNOWN.
That value does not actually appear as a needed mode for these entities, hence
the partial order is sufficient.

EPIPHANY_MSW_ENTITY_FPU_OMNIBUS also depends on EPIPHANY_MSW_ENTITY_OR.

   2. add a line in the head comment of new_seginfo saying that INSN may not be
 a NOTE_BASIC_BLOCK, unless BB is empty.

   3. add a comment above the trick in optimize_mode_switching saying that it
 is both required to implement the FIFO insertion and valid because we know
 that the basic block was initially empty.

Done.

 It's not clear to me whether this is a regression or not, so you'll also need
 to run it by the RMs.

I don't think it's a regression.
2014-04-02  Joern Rennecke  joern.renne...@embecosm.com

gcc:
PR rtl-optimization/60651
* mode-switching.c (optimize_mode_switching): Make sure to emit
sets of a lower numbered entity before sets of a higher numbered
entity to a mode of the same or lower priority.
(new_seginfo): Document and enforce requirement that
NOTE_INSN_BASIC_BLOCK only appears for empty blocks.
* doc/tm.texi.in: Document ordering constraint for emitted mode sets.
* doc/tm.texi: Regenerate.
gcc/testsuite:
PR rtl-optimization/60651
* gcc.target/epiphany/mode-switch.c: New test.

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f7024a7..b8ca17e 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9778,6 +9778,8 @@ for @var{entity}.  For any fixed @var{entity}, @code{mode_priority_to_mode}
 Generate one or more insns to set @var{entity} to @var{mode}.
 @var{hard_reg_live} is the set of hard registers live at the point where
 the insn(s) are to be inserted.
+Sets of a lower numbered entity will be emitted before sets of a higher
+numbered entity to a mode of the same or lower priority.
 @end defmac
 
 @node Target Attributes
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6dcbde4..d793d26 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7447,6 +7447,8 @@ for @var{entity}.  For any fixed @var{entity}, @code{mode_priority_to_mode}
 Generate one or more insns to set @var{entity} to @var{mode}.
 @var{hard_reg_live} is the set of hard registers live

Re: RFA: Fix PR rtl-optimization/60651

2014-04-02 Thread Joern Rennecke
Hmm, the sanity check in new_seginfo caused a bootstrap failure
building libjava on x86.
There was a block with CODE_LABEL as basic block head, otherwise empty.


Re: RFA: Fix PR rtl-optimization/60651

2014-04-02 Thread Joern Rennecke
On 2 April 2014 17:34, Joern Rennecke joern.renne...@embecosm.com wrote:
 Hmm, the sanity check in new_seginfo caused a bootstrap failure
 building libjava on x86.
 There was a block with CODE_LABEL as basic block head, otherwise empty.

I've added the testcase - and a bit more detail on this issue - in the PR.

I've attached an updated patch, which skips past the CODE_LABEL.
And this one bootstraps on i686-pc-linux-gnu.
2014-04-02  Joern Rennecke  joern.renne...@embecosm.com

gcc:
PR rtl-optimization/60651
* mode-switching.c (optimize_mode_switching): Make sure to emit
sets of a lower numbered entity before sets of a higher numbered
entity to a mode of the same or lower priority.
When creating a seginfo for a basic block that starts with a code
label, move the insertion point past the code label.
(new_seginfo): Document and enforce requirement that
NOTE_INSN_BASIC_BLOCK only appears for empty blocks.
* doc/tm.texi.in: Document ordering constraint for emitted mode sets.
* doc/tm.texi: Regenerate.
gcc/testsuite:
PR rtl-optimization/60651
* gcc.target/epiphany/mode-switch.c: New test.

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f7024a7..b8ca17e 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9778,6 +9778,8 @@ for @var{entity}.  For any fixed @var{entity}, @code{mode_priority_to_mode}
 Generate one or more insns to set @var{entity} to @var{mode}.
 @var{hard_reg_live} is the set of hard registers live at the point where
 the insn(s) are to be inserted.
+Sets of a lower numbered entity will be emitted before sets of a higher
+numbered entity to a mode of the same or lower priority.
 @end defmac
 
 @node Target Attributes
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6dcbde4..d793d26 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7447,6 +7447,8 @@ for @var{entity}.  For any fixed @var{entity}, @code{mode_priority_to_mode}
 Generate one or more insns to set @var{entity} to @var{mode}.
 @var{hard_reg_live} is the set of hard registers live at the point where
 the insn(s) are to be inserted.
+Sets of a lower numbered entity will be emitted before sets of a higher
+numbered entity to a mode of the same or lower priority.
 @end defmac
 
 @node Target Attributes
diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
index 88543b2..088156c 100644
--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -96,12 +96,18 @@ static void make_preds_opaque (basic_block, int);
 
 
 /* This function will allocate a new BBINFO structure, initialized
-   with the MODE, INSN, and basic block BB parameters.  */
+   with the MODE, INSN, and basic block BB parameters.
+   INSN may not be a NOTE_INSN_BASIC_BLOCK, unless it is in an empty
+   basic block; that allows us later to insert instructions in a FIFO-like
+   manner.  */
 
 static struct seginfo *
 new_seginfo (int mode, rtx insn, int bb, HARD_REG_SET regs_live)
 {
   struct seginfo *ptr;
+
+  gcc_assert (!NOTE_INSN_BASIC_BLOCK_P (insn)
+ || insn == BB_END (NOTE_BASIC_BLOCK (insn)));
   ptr = XNEW (struct seginfo);
  ptr->mode = mode;
  ptr->insn_ptr = insn;
@@ -534,7 +540,13 @@ optimize_mode_switching (void)
break;
if (e)
  {
-   ptr = new_seginfo (no_mode, BB_HEAD (bb), bb->index, live_now);
+   rtx ins_pos = BB_HEAD (bb);
+   if (LABEL_P (ins_pos))
+ ins_pos = NEXT_INSN (ins_pos);
+   gcc_assert (NOTE_INSN_BASIC_BLOCK_P (ins_pos));
+   if (ins_pos != BB_END (bb))
+ ins_pos = NEXT_INSN (ins_pos);
+   ptr = new_seginfo (no_mode, ins_pos, bb->index, live_now);
add_seginfo (info + bb->index, ptr);
bitmap_clear_bit (transp[bb->index], j);
  }
@@ -733,7 +745,15 @@ optimize_mode_switching (void)
{
  emitted = true;
  if (NOTE_INSN_BASIC_BLOCK_P (ptr->insn_ptr))
-   emit_insn_after (mode_set, ptr->insn_ptr);
+   /* We need to emit the insns in a FIFO-like manner,
+  i.e. the first to be emitted at our insertion
+  point ends up first in the instruction stream.
+  Because we made sure that NOTE_INSN_BASIC_BLOCK is
+  only used for initially empty basic blocks, we
+  can achieve this by appending at the end of
+  the block.  */
+   emit_insn_after
+ (mode_set, BB_END (NOTE_BASIC_BLOCK (ptr->insn_ptr)));
  else
emit_insn_before (mode_set, ptr-insn_ptr);
}
--- /dev/null   2014-03-19 18:18:19.244212660 +
+++ b/gcc/testsuite/gcc.target/epiphany/mode-switch.c   2014-03-25 13:31:41.186140611 +
@@ -0,0 +1,12

Re: RFA: Fix PR rtl-optimization/60651

2014-03-26 Thread Joern Rennecke
On 26 March 2014 08:15, Eric Botcazou ebotca...@adacore.com wrote:
 As described in the PR, this patch fixes a wrong-code bug by making the
 order of emitted mode switching instructions more consistent & predictable.

 I don't understand this change (but I'm not a specialist of mode switching):
 currently the mode setting sequence is always emitted before the insns that
 need it but, with the change, if an insn right after a NOTE_BASIC_BLOCK note
 needs it, it will be emitted either before it (if insn_ptr is the insn) or
 after it (if insn_ptr is the NOTE_BASIC_BLOCK note).

When the seginfo is for an initially empty block, appending the mode
switching instruction at the end is fine.
Now that I'm trying to prove that this is always the case when insn_ptr
is set to a NOTE_INSN_BASIC_BLOCK, I find that it is not actually true.
insn_ptr gets set in new_seginfo, and there are three calls to that function.
The second call is for instructions that themselves need a particular mode,
so these are not basic block heads.  The third call is for BB_END, and
this is a NOTE_INSN_BASIC_BLOCK exactly iff the block is empty.

However, the first call is for blocks with incoming abnormal edges.
If these are empty, the change as I wrote it yesterday is fine, but not
when they are non-empty; in that case, we should indeed insert before the
first instruction in that block.

This can be achieved by finding an insert-before position using NEXT_INSN
on the basic block head; this amounts to the very same insertion place
as inserting after the basic block head.  Also, we will continue to set no
location, and use the same bb, because both add_insn_before and
add_insn_after (in contradiction to its block comment) will infer the basic
block from the insn given.  (In the case of add_insn_before, I assume
that the basic block doesn't start with a BARRIER - that would be invalid -
and that the insn it starts with has a valid BLOCK_FOR_INSN setting the
same way the basic block head has.)
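The FIFO property can be illustrated on a toy instruction list (a standalone model of a doubly linked insn chain, not the real rtl one; all names are made up):

```c
#include <assert.h>
#include <stddef.h>

struct insn { int id; struct insn *prev, *next; };

/* Splice N into the list right after POS -- the moral equivalent of
   emit_insn_after.  */
static void
emit_after (struct insn *pos, struct insn *n)
{
  n->prev = pos;
  n->next = pos->next;
  if (pos->next)
    pos->next->prev = n;
  pos->next = n;
}

/* Emitting each new insn after the current block end, rather than
   always right after the block note, keeps the emitted sequence in
   FIFO order; encodes the two ids after the note as a two-digit
   number, so 12 means "insn 1 first, insn 2 second".  */
static int
fifo_demo (void)
{
  struct insn note = {0, NULL, NULL}, a = {1, NULL, NULL}, b = {2, NULL, NULL};
  emit_after (&note, &a);   /* block was empty: the note is BB_END */
  emit_after (&a, &b);      /* append after the new BB_END, not the note */
  return note.next->id * 10 + note.next->next->id;
}
```

Inserting both insns directly after the note would instead yield LIFO order, which is exactly the inconsistency the patch avoids.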

bootstrapped on i686-pc-linux-gnu, regtest in progress.


tmp
Description: Binary data


Re: RFA: Fix PR rtl-optimization/60651

2014-03-26 Thread Joern Rennecke
On 26 March 2014 12:35, Joern Rennecke joern.renne...@embecosm.com wrote:

 bootstrapped on i686-pc-linux-gnu, regtest in progress.

Passed now.


RFA: Fix PR rtl-optimization/60651

2014-03-25 Thread Joern Rennecke
As described in the PR, this patch fixes a wrong-code bug by making the order
of emitted mode switching instructions more consistent & predictable.

Bootstrapped / regtested on i686-pc-linux-gnu.


tmp
Description: Binary data


RFA: Add PchIgnore option property

2014-03-03 Thread Joern Rennecke
I've been looking at how to make the precompiled header mechanism allow
me to use the ARC -misize option (which outputs additional information
about gcc's idea of instruction addresses for the purpose of branch
shortening, to help debugging the latter) in a compilation involving
precompiled headers.
I can't use TARGET_CHECK_PCH_TARGET_FLAGS for that purpose because
-misize uses its own variable (to save on target_flags bits).
If I wanted to use the TARGET_PCH_VALID_P hook, I'd have to duplicate
lots of code from default_pch_valid_p, which is intricately tied to the
pch implementation, and also 'knows' that non-target flags never affect
pch.
Moreover, having this extra information encoded in a separate function,
instead of at the specific option(s) in config/target/target.opt, is
rather messy.

Therefore, I propose to add a new option property to mark an option that should
be ignored for the purpose of checking pch validity.

The attached patch implements this as PchIgnore.
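As a sketch of how the property would be used, an option entry in a .opt file could simply carry the new marker (a hypothetical entry; the actual ARC .opt text may differ):

```
; config/arc/arc.opt (illustrative): -misize does not affect code
; generation from headers, so ignore it when validating a PCH.
misize
Target Report Var(TARGET_ISIZE) PchIgnore
Annotate assembler output with instruction addresses.
```

default_pch_valid_p would then skip any option flagged PchIgnore when comparing the compile-time option state against the state recorded in the PCH file.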

bootstrapped on i686-pc-linux.gnu.

OK to apply?


pch-ignore-patch
Description: Binary data


RFA: fix compile/pr17906.c / compile/pr35432.c -O3 -g ICEs

2014-02-19 Thread Joern Rennecke
When compiling compile/pr17906.c, compute_frame_pointer_to_fb_displacement
passes the argument pointer to eliminate_regs.  This eliminates it to the
frame pointer, which later causes an ICE because frame_pointer_needed is
not set.

The problem is that ELIMINABLE_REGS in avr.h does not specify a direct
elimination from the argument pointer to the stack pointer; the attached
patch rectifies that.
Regression tested with the avr simulator.
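The shape of such a fix would be to add the missing direct pair to the elimination table (a hypothetical sketch of the avr.h macro, not the attached patch itself):

```c
/* Illustrative ELIMINABLE_REGS with a direct arg-pointer ->
   stack-pointer elimination, so the arg pointer no longer has to go
   through the frame pointer when frame_pointer_needed is unset.  */
#define ELIMINABLE_REGS                                    \
  { { ARG_POINTER_REGNUM,   STACK_POINTER_REGNUM },        \
    { ARG_POINTER_REGNUM,   FRAME_POINTER_REGNUM },        \
    { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM } }
```

With the direct pair present, eliminate_regs can map the argument pointer straight to the stack pointer instead of stopping at an unavailable frame pointer.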

OK to apply?


fb_offset-fix
Description: Binary data

