[PATCH] rs6000: Remove useless toc-fusion option

2021-08-31 Thread Kewen.Lin via Gcc-patches
Hi!

Option -mtoc-fusion was previously added in anticipation of Power9 TOC
fusion, but Power9 ended up not supporting fusion at all, so this patch
removes the now-useless option.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/rs6000.opt (-mtoc-fusion): Remove.
---
 gcc/config/rs6000/rs6000.opt | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0538db387dc..a104ffa6558 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -557,10 +557,6 @@ mpower9-minmax
 Target Undocumented Mask(P9_MINMAX) Var(rs6000_isa_flags)
 Use the new min/max instructions defined in ISA 3.0.
 
-mtoc-fusion
-Target Undocumented Mask(TOC_FUSION) Var(rs6000_isa_flags)
-Fuse medium/large code model toc references with the memory instruction.
-
 mmodulo
 Target Undocumented Mask(MODULO) Var(rs6000_isa_flags)
 Generate the integer modulo instructions.
-- 
2.17.1



[PATCH] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-08-31 Thread Kewen.Lin via Gcc-patches
Hi!

This patch fixes the inconsistent behavior between non-LTO mode
and LTO mode.  As Martin pointed out, the function rs6000_can_inline_p
currently treats the callee as inlinable whenever callee_tree is
NULL, which is wrong; we should instead use the command line options
from target_option_default_node as the default.  The patch also replaces
rs6000_isa_flags with the flags from target_option_default_node
when caller_tree is NULL, since rs6000_isa_flags may have
changed since initialization.

It also extends the scope of the check for the case where the callee
has explicitly set options; previously, inlining could happen
unexpectedly for test case pr102059-2.c, and that is fixed accordingly.

As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION
can be ignored for inlining purposes, so this patch also excludes them
when the callee carries the always_inline attribute.
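(Illustration only, not one of the new pr102059-*.c tests; it assumes the
rs6000 target attribute string "no-power8-fusion" is accepted here.  The
intended effect is roughly:)

  /* Illustrative sketch: a mismatch in the power8-fusion tuning flag alone
     should no longer block inlining an always_inline callee.  */
  static inline int __attribute__ ((always_inline))
  callee (int x)
  {
    return x + 1;
  }

  int __attribute__ ((target ("no-power8-fusion")))
  caller (int x)
  {
    return callee (x);  /* expected to be inlined despite the fusion mismatch */
  }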

Bootstrapped and regtested on powerpc64le-linux-gnu Power9.

BR,
Kewen
-
gcc/ChangeLog:

PR ipa/102059
* config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
target_option_default_node and consider always_inline_safe flags.

gcc/testsuite/ChangeLog:

PR ipa/102059
* gcc.target/powerpc/pr102059-1.c: New test.
* gcc.target/powerpc/pr102059-2.c: New test.
* gcc.target/powerpc/pr102059-3.c: New test.
* gcc.target/powerpc/pr102059-4.c: New test.
---
 gcc/config/rs6000/rs6000.c| 87 +++--
 gcc/testsuite/gcc.target/powerpc/pr102059-1.c | 24 +
 gcc/testsuite/gcc.target/powerpc/pr102059-2.c | 20 
 gcc/testsuite/gcc.target/powerpc/pr102059-3.c | 95 +++
 gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 22 +
 5 files changed, 221 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 46b8909104e..c2582a3efab 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25058,45 +25058,78 @@ rs6000_generate_version_dispatcher_body (void *node_p)
 static bool
 rs6000_can_inline_p (tree caller, tree callee)
 {
-  bool ret = false;
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
-  /* If the callee has no option attributes, then it is ok to inline.  */
+  /* If the caller/callee has option attributes, then use them.
+     Otherwise, use the command line options.  */
   if (!callee_tree)
-    ret = true;
+    callee_tree = target_option_default_node;
+  if (!caller_tree)
+    caller_tree = target_option_default_node;
+
+  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
+  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
+
+  bool always_inline =
+    (DECL_DISREGARD_INLINE_LIMITS (callee)
+     && lookup_attribute ("always_inline", DECL_ATTRIBUTES (callee)));
+
+  /* Some flags such as fusion can be tolerated for always inlines.  */
+  unsigned HOST_WIDE_INT always_inline_safe_mask =
+    (MASK_P8_FUSION | MASK_P10_FUSION | OPTION_MASK_SAVE_TOC_INDIRECT
+     | OPTION_MASK_P8_FUSION_SIGN | OPTION_MASK_P10_FUSION_LD_CMPI
+     | OPTION_MASK_P10_FUSION_2LOGICAL | OPTION_MASK_P10_FUSION_LOGADD
+     | OPTION_MASK_P10_FUSION_ADDLOG | OPTION_MASK_P10_FUSION_2ADD
+     | OPTION_MASK_PCREL_OPT);
+
+  if (always_inline)
+    {
+      caller_isa &= ~always_inline_safe_mask;
+      callee_isa &= ~always_inline_safe_mask;
+    }
 
-  else
+  /* The callee's options must be a subset of the caller's options, i.e.
+     a vsx function may inline an altivec function, but a no-vsx function
+     must not inline a vsx function.  */
+  if ((caller_isa & callee_isa) != callee_isa)
     {
-      HOST_WIDE_INT caller_isa;
-      struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
-      HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
-      HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
-
-      /* If the caller has option attributes, then use them.
-         Otherwise, use the command line options.  */
-      if (caller_tree)
-        caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
-      else
-        caller_isa = rs6000_isa_flags;
+      if (TARGET_DEBUG_TARGET)
+        fprintf (stderr,
+                 "rs6000_can_inline_p:, caller %s, callee %s, cannot "
+                 "inline since callee's options set isn't a subset of "
+                 "caller's options set.\n",
+                 get_decl_name (caller), get_decl_name (callee));
+      return false;
+    }
 
-  /* The callee's options must be a subset of the caller's options, i.e.
-a vsx function may

Re: [Committed] Fix subreg_promoted_mode breakage on various platforms

2021-08-31 Thread Christophe LYON via Gcc-patches



On 31/08/2021 18:33, Roger Sayle wrote:

My apologies for the inconvenience.  My recent patch to preserve
SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))), and other
places in the middle-end, has broken the build on several targets.

The change to convert_modes inadvertently used the same
subreg_promoted_mode idiom for retrieving the mode of a SUBREG_REG
as the existing code just a few lines earlier.  Alas in the meantime,
the original SUBREG gets replaced by one without SUBREG_PROMOTED_VAR_P,
the whole raison-d'etre for my patch, and I'd not realized/noticed
that subreg_promoted_mode asserts for this.  Alas, neither the bootstrap
and regression test on x86_64-pc-linux-gnu nor my testing on nvptx-none
hit this particular case.  The logic of this transformation
is sound; it's the implementation that has bitten me.

This patch has been committed, after another "make bootstrap" on
x86_64-pc-linux-gnu (just in case), and confirmation/pre-approval
from Jeff Law that this indeed fixes the build failures seen on
several platforms.

My humble apologies again.



Thanks, I confirm it fixes the aarch64 build too. Sorry for the delay.

Christophe




2021-08-31  Roger Sayle  

gcc/ChangeLog
* expr.c (convert_modes): Don't use subreg_promoted_mode on a
SUBREG if it can't be guaranteed to have SUBREG_PROMOTED_VAR_P set.
Instead use the standard (safer) is_a  idiom.
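
(For readers unfamiliar with that idiom, a minimal hedged sketch, not the
exact expr.c change:)

  /* Sketch only: check that the inner mode really is a scalar integer
     mode instead of asserting on it via subreg_promoted_mode.  */
  scalar_int_mode inner_mode;
  if (GET_CODE (x) == SUBREG
      && SUBREG_PROMOTED_VAR_P (x)
      && is_a <scalar_int_mode> (GET_MODE (SUBREG_REG (x)), &inner_mode))
    {
      /* inner_mode can now be used safely, e.g. to compare precisions.  */
    }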

Roger
--




Re: [PATCH] Check the type of mask while generating cond_op in gimple simplification.

2021-08-31 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 31, 2021 at 7:56 PM Richard Biener
 wrote:
>
> On Tue, Aug 31, 2021 at 12:18 PM Hongtao Liu  wrote:
> >
> > On Mon, Aug 30, 2021 at 8:25 PM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Fri, Aug 27, 2021 at 8:53 AM liuhongt  wrote:
> > > >
> > > >   When gimple simplification tries to combine op and vec_cond_expr
> > > > into cond_op, it doesn't check whether the mask type matches.  This
> > > > causes an ICE when expanding cond_op with a mismatched mode.
> > > >   This patch adds a function named cond_vectorized_internal_fn_supported_p
> > > > to additionally check the mask type, beyond what
> > > > vectorized_internal_fn_supported_p checks.
> > > >
> > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > >   Ok for trunk?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR middle-end/102080
> > > > * internal-fn.c (cond_vectorized_internal_fn_supported_p): New 
> > > > functions.
> > > > * internal-fn.h (cond_vectorized_internal_fn_supported_p): New 
> > > > declaration.
> > > > * match.pd: Check the type of mask while generating cond_op in
> > > > gimple simplification.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR middle-end/102080
> > > > * gcc.target/i386/pr102080.c: New test.
> > > > ---
> > > >  gcc/internal-fn.c| 22 ++
> > > >  gcc/internal-fn.h|  1 +
> > > >  gcc/match.pd | 24 
> > > >  gcc/testsuite/gcc.target/i386/pr102080.c | 16 
> > > >  4 files changed, 55 insertions(+), 8 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102080.c
> > > >
> > > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > > > index 1360a00f0b9..8b2b65db1a7 100644
> > > > --- a/gcc/internal-fn.c
> > > > +++ b/gcc/internal-fn.c
> > > > @@ -4102,6 +4102,28 @@ expand_internal_call (gcall *stmt)
> > > >expand_internal_call (gimple_call_internal_fn (stmt), stmt);
> > > >  }
> > > >
> > > > +/* Check cond_op for vector modes since 
> > > > vectorized_internal_fn_supported_p
> > > > +   doesn't check if mask type matches.  */
> > > > +bool
> > > > +cond_vectorized_internal_fn_supported_p (internal_fn ifn, tree type,
> > > > +tree mask_type)
> > > > +{
> > > > +  if (!vectorized_internal_fn_supported_p (ifn, type))
> > > > +return false;
> > > > +
> > > > +  machine_mode mask_mode;
> > > > +  machine_mode vmode = TYPE_MODE (type);
> > > > +  int size1, size2;
> > > > +  if (VECTOR_MODE_P (vmode)
> > > > +  && targetm.vectorize.get_mask_mode (vmode).exists(&mask_mode)
> > > > +  && GET_MODE_SIZE (mask_mode).is_constant (&size1)
> > > > +  && GET_MODE_SIZE (TYPE_MODE (mask_type)).is_constant (&size2)
> > > > +  && size1 != size2)
> > >
> > > Why do we check for equal size rather than just mode equality which
> > I originally thought  TYPE_MODE of vector(8)  was
> > not QImode.  Changed the patch to check mode equality.
> > Update patch.
>
> Looking at all this it seems the match.pd patterns should have not
> used vectorized_internal_fn_supported_p but direct_internal_fn_supported_p
> which is equivalent here because we're always working with vector modes?
>
> And then shouldn't we look at the actual optab whether the mask mode matches
> the expectation rather than going around via the target hook which may not 
> have
> enough context to decide which mask mode to use?
How about this?

+/* Return true if target supports cond_op with data TYPE and
+   mask MASK_TYPE.  */
+bool
+cond_internal_fn_supported_p (internal_fn ifn, tree type,
+                              tree mask_type)
+{
+  tree_pair types = tree_pair (type, type);
+  optab tmp = direct_internal_fn_optab (ifn, types);
+  machine_mode vmode = TYPE_MODE (type);
+  insn_code icode = direct_optab_handler (tmp, vmode);
+  if (icode == CODE_FOR_nothing)
+    return false;
+
+  machine_mode mask_mode = TYPE_MODE (mask_type);
+  /* Can't create rtx and use insn_operand_matches here.  */
+  return insn_data[icode].operand[0].mode == vmode
+         && insn_data[icode].operand[1].mode == mask_mode;
+}
+
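
(Illustrative only; the data_type and mask_op names below are placeholders,
not from the patch, just to show how such a predicate would gate the
simplification:)

  /* Placeholder sketch: only combine into a conditional add when the target
     supports IFN_COND_ADD for this data type with this mask type.  */
  if (cond_internal_fn_supported_p (IFN_COND_ADD, data_type,
                                    TREE_TYPE (mask_op)))
    /* ... emit the IFN_COND_ADD call ... */;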

Update patch

>
> In any case if the approach of the patch is correct shouldn't it do
>
>   if (VECTOR_MODE_P (vmode)
>   && (!targetm.vectorize.get_mask_mode (vmode).exists(&mask_mode)
>  || mask_mode != TYPE_MODE (mask_type)))
> return false;
>
> that is, not return true if there's no mask mode for the data mode?
>
> Given the first observation should we call the function
> direct_cond_internal_fn_supported_p () instead and as to the second
> observation, look at the optab operands mode?
>
> Richard.
>
> > > I think would work for non-constant sized modes as well?  And when
> > > using sizes you'd instead use maybe_ne (GET_MODE_SIZE (mask_mode),
> > > GET_MODE_SIZE (TYPE_MODE (mask_type)))
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > +return false;
> > > > +
> > >

Re: [PATCH] Set bound/cmp/control for until wrap loop.

2021-08-31 Thread Jiufu Guo via Gcc-patches



On 2021/9/1 11:30 AM, Jiufu Guo via Gcc-patches wrote:

Richard Biener  writes:


On Tue, 31 Aug 2021, guojiufu wrote:


On 2021-08-30 20:02, Richard Biener wrote:
> On Mon, 30 Aug 2021, guojiufu wrote:
> >> On 2021-08-30 14:15, Jiufu Guo wrote:
>> > Hi,
>> >
>> > In patch r12-3136, niter->control, niter->bound and niter->cmp are
>> > derived from number_of_iterations_lt.  While for 'until wrap condition',
>> > the calculation in number_of_iterations_lt is not align the requirements
>> > on the define of them and requirements in determine_exit_conditions.
>> >
>> > This patch calculate niter->control, niter->bound and niter->cmp in
>> > number_of_iterations_until_wrap.
>> >
>> > The ICEs in the PR are pass with this patch.
>> > Bootstrap and reg-tests pass on ppc64/ppc64le and x86.
>> > Is this ok for trunk?
>> >
>> > BR.
>> > Jiufu Guo
>> >
>> Add ChangeLog:
>> >  create mode 100644 gcc/testsuite/gcc.dg/pr102087.c
>> >
>> > diff --git a/gcc/tree-ssa-loop-niter.c >> > 
b/gcc/tree-ssa-loop-niter.c

>> > index 7af92d1c893..747f04d3ce0 100644
>> > --- a/gcc/tree-ssa-loop-niter.c
>> > +++ b/gcc/tree-ssa-loop-niter.c
>> > @@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap >> > 
(class loop *,

>> > tree type, affine_iv *iv0,
>> >   affine_iv *iv1, class >> >  tree_niter_desc 
*niter)

>> >  {
>> >    tree niter_type = unsigned_type_for (type);
>> > -  tree step, num, assumptions, may_be_zero;
>> > +  tree step, num, assumptions, may_be_zero, span;
>> >    wide_int high, low, max, min;
>> >
>> >    may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, >> 
>    iv1->base,

>> > iv0->base);
>> > @@ -1513,6 +1513,8 @@ number_of_iterations_until_wrap >> > 
(class loop *,

>> > tree type, affine_iv *iv0,
>> >   low = wi::to_wide (iv0->base);
>> >  else
>> > low = min;
>> > +
>> > +  niter->control = *iv1;
>> >  }
>> >    /* {base, -C} < n.  */
>> >    else if (tree_int_cst_sign_bit (iv0->step) && >> >    
integer_zerop

>> > (iv1->step))
>> > @@ -1533,6 +1535,8 @@ number_of_iterations_until_wrap >> > 
(class loop *,

>> > tree type, affine_iv *iv0,
>> >   high = wi::to_wide (iv1->base);
>> >  else
>> > high = max;
>> > +
>> > +  niter->control = *iv0;
>> >  }
>> >    else
>> >  return false;
> > it looks like the above two should already be in effect from > the
> caller (guarding with integer_nozerop)?

I add them just because set these fields in one function.
Yes, they have been set in caller already,  I could remove them here.

>> > @@ -1556,6 +1560,14 @@ number_of_iterations_until_wrap (class loop *,
>> > tree type, affine_iv *iv0,
>> >    niter->assumptions, assumptions);
>> >
>> >    niter->control.no_overflow = false;
>> > +  niter->control.base = fold_build2 (MINUS_EXPR, niter_type,
>> > +                                     niter->control.base,
>> > +                                     niter->control.step);
> > how do we know IVn - STEP doesn't already wrap?

The last IV value just crosses the max/min value of the type
at the last iteration, so IVn - STEP is the nearest value
to max (or min) and does not wrap.
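
(For context, a self-contained example of the kind of 'until wrap' loop
being discussed; illustrative only, not the pr102087 testcase:)

  /* With start > val, the condition i > val only becomes false after
     i wraps past UINT_MAX back to 0.  */
  unsigned int
  count_until_wrap (unsigned int start, unsigned int val)
  {
    unsigned int cnt = 0;
    for (unsigned int i = start; i > val; i++)
      cnt++;
    return cnt;
  }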

> A comment might be
> good to explain you're turning the simplified exit condition into
>
>    { IVbase - STEP, +, STEP } != niter * STEP + (IVbase - STEP)
>
> which, when mathematically looking at it makes me wonder why there's
> the seemingly redundant '- STEP' term?  Also is NE_EXPR really
> correct since STEP might be not 1?  Only for non equality compares
> the '- STEP' should matter?

I need to add comments for this.  This is a little tricky.
The last value of the original IV crosses max/min by at most one STEP;
at that point wrapping has already happened.
Using "{IVbase, +, STEP} != niter * STEP + IVbase" is not wrong
as far as the exit condition is concerned.

But this would not work well with existing code,
like determine_exit_conditions, which will convert NE_EXPR to
LT_EXPR/GT_EXPR.  And so, the '- STEP' is added to adjust the
IV.base and bound; with '- STEP' the bound will be the last value
just before the wrap.


Hmm.  The control IV is documented as

  /* The simplified shape of the exit condition.  The loop exits if
     CONTROL CMP BOUND is false, where CMP is one of NE_EXPR,
     LT_EXPR, or GT_EXPR, and step of CONTROL is positive if CMP is
     LE_EXPR and negative if CMP is GE_EXPR.  This information is used
     by loop unrolling.  */
  affine_iv control;

but determine_exit_conditions seems to assume the IV does not wrap?


Strictly speaking, I would say yes, determine_exit_conditions assumes
the IV does not wrap: there is code:

 if (cmp == LT_EXPR)
   assum = fold_build2 (GE_EXPR, boolean_type_node,
                        bound,
                        fold_build2 (PLUS_EXPR, type, min, delta));
 else
   ...

This means if 'bound' is the value after the wrap, the 'assum' will be
false.

This is also the reason that we may need to bias 'bound' and 'base' by
'step * 1'.  Because, in our case like "while(n
In fact determine_exit_conditio

Re: [PATCH] Fix arm target build with inhibit_libc

2021-08-31 Thread Sebastian Huber

On 30/08/2021 14:01, Sebastian Huber wrote:

Do not declare abort in "libgcc/unwind-arm-common.inc" since it is already
provided by "tsystem.h".  It fixes the following build error:

In file included from libgcc/config/arm/unwind-arm.c:144:
libgcc/unwind-arm-common.inc:55:24: error: macro "abort" passed 1 arguments, but takes just 0
   55 | extern void abort (void);

libgcc/

* unwind-arm-common.inc (abort): Remove.


Could someone please have a look at this patch. Currently, the arm build 
with inhibit_libc is broken.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH] Set bound/cmp/control for until wrap loop.

2021-08-31 Thread Jiufu Guo via Gcc-patches

Richard Biener  writes:


On Tue, 31 Aug 2021, guojiufu wrote:


On 2021-08-30 20:02, Richard Biener wrote:
> On Mon, 30 Aug 2021, guojiufu wrote:
> 
>> On 2021-08-30 14:15, Jiufu Guo wrote:

>> > Hi,
>> >
>> > In patch r12-3136, niter->control, niter->bound and 
>> > niter->cmp are
>> > derived from number_of_iterations_lt.  While for 'until 
>> > wrap condition',
>> > the calculation in number_of_iterations_lt is not align 
>> > the requirements
>> > on the define of them and requirements in 
>> > determine_exit_conditions.

>> >
>> > This patch calculate niter->control, niter->bound and 
>> > niter->cmp in

>> > number_of_iterations_until_wrap.
>> >
>> > The ICEs in the PR are pass with this patch.
>> > Bootstrap and reg-tests pass on ppc64/ppc64le and x86.
>> > Is this ok for trunk?
>> >
>> > BR.
>> > Jiufu Guo
>> >
>> Add ChangeLog:
>> >  create mode 100644 gcc/testsuite/gcc.dg/pr102087.c
>> >
>> > diff --git a/gcc/tree-ssa-loop-niter.c 
>> > b/gcc/tree-ssa-loop-niter.c

>> > index 7af92d1c893..747f04d3ce0 100644
>> > --- a/gcc/tree-ssa-loop-niter.c
>> > +++ b/gcc/tree-ssa-loop-niter.c
>> > @@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap 
>> > (class loop *,

>> > tree type, affine_iv *iv0,
>> >   affine_iv *iv1, class 
>> >  tree_niter_desc *niter)

>> >  {
>> >tree niter_type = unsigned_type_for (type);
>> > -  tree step, num, assumptions, may_be_zero;
>> > +  tree step, num, assumptions, may_be_zero, span;
>> >wide_int high, low, max, min;
>> >
>> >may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, 
>> >iv1->base,

>> > iv0->base);
>> > @@ -1513,6 +1513,8 @@ number_of_iterations_until_wrap 
>> > (class loop *,

>> > tree type, affine_iv *iv0,
>> >   low = wi::to_wide (iv0->base);
>> >  else
>> >   low = min;
>> > +
>> > +  niter->control = *iv1;
>> >  }
>> >/* {base, -C} < n.  */
>> >else if (tree_int_cst_sign_bit (iv0->step) && 
>> >integer_zerop

>> > (iv1->step))
>> > @@ -1533,6 +1535,8 @@ number_of_iterations_until_wrap 
>> > (class loop *,

>> > tree type, affine_iv *iv0,
>> >   high = wi::to_wide (iv1->base);
>> >  else
>> >   high = max;
>> > +
>> > +  niter->control = *iv0;
>> >  }
>> >else
>> >  return false;
> 
> it looks like the above two should already be in effect from 
> the

> caller (guarding with integer_nozerop)?

I add them just because set these fields in one function.
Yes, they have been set in caller already,  I could remove them 
here.


> 
>> > @@ -1556,6 +1560,14 @@ number_of_iterations_until_wrap 
>> > (class loop *,

>> > tree type, affine_iv *iv0,
>> >niter->assumptions, assumptions);
>> >
>> >niter->control.no_overflow = false;
>> > +  niter->control.base = fold_build2 (MINUS_EXPR, 
>> > niter_type,

>> > +  niter->control.base,
>> > niter->control.step);
> 
> how do we know IVn - STEP doesn't already wrap?


The last IV value just crosses the max/min value of the type
at the last iteration, so IVn - STEP is the nearest value
to max (or min) and does not wrap.

> A comment might be
> good to explain you're turning the simplified exit condition into
>
>    { IVbase - STEP, +, STEP } != niter * STEP + (IVbase - STEP)
>
> which, when mathematically looking at it makes me wonder why there's
> the seemingly redundant '- STEP' term?  Also is NE_EXPR really
> correct since STEP might be not 1?  Only for non equality compares
> the '- STEP' should matter?

I need to add comments for this.  This is a little tricky.
The last value of the original IV crosses max/min by at most one STEP;
at that point wrapping has already happened.
Using "{IVbase, +, STEP} != niter * STEP + IVbase" is not wrong
as far as the exit condition is concerned.

But this would not work well with existing code,
like determine_exit_conditions, which will convert NE_EXPR to
LT_EXPR/GT_EXPR.  And so, the '- STEP' is added to adjust the
IV.base and bound; with '- STEP' the bound will be the last value
just before the wrap.


Hmm.  The control IV is documented as

  /* The simplified shape of the exit condition.  The loop exits if
     CONTROL CMP BOUND is false, where CMP is one of NE_EXPR,
     LT_EXPR, or GT_EXPR, and step of CONTROL is positive if CMP is
     LE_EXPR and negative if CMP is GE_EXPR.  This information is used
     by loop unrolling.  */
  affine_iv control;

but determine_exit_conditions seems to assume the IV does not wrap?


Strictly speaking, I would say yes, determine_exit_conditions assumes
the IV does not wrap: there is code:

 if (cmp == LT_EXPR)
   assum = fold_build2 (GE_EXPR, boolean_type_node,
                        bound,
                        fold_build2 (PLUS_EXPR, type, min, delta));
 else
   ...

This means if 'bound' is the value after the wrap, the 'assum' will be
false.
This is also the reason that we may need to bias 'bound' and 'base' by
'step * 1'.  Because, in our case like "while(n
if we set 'bound' as 'iv.base + niter * step', the val

[PATCH] warn for more impossible null pointer tests

2021-08-31 Thread Martin Sebor via Gcc-patches

A Coverity run recently uncovered a latent bug in GCC that GCC should
be able to detect itself: comparing the address of a declared object
for equality to null, similar to:

  int f (void)
  {
    int a[2][2];
    return &a == 0;
  }

GCC issues -Waddress for this code, but the bug Coverity found was
actually closer to the following:

  int f (void)
  {
    int a[2][2];
    return a[0] == 0;
  }

where the hapless author (yours truly) meant to compare the value
of a[0][0] (as in r12-3268).

This variant is not diagnosed even though the bug in it is the same,
and I'd expect it to be more likely to occur in practice.  (&a[0] == 0 isn't
diagnosed either, though that's a less likely mistake to make.)

The attached patch enhances -Waddress to detect this variant along
with a number of other similar instances of the problem, including
comparing the address of array members to null.

Besides these, the patch also issues -Waddress for null equality
tests of pointer-plus expressions such as in:

  int g (int i)
  {
    return a[0] + i == 0;
  }

and in C++ more instances of pointers to members.
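
A self-contained C++ sketch of that pointer-to-member case (illustrative
only, not taken from the new tests):

  struct S { int m; };

  bool f ()
  {
    /* &S::m can never be a null pointer-to-member, so this equality test
       is always false and a candidate for the enhanced -Waddress.  */
    return &S::m == nullptr;
  }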

Testing on x86_64-linux, besides a few benign issues in GCC sources,
a regression test run shows a failure in gcc.dg/Waddress.c.  That's
a test added after GCC for some reason stopped warning for one of
the basic cases that other tools warn about (comparing an array to
null).  I suspect the change was unintentional because GCC still
warns for other very similar expressions.  The reporter who also
submitted the test in pr36299 argued that the warning wasn't
helpful because tests for arrays sometimes come from macros, and
the test was committed after it was noted that GCC no longer warned
for the reporter's simple case.  While it's certainly true that
the warning can be triggered by the null equality tests in macros
(the patch exposed two such instances in GCC), they are easy to
avoid (the patch adds an additional escape hatch).  At the same
time, as is evident from the Coverity bug report and from the two
issues the enhancement exposes in the FORTRAN front end (even if
benign), issuing the warning in these cases does help find bugs
or mistaken assumptions.  With that, I've changed the test to
expect the restored -Waddress warning instead.

Testing with Glibc exposed a couple of harmless comparisons of
arrays in a large macro in vfprintf-internal.c.  I'll submit a fix
to avoid the -Waddress instances if/when this enhancement is
approved.

Testing with Binutils/GDB also turned up a couple of pointless
comparisons of arrays to null and a couple of uses in macros that
can be trivially suppressed.

Martin

PS Clang issues a warning for some of the same null pointer tests
the patch diagnoses, including gcc.dg/Waddress.c, except under at
least three different options: some under -Wpointer-bool-conversion,
others under -Wtautological-pointer-compare, and others still under
-Wtautological-compare.

Enhance -Waddress to detect more suspicious expressions.

Resolves:
PR c/102103 - missing warning comparing array address to null


gcc/ChangeLog:

	* doc/invoke.texi (-Waddress): Update.

gcc/c-family/ChangeLog:

	* c-common.c (decl_with_nonnull_addr_p): Handle members.

gcc/c/ChangeLog:

	* c-typeck.c (maybe_warn_for_null_address): New function.
	(build_binary_op): Call it.

gcc/cp/ChangeLog:

	* typeck.c (warn_for_null_address): Enhance.
	(cp_build_binary_op): Call it also for member pointers.

gcc/fortran/ChangeLog:

	* gcc/fortran/array.c: Remove an unnecessary test.
	* gcc/fortran/trans-array.c: Same.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/constexpr-array-ptr10.C: Suppress a valid warning.
	* g++.dg/warn/Wreturn-local-addr-6.C: Correct a cast.
	* gcc.dg/Waddress.c: Expect a warning.
	* c-c++-common/Waddress-3.c: New test.
	* c-c++-common/Waddress-4.c: New test.
	* g++.dg/warn/Waddress-5.C: New test.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 017e41537ac..ca3544bd066 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -3393,14 +3393,16 @@ c_wrap_maybe_const (tree expr, bool non_const)
   return expr;
 }
 
-/* Return whether EXPR is a declaration whose address can never be
-   NULL.  */
+/* Return whether EXPR is a declaration whose address can never be NULL.
+   The address of the first struct member could be NULL only if it were
+   accessed through a NULL pointer, and such an access would be invalid.  */
 
 bool
 decl_with_nonnull_addr_p (const_tree expr)
 {
   return (DECL_P (expr)
-	  && (TREE_CODE (expr) == PARM_DECL
+	  && (TREE_CODE (expr) == FIELD_DECL
+	  || TREE_CODE (expr) == PARM_DECL
 	  || TREE_CODE (expr) == LABEL_DECL
 	  || !DECL_WEAK (expr)));
 }
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index d9f26d67bd3..d6aa4fe9263 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -11541,6 +11541,78 @@ build_vec_cmp (tree_code code, tree type,
   return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
 }
 
+/* Possibly warn about an address of OP never being 

Re: [PATCH] libstdc++-v3: Check for TLS support on mingw

2021-08-31 Thread Jonathan Yong via Gcc-patches

On 8/31/21 9:02 AM, Jonathan Wakely wrote:

It looks like my questions about this patch never got an answer, and
it never got applied.

Could somebody say whether TLS is enabled for native *-*-mingw*
builds? If it is, then we definitely need to add GCC_CHECK_TLS to the
cross-compiler config too.

For a linux-hosted x86_64-w64-mingw32 cross compiler I see TLS is not enabled:

/* Define to 1 if the target supports thread-local storage. */
/* #undef _GLIBCXX_HAVE_TLS */




On Mon, 19 Feb 2018 at 08:59, Hugo Beauzée-Luyssen  wrote:


libstdc++-v3: Check for TLS support on mingw

2018-02-16  Hugo Beauzée-Luyssen  

 * crossconfig.m4: Check for TLS support on mingw.
 * configure: Regenerate.

Index: libstdc++-v3/crossconfig.m4
===
--- libstdc++-v3/crossconfig.m4 (revision 257730)
+++ libstdc++-v3/crossconfig.m4 (working copy)
@@ -197,6 +197,7 @@ case "${host}" in
  GLIBCXX_CHECK_LINKER_FEATURES
  GLIBCXX_CHECK_MATH_SUPPORT
  GLIBCXX_CHECK_STDLIB_SUPPORT
+GCC_CHECK_TLS
  ;;
*-netbsd*)
  SECTION_FLAGS='-ffunction-sections -fdata-sections'


According to MSYS2 native from 
https://mirror.msys2.org/mingw/ucrt64/mingw-w64-ucrt-x86_64-gcc-10.3.0-5-any.pkg.tar.zst:


x86_64-w64-mingw32/bits/c++config.h:#define _GLIBCXX_HAVE_TLS 1

So yes.


OpenPGP_0x713B5FE29C145D45.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 2/2] RISC-V: Implement TARGET_COMPUTE_MULTILIB

2021-08-31 Thread Jim Wilson
On Tue, Aug 31, 2021 at 5:22 PM Jim Wilson  wrote:

> On Wed, Jul 21, 2021 at 2:28 AM Kito Cheng  wrote:
>
>> Use TARGET_COMPUTE_MULTILIB to search the multi-lib reuse for
>> riscv*-*-elf*,
>> according following rules:
>>
>
> I find the other_cond support a bit confusing.  Is this for -mcmodel
> perhaps?  Why not just say that if so?
>
> match_score:
> weigth -> weight
>
> riscv_multi_lib_info_t::parse
> Calls riscv_subset_list::parse twice when path == ".", the call inside
> the if looks unnecessary.
>
> riscv_multilib_lib_check:
> Can't found -> Can't find
>
> riscv_check_other_cond:
> might got -> might get
>
> riscv_compute_multilib:
> bare-matel -> bare-metal
> decition -> decision
> dection -> decision
>
> It isn't clear how the loop with the comment "ignore march and mabi
> option in cond string" can work.  It looks like it computes other_cond,
> but assumes that there is at most one other_cond, and that it is always
> at the end of the list since otherwise the length won't be computed
> correctly.  But it doesn't check these constraints.  Do you have examples
> showing how this works?
>   And maybe a little better commentary explaining what this loop does to
> make it easier to understand.  It doesn't mention that it computes
> other_cond for instance.
>

Otherwise it looks OK to me.

Jim


Re: [PATCH 2/2] RISC-V: Implement TARGET_COMPUTE_MULTILIB

2021-08-31 Thread Jim Wilson
On Wed, Jul 21, 2021 at 2:28 AM Kito Cheng  wrote:

> Use TARGET_COMPUTE_MULTILIB to search the multi-lib reuse for
> riscv*-*-elf*,
> according following rules:
>

I find the other_cond support a bit confusing.  Is this for -mcmodel
perhaps?  Why not just say that if so?

match_score:
weigth -> weight

riscv_multi_lib_info_t::parse
Calls riscv_subset_list::parse twice when path == ".", the call inside
the if looks unnecessary.

riscv_multilib_lib_check:
Can't found -> Can't find

riscv_check_other_cond:
might got -> might get

riscv_compute_multilib:
bare-matel -> bare-metal
decition -> decision
dection -> decision

It isn't clear how the loop with the comment "ignore march and mabi option
in cond string" can work.  It looks like it computes other_cond, but
assumes that there is at most one other_cond, and that it is always at the
end of the list since otherwise the length won't be computed correctly.
But it doesn't check these constraints.  Do you have examples showing how
this works?
  And maybe a little better commentary explaining what this loop does to
make it easier to understand.  It doesn't mention that it computes
other_cond for instance.

Jim


[PATCH] Add MIPS Linux support to gcc.misc-tests/linkage.c (testsuite/51748)

2021-08-31 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

This adds MIPS Linux support to gcc.misc-tests/linkage.exp.  Basically
copying what was done for MIPS IRIX and changing the options to be correct.

OK?

gcc/testsuite/ChangeLog:

PR testsuite/51748
* gcc.misc-tests/linkage.exp: Add mips*-linux-* support.
---
 gcc/testsuite/gcc.misc-tests/linkage.exp | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/testsuite/gcc.misc-tests/linkage.exp 
b/gcc/testsuite/gcc.misc-tests/linkage.exp
index afed2b811c9..2cb109e776e 100644
--- a/gcc/testsuite/gcc.misc-tests/linkage.exp
+++ b/gcc/testsuite/gcc.misc-tests/linkage.exp
@@ -38,6 +38,18 @@ if { [isnative] && ![is_remote host] } then {
 
# Need to ensure ABI for native compiler matches gcc
set native_cflags ""
+   if  [istarget "mips*-linux*"] {
+   set file_string [exec file "linkage-x.o"]
+   if [ string match "*64*" $file_string ] {
+   set native_cflags "-mabi=64"
+   }
+   if [ string match "*ELF 32*" $file_string ] {
+   set native_cflags "-mabi=32"
+   }
+   if [ string match "*N32*" $file_string ] {
+   set native_cflags "-mabi=n32"
+   }
+   }
if  [istarget "sparc*-sun-solaris2*"] {
set file_string [exec file "linkage-x.o"]
if [ string match "*64*" $file_string ] {
-- 
2.17.1



Re: [PATCH 1/2] Add TARGET_COMPUTE_MULTILIB hook to override multi-lib result.

2021-08-31 Thread Jim Wilson
On Wed, Jul 21, 2021 at 2:28 AM Kito Cheng  wrote:

> Create a new hook to let target could override the multi-lib result,
> the motivation is RISC-V might have very complicated multi-lib re-use
> rule*, which is hard to maintain and use current multi-lib scripts,
> we even hit the "argument list too long" error when we tried to add more
> multi-lib reuse rule.
>

This looks OK to me, though I would rewrite the docs a bit.

> +DEFHOOK
> +(compute_multilib,
> + "Some target like RISC-V might have complicated multilib reuse rule
> which is\
> +  hard to implemented on current multilib scheme, this hook allow target
> to\
> +  override the result from built-in multilib mechanism.\
> +  @var{switches} is the raw option list with @var{n_switches} items;\
> +  @var{multilib_dir} is the multi-lib result which compute by the
> built-in\
> +  multi-lib mechanism;\
> +  @var{multilib_defaults} is the default options list for multi-lib; \
> +  @var{multilib_select} is the string contain the list of supported
> multi-lib, \
> +  and the option checking list. \
> +  @var{multilib_matches}, @var{multilib_exclusions}, and
> @var{multilib_reuse} \
> +  are corresponding to @var{MULTILIB_MATCHES}, @var{MULTILIB_EXCLUSIONS} \
> +  @var{MULTILIB_REUSE}. \
> +  The default definition does nothing but return @var{multilib_dir}
> directly.",
>

I'd suggest instead

"Some targets like RISC-V might have complicated multilib reuse rules
which\n\
are hard to implement with the current multilib scheme.  This hook allows\n\
targets to override the result from the built-in multilib mechanism.\n\
@var{switches} is the raw option list with @var{n_switches} items;\n\
@var{multilib_dir} is the multi-lib result which is computed by the
built-in\n\
multi-lib mechanism;\n\
@var{multilib_defaults} is the default options list for multi-lib;\n\
@var{multilib_select} is the string containing the list of supported\n\
multi-libs, and the option checking list.\n\
@var{multilib_matches}, @var{multilib_exclusions}, and
@var{multilib_reuse}\n\
are corresponding to @var{MULTILIB_MATCHES}, @var{MULTILIB_EXCLUSIONS},\n\
and @var{MULTILIB_REUSE}.\n\
The default definition does nothing but return @var{multilib_dir} directly."

Jim


Re: [PATCH] Generate XXSPLTIDP on power10.

2021-08-31 Thread Segher Boessenkool
Hi!

Please do two separate patches.  The first that adds the instruction
(with a bit pattern, i.e. integer, input), and perhaps a second pattern
that has an fp as input and uses it if the constant is valid for the
insn (survives being converted to SP and back to DP (or the other way
around), and is not denormal).  That can be two patches if you want,
but :-)

Having the integer intermediate step not only makes the code hugely less
complicated, but is also allows e.g.

===
typedef unsigned long long v2u64 __attribute__ ((vector_size (16)));
v2u64 f(void)
{
  v2u64 x = { 0x8000, 0x8000 };
  return x;
}
===

to be optimised properly.

The second part is letting the existing code use such FP (and integer!)
contants.

On Wed, Aug 25, 2021 at 03:46:43PM -0400, Michael Meissner wrote:
> +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
> +(define_constraint "eF"
> +  "A vector constant that can be loaded with the XXSPLTIDP instruction."
> +  (match_operand 0 "xxspltidp_operand"))

vector *or float*.  It should allow all vectors, not just FP ones.

> +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can 
> be
> +;; loaded via the ISA 3.1 XXSPLTIDP instruction.
> +(define_predicate "xxspltidp_operand"
> +  (match_code "const_double,const_vector,vec_duplicate")
> +{
> +  HOST_WIDE_INT value = 0;
> +  return xxspltidp_constant_p (op, mode, &value);
> +})

Don't do that.  Factor the code properly.  A predicate function should
never have side effects.

Since this is the only place you want to convert the value to its bit
pattern, you should just do that here.

(Btw, initialising the value (although the function always writes it) is
not defensive programming, it is hiding bugs.  IMNSHO :-) )

> +bool
> +xxspltidp_constant_p (rtx op,
> +   machine_mode mode,
> +   HOST_WIDE_INT *constant_ptr)
> +{
> +  *constant_ptr = 0;

And a second time, too!  Don't do either.

> +  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
> +return false;

This is the wrong place to test these.  It belongs in the caller.

> +  if (CONST_VECTOR_P (op))
> + {
> +   element = CONST_VECTOR_ELT (op, 0);
> +   if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
> + return false;
> + }

const_vec_duplicate_p

(But you actually should check if the bit pattern is valid, nothing
more, nothing less).

> +  /* Don't return true for 0.0 since that is easy to create without
> + XXSPLTIDP.  */
> +  if (element == CONST0_RTX (mode))
> +return false;

Don't do that.  Instead have whatever decides what insn to use choose
more directly.

> +/* Whether a permute type instruction is a prefixed instruction.  This is
> +   called from the prefixed attribute processing.  */
> +
> +bool
> +prefixed_permute_p (rtx_insn *insn)

What does this have to do with this patch?

> +{
> +  rtx set = single_set (insn);
> +  if (!set)
> +return false;
> +
> +  rtx dest = SET_DEST (set);
> +  rtx src = SET_SRC (set);
> +  machine_mode mode = GET_MODE (dest);
> +
> +  if (!REG_P (dest) && !SUBREG_P (dest))
> +return false;
> +
> +  switch (mode)
> +{
> +case DFmode:
> +case SFmode:
> +case V2DFmode:
> +  return xxspltidp_operand (src, mode);

??!!??

That is not a permute insn at all.

Perhaps you mean it is executed in the PM pipe on current
implementations (all one of-em).  That does not make it a permute insn.
It is not a good idea to call insns that do not have semantics similar
to permutations "permute".

> @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
> @@ -8051,20 +8057,21 @@ (define_insn "*mov_hardfloat32"
> @@ -8091,19 +8098,19 @@ (define_insn "*mov_softfloat32"
> @@ -8125,18 +8132,19 @@ (define_insn "*mov_hardfloat64"
> @@ -8170,6 +8178,7 @@ (define_insn "*mov_softfloat64"

It would be a good idea to merge many of these patterns again.  We can
do this now that we have the "isa" and "enabled" attributes.


Segher


[PATCH] Fix target/101934: aarch64 memset code creates unaligned stores for -mstrict-align

2021-08-31 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is that the aarch64_expand_setmem code did not check
STRICT_ALIGNMENT when creating an overlapping store.
This patch adds that check, and the testcase now passes.

gcc/ChangeLog:

PR target/101934
* config/aarch64/aarch64.c (aarch64_expand_setmem):
Check STRICT_ALIGNMENT before creating an overlapping
store.

gcc/testsuite/ChangeLog:

PR target/101934
* gcc.target/aarch64/memset-strict-align-1.c: New test.
---
 gcc/config/aarch64/aarch64.c  |  4 +--
 .../aarch64/memset-strict-align-1.c   | 28 +++
 2 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3213585a588..26d59ba1e13 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -23566,8 +23566,8 @@ aarch64_expand_setmem (rtx *operands)
   /* Do certain trailing copies as overlapping if it's going to be
 cheaper.  i.e. less instructions to do so.  For instance doing a 15
 byte copy it's more efficient to do two overlapping 8 byte copies than
-8 + 4 + 2 + 1.  */
-  if (n > 0 && n < copy_limit / 2)
+8 + 4 + 2 + 1.  Only do this when -mstrict-align is not supplied.  */
+  if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
{
  next_mode = smallest_mode_for_size (n, MODE_INT);
  int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
diff --git a/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c 
b/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
new file mode 100644
index 000..5cdc8a44968
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -mstrict-align" } */
+
+struct s { char x[95]; };
+void foo (struct s *);
+void bar (void) { struct s s1 = {}; foo (&s1); }
+
+/* memset (s1 = {}, sizeof = 95) should be expanded out
+   such that there are no overlap stores when -mstrict-align
+   is in use.
+   so 2 pair 16 bytes stores (64 bytes).
+   1 16 byte stores
+   1 8 byte store
+   1 4 byte store
+   1 2 byte store
+   1 1 byte store
+   */
+
+/* { dg-final { scan-assembler-times "stp\tq" 2 } } */
+/* { dg-final { scan-assembler-times "str\tq" 1 } } */
+/* { dg-final { scan-assembler-times "str\txzr" 1 } } */
+/* { dg-final { scan-assembler-times "str\twzr" 1 } } */
+/* { dg-final { scan-assembler-times "strh\twzr" 1 } } */
+/* { dg-final { scan-assembler-times "strb\twzr" 1 } } */
+
+/* Also one store pair for the frame-pointer and the LR. */
+/* { dg-final { scan-assembler-times "stp\tx" 1 } } */
+
-- 
2.17.1



Re: [PATCH] Generate XXSPLTIDP on power10.

2021-08-31 Thread Segher Boessenkool
Hi!

On Thu, Aug 26, 2021 at 05:28:42PM -0400, Michael Meissner wrote:
> On Thu, Aug 26, 2021 at 02:17:57PM -0500, will schmidt wrote:
> > On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> > > Generate XXSPLTIDP on power10.
> > > 
> > > I have added a temporary switch (-mxxspltidp) to control whether or not 
> > > the
> > > XXSPLTIDP instruction is generated.
> > 
> > How temporary?  
> 
> Until we decide we no longer need to disable the option to do tests.  Probably
> at the end of stage1.

Don't do it at all please.  If it is useful to disable some new strategy
for generating constants, a (temporary or at least undocumented) flag
for that can be handy.  But a flag to disable separate insns is a
liability, it makes the compiler much more fragile, makes changing the
compiler hard because of all the surprises hidden.

> > >   (xxspltidp_operand): New predicate.
> > 
> > Will there ever be another instruction using the SF/DF CONST_DOUBLE  or
> > V2DF CONST_VECTOR ?   I tentatively question the name of the operand,
> > but defer..
> 
> This is the convention I've used for adding other instructions like xxspltib.

The only reason it is a good idea here is because of the strange
behaviour this insn has with single precision subnormals.  In general
a better name here would be something like "sf_as_int_operand".  The
insn should probably not allow anything else than bit patterns, not
floating point constants, have a separate pattern for that (that can
then forward to the integer one).

> This way we have just one place that centralizes the knowledge about the
> instruction.

That one place should be the define_insn for it.


Segher


Re: [PATCH] Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI)))

2021-08-31 Thread Rainer Orth
Hi Roger,

> I'm testing the attached patch, but without an aarch64, it'll take a while
> to figure
> out the toolchain to reproduce the failure.  Neither of the platforms I
> tested were
> affected, but I can see it's unsafe to reuse the subreg_promoted_reg idiom 
> from
> just a few lines earlier.  Any help testing the attached patch on an
> affected target
> would be much appreciated.

after reverting the patch that caused PR middle-end/102133 and had
already broken 32-bit sparc bootstrap, sparc was affected again by this
one for a couple of files in 64-bit libgcc.

Fortunately, this patch fixes the build (only tried a minimal
non-bootstrap so far).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[pushed] c++: Various small fixes

2021-08-31 Thread Jason Merrill via Gcc-patches
A copy-paste error, a couple of missed checks to guard undefined accesses,
and we don't need to use type_uses_auto to extract the auto node we just
built.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* coroutines.cc (flatten_await_stmt): Fix copyo.
* decl.c (reshape_init_class): Simplify.
* module.cc (module_state::read_language): Add null check.
* parser.c (build_range_temp): Avoid type_uses_auto.
(cp_parser_class_specifier_1): Add null check.
---
 gcc/cp/coroutines.cc |  2 +-
 gcc/cp/decl.c|  3 +--
 gcc/cp/module.cc |  2 +-
 gcc/cp/parser.c  | 15 +++
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 47c79e58db5..25269d9e51a 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -2905,7 +2905,7 @@ flatten_await_stmt (var_nest_node *n, hash_set 
*promoted,
  tree else_cl = COND_EXPR_ELSE (old_expr);
  if (!VOID_TYPE_P (TREE_TYPE (else_cl)))
{
- gcc_checking_assert (TREE_CODE (then_cl) != STATEMENT_LIST);
+ gcc_checking_assert (TREE_CODE (else_cl) != STATEMENT_LIST);
  else_cl
= build2 (init_expr ? INIT_EXPR : MODIFY_EXPR, var_type,
  var, else_cl);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 3414cbdc876..e981eadc6dd 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6563,8 +6563,7 @@ reshape_init_class (tree type, reshape_iter *d, bool 
first_initializer_p,
 continue_:
   if (base_binfo)
{
- BINFO_BASE_ITERATE (binfo, ++binfo_idx, base_binfo);
- if (base_binfo)
+ if (BINFO_BASE_ITERATE (binfo, ++binfo_idx, base_binfo))
field = base_binfo;
  else
field = next_initializable_field (TYPE_FIELDS (type));
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ccbde292c22..4b2ad6f3db8 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -17977,7 +17977,7 @@ module_state::read_language (bool outermost)
 
   function_depth++; /* Prevent unexpected GCs.  */
 
-  if (counts[MSC_entities] != entity_num)
+  if (ok && counts[MSC_entities] != entity_num)
 ok = false;
   if (ok && counts[MSC_entities]
   && !read_entities (counts[MSC_entities],
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1e2a4b121ea..d3c31be0967 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -13474,17 +13474,15 @@ cp_parser_range_for (cp_parser *parser, tree scope, 
tree init, tree range_decl,
 static tree
 build_range_temp (tree range_expr)
 {
-  tree range_type, range_temp;
-
   /* Find out the type deduced by the declaration
  `auto &&__range = range_expr'.  */
-  range_type = cp_build_reference_type (make_auto (), true);
-  range_type = do_auto_deduction (range_type, range_expr,
- type_uses_auto (range_type));
+  tree auto_node = make_auto ();
+  tree range_type = cp_build_reference_type (auto_node, true);
+  range_type = do_auto_deduction (range_type, range_expr, auto_node);
 
   /* Create the __range variable.  */
-  range_temp = build_decl (input_location, VAR_DECL, for_range__identifier,
-  range_type);
+  tree range_temp = build_decl (input_location, VAR_DECL,
+   for_range__identifier, range_type);
   TREE_USED (range_temp) = 1;
   DECL_ARTIFICIAL (range_temp) = 1;
 
@@ -25910,7 +25908,8 @@ cp_parser_class_specifier_1 (cp_parser* parser)
 so that maybe_instantiate_noexcept can tsubst the NOEXCEPT_EXPR
 in the pattern.  */
  for (tree i : DEFPARSE_INSTANTIATIONS (def_parse))
-   DEFERRED_NOEXCEPT_PATTERN (TREE_PURPOSE (i)) = TREE_PURPOSE (spec);
+   DEFERRED_NOEXCEPT_PATTERN (TREE_PURPOSE (i))
+ = spec ? TREE_PURPOSE (spec) : error_mark_node;
 
  /* Restore the state of local_variables_forbidden_p.  */
  parser->local_variables_forbidden_p = local_variables_forbidden_p;

base-commit: e4cb3bb9ac11b4126ffa718287dd509a4b10a658
-- 
2.27.0



Re: [PATCH] C: PR c/79412: Poison decls with error_mark_node after type mismatch

2021-08-31 Thread Joseph Myers
On Tue, 31 Aug 2021, Roger Sayle wrote:

> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  Ok for mainline?

OK, with a space added before '(' in the call to seen_error.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] c++: shortcut bad convs during overload resolution [PR101904]

2021-08-31 Thread Patrick Palka via Gcc-patches
On Mon, 30 Aug 2021, Patrick Palka wrote:

> In the context of overload resolution we have the notion of a "bad"
> argument conversion, which is a conversion that "would be permitted
> with a bending of the language standards", and we handle such bad
> conversions specially.  In particular, we rank a bad conversion as
> better than no conversion but worse than a good conversion, and a bad
> conversion doesn't necessarily make a candidate unviable.  With the
> flag -fpermissive, we permit the situation where overload resolution
> selects a candidate that contains a bad conversion (which we call a
> non-strictly viable candidate).  And without the flag we issue a
> distinct permerror in this situation instead.
> 
> One consequence of this defacto behavior is that in order to distinguish
> a non-strictly viable candidate from an unviable candidate, if we
> encounter a bad argument conversion during overload resolution we must
> keep converting subsequent arguments because a subsequent conversion
> could render the candidate unviable instead of just non-strictly viable.
> But checking subsequent arguments can force template instantiations and
> result in otherwise avoidable hard errors.  And in particular, all
> 'this' conversions are at worst bad, so this means the const/ref-qualifiers
> of a member function can't be used to prune a candidate quickly, which
> is the subject of the mentioned PR.
> 
> This patch tries to improve the situation without changing the defacto
> output of add_candidates.  Specifically, when considering a candidate
> during overload resolution this patch makes us shortcut argument
> conversion checking upon encountering the first bad conversion
> (tentatively marking the candidate as non-strictly viable, though it
> could ultimately be unviable) under the assumption that we'll eventually
> find a strictly viable candidate anyway (rendering the distinction
> between non-strictly viable and unviable moot, since both are worse
> than a strictly viable candidate).  If this assumption turns out to be
> false, we'll fully reconsider the candidate under the defacto behavior
> (without the shortcutting).
> 
> So in the best case (there's a strictly viable candidate), we avoid
> some argument conversions and/or template argument deduction that may
> cause a hard error.  In the worst case (there's no such candidate), we
> have to redundantly consider some candidates twice.  (In a previous
> version of the patch, to avoid this redundant checking I created a new
> "deferred" conversion type that represents a conversion that is yet to
> be performed, and instead of reconsidering a candidate I just realized
> its deferred conversions.  But it doesn't seem this redundancy is a
> significant performance issue to justify the added complexity of this
> other approach.)
> 
> Lots of care was taken to preserve the defacto behavior w.r.t.
> non-strictly viable candidates, but I wonder how important this behavior
> is nowadays?  Can the notion of a non-strictly viable candidate be done
> away with, or is it here to stay?

To expand on this, as a concrete alternative to this optimistic shortcutting
trick we could maybe recognize non-strictly viable candidates only when
-fpermissive (and just mark them as unviable when not -fpermissive).  IIUC
this would be a backwards compatible change overall -- only diagnostics would
be affected, probably for the better, since we'd explain the rejection reason
for more candidates in the event of overload resolution failure.

Here's a testcase for which such a change would result in better diagnostics:

  struct A {
void f(int, int) const; // #1
void f(int);// #2
  };
  
  int main() {
const A a;
a.f(0);
  }

We currently consider #2 to be a better candidate than #1 because the
bad conversion of the 'this' argument makes it only non-strictly
viable, whereas #1 is considered unviable due to the arity mismatch.
So overload resolution selects #2 and we end up making no mention of #1
in the subsequent diagnostic:

  : In function ‘int main()’:
  :8:8: error: passing ‘const A’ as ‘this’ argument discards qualifiers 
[-fpermissive]
  :3:8: note:   in call to ‘void A::f(int)’

Better would be to explain why neither candidate is a match:

  :8:6: error: no matching function for call to ‘A::f(int) const’
  :2:8: note: candidate: ‘void A::f(int, int) const’
  :2:8: note:   candidate expects 2 arguments, 1 provided
  :3:8: note: candidate: ‘void A::f(int)’
  :3:8: note:   passing ‘const A*’ as ‘this’ argument discards qualifiers


Same for

  void f(int, int);
  void f(int*);
  
  int main() {
f(42);
  }

for which we currently emit

  : In function ‘int main()’:
  :5:5: error: invalid conversion from ‘int’ to ‘int*’ [-fpermissive]
  :2:8: note:   initializing argument 1 of ‘void f(int*)’

instead of

  : In function ‘int main()’:
  :5:4: error: no matching function for call to ‘f(int)’
  :1:6: note: candidate: ‘void f(int, int)’
  :1:6: note:   cand

*PING* [PATCH] PR fortran/93834 - [9/10/11/12 Regression] ICE in trans_caf_is_present, at fortran/trans-intrinsic.c:8469

2021-08-31 Thread Harald Anlauf via Gcc-patches
PING!

> Gesendet: Dienstag, 24. August 2021 um 22:36 Uhr
> Von: "Harald Anlauf" 
> An: "fortran" , "gcc-patches" 
> Betreff: [PATCH] PR fortran/93834 - [9/10/11/12 Regression] ICE in 
> trans_caf_is_present, at fortran/trans-intrinsic.c:8469
>
> Dear Fortranners,
>
> here's a pretty obvious one: we didn't properly check the arguments
> for intrinsics when these had to be ALLOCATABLE and in the case that
> argument was a coarray object.  Simple solution: just reuse a check
> that was used for pointer etc.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline / backports?
>
> Thanks,
> Harald
>
>
> Fortran - extend allocatable_check to coarrays
>
> gcc/fortran/ChangeLog:
>
>   PR fortran/93834
>   * check.c (allocatable_check): A coindexed array element is not an
>   allocatable object.
>
> gcc/testsuite/ChangeLog:
>
>   PR fortran/93834
>   * gfortran.dg/coarray_allocated.f90: New test.
>
>


Simplify 'gcc/tree.c:walk_tree_1' handling of 'OMP_CLAUSE' (was: Fix PR 25886. Convert OMP_CLAUSE_* into sub-codes.)

2021-08-31 Thread Thomas Schwinge
Hi!

On 2006-01-25T12:41:14-0500, Diego Novillo  wrote:
> This patch replaces all the OMP_CLAUSE_* tree codes with a single
> OMP_CLAUSE tree with sub-codes.

So, originally all OMP clauses were represented by their own tree codes,
which all had to be enumerated/handled individually.  But, with all these
having been unified into 'OMP_CLAUSE'...

> --- tree.c(revision 110178)
> +++ tree.c(working copy)

..., and given this:

> +/* Number of operands for each OpenMP clause.  */
> +unsigned char omp_clause_num_ops[] =
> +{
> +  0, /* OMP_CLAUSE_ERROR  */
> +  1, /* OMP_CLAUSE_PRIVATE  */
> +  1, /* OMP_CLAUSE_SHARED  */
> +  1, /* OMP_CLAUSE_FIRSTPRIVATE  */
> +  1, /* OMP_CLAUSE_LASTPRIVATE  */
> +  4, /* OMP_CLAUSE_REDUCTION  */
> +  1, /* OMP_CLAUSE_COPYIN  */
> +  1, /* OMP_CLAUSE_COPYPRIVATE  */
> +  1, /* OMP_CLAUSE_IF  */
> +  1, /* OMP_CLAUSE_NUM_THREADS  */
> +  1, /* OMP_CLAUSE_SCHEDULE  */
> +  0, /* OMP_CLAUSE_NOWAIT  */
> +  0, /* OMP_CLAUSE_ORDERED  */
> +  0  /* OMP_CLAUSE_DEFAULT  */
> +};

..., we may simplify this:

> @@ -7303,30 +7433,38 @@ walk_tree (tree *tp, walk_tree_fn func,
>}
>break;
>
> -case OMP_CLAUSE_PRIVATE:
> -[...]
> -case OMP_CLAUSE_SCHEDULE:
> -  WALK_SUBTREE (TREE_OPERAND (*tp, 0));
> -  /* FALLTHRU */
> +case OMP_CLAUSE:
> +  switch (OMP_CLAUSE_CODE (*tp))
> + {
> + case OMP_CLAUSE_PRIVATE:
> +[...]
> + case OMP_CLAUSE_SCHEDULE:
> +   WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 0));
> +   /* FALLTHRU */
>
> -case OMP_CLAUSE_NOWAIT:
> -case OMP_CLAUSE_ORDERED:
> -case OMP_CLAUSE_DEFAULT:
> -  WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
> + case OMP_CLAUSE_NOWAIT:
> + case OMP_CLAUSE_ORDERED:
> + case OMP_CLAUSE_DEFAULT:
> +   WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
>
> -case OMP_CLAUSE_REDUCTION:
> -  {
> - int i;
> - for (i = 0; i < 4; i++)
> -   WALK_SUBTREE (TREE_OPERAND (*tp, i));
> - WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
> -  }
> + case OMP_CLAUSE_REDUCTION:
> +   {
> + int i;
> + for (i = 0; i < 4; i++)
> +   WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i));
> + WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
> +   }
> +
> + default:
> +   gcc_unreachable ();
> + }
> +  break;

... considerably?  OK to push to master branch the attached
"Simplify 'gcc/tree.c:walk_tree_1' handling of 'OMP_CLAUSE'"?

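Given 'omp_clause_num_ops', the simplified handling would amount to roughly
the following (the replacement hunk is truncated in the attached patch, so
this is a sketch rather than the exact committed code):

    case OMP_CLAUSE:
      {
        int len = omp_clause_num_ops[OMP_CLAUSE_CODE (*tp)];
        for (int i = 0; i < len; i++)
          WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i));
        WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
      }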

Grüße
 Thomas


From 4a22fd8b55cd1fe6fad1940127d09b30f47c90b2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 27 Aug 2021 07:49:55 +0200
Subject: [PATCH] Simplify 'gcc/tree.c:walk_tree_1' handling of 'OMP_CLAUSE'

No behavioral change, other than that for a few clauses, operands are now
walked in a different order, and 'OMP_CLAUSE_ERROR' now no longer runs into
'default: gcc_unreachable ();' here (but instead will at some later stage).

Follow-up for r110243 (commit aaf46ef9792bbc562175b606bd1c3f225ea56924)
"Fix PR 25886.  Convert OMP_CLAUSE_* into sub-codes".

	gcc/
	* tree.c (walk_tree_1) : Simplify.
---
 gcc/tree.c | 134 -
 1 file changed, 8 insertions(+), 126 deletions(-)

diff --git a/gcc/tree.c b/gcc/tree.c
index 4c7e03b0f25..99571f8f9b8 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -275,7 +275,7 @@ struct int_n_trees_t int_n_trees [NUM_INT_N_ENTS];
 
 bool tree_contains_struct[MAX_TREE_CODES][64];
 
-/* Number of operands for each OpenMP clause.  */
+/* Number of operands for each OMP clause.  */
 unsigned const char omp_clause_num_ops[] =
 {
   0, /* OMP_CLAUSE_ERROR  */
@@ -10289,7 +10289,7 @@ build_empty_stmt (location_t loc)
 }
 
 
-/* Build an OpenMP clause with code CODE.  LOC is the location of the
+/* Build an OMP clause with code CODE.  LOC is the location of the
clause.  */
 
 tree
@@ -11091,130 +11091,12 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
   break;
 
 case OMP_CLAUSE:
-  switch (OMP_CLAUSE_CODE (*tp))
-	{
-	case OMP_CLAUSE_GANG:
-	  WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
-	  /* FALLTHRU */
-
-	case OMP_CLAUSE_AFFINITY:
-	case OMP_CLAUSE_ASYNC:
-	case OMP_CLAUSE_WAIT:
-	case OMP_CLAUSE_WORKER:
-	case OMP_CLAUSE_VECTOR:
-	case OMP_CLAUSE_NUM_GANGS:
-	case OMP_CLAUSE_NUM_WORKERS:
-	case OMP_CLAUSE_VECTOR_LENGTH:
-	case OMP_CLAUSE_PRIVATE:
-	case OMP_CLAUSE_SHARED:
-	case OMP_CLAUSE_FIRSTPRIVATE:
-	case OMP_CLAUSE_COPYIN:
-	case OMP_CLAUSE_COPYPRIVATE:
-	case OMP_CLAUSE_FINAL:
-	case OMP_CLAUSE_IF:
-	case OMP_CLAUSE_NUM_THREADS:
-	case OMP_CLAUSE_SCHEDULE:
-	case OMP_CLAUSE_UNIFORM:
-	case OMP_CLAUSE_DEPEND:
-	case OMP_CLAUSE_NONTEMPORAL:
-	case OMP_CLAUSE_NUM_TEAMS:
-	case OMP_CLAUSE_THREAD_LIMIT:
-	case OMP_CLAUSE_DE

Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-08-31 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, August 31, 2021 5:07 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector constants
>> and operations
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Tuesday, August 31, 2021 4:14 PM
>> >> To: Tamar Christina 
>> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> >> ; Marcus Shawcroft
>> >> ; Kyrylo Tkachov
>> 
>> >> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector
>> >> constants and operations
>> >>
>> >> Tamar Christina  writes:
>> >> > @@ -13936,8 +13937,65 @@ cost_plus:
>> >> >  mode, MULT, 1, speed);
>> >> >return true;
>> >> >  }
>> >> > +   break;
>> >> > +case PARALLEL:
>> >> > +  /* Fall through */
>> >>
>> >> Which code paths lead to getting a PARALLEL here?
>> >
>> > Hi,
>> >
>> > Thanks for the review!
>> >
>> > I added it for completeness because CSE treats a parallel and
> > CONST_VECTOR as equivalent when each entry in the parallel defines
>> a constant.
>> 
>> Could you test whether it ever triggers in practice though?
>> The code would be much simpler without it.
>
> Will check 😊
>
>> 
>> >> > +case CONST_VECTOR:
>> >> > +   {
>> >> > + rtx gen_insn = aarch64_simd_make_constant (x, true);
>> >> > + /* Not a valid const vector.  */
>> >> > + if (!gen_insn)
>> >> > +   break;
>> >> >
>> >> > -  /* Fall through.  */
>> >> > + switch (GET_CODE (gen_insn))
>> >> > + {
>> >> > + case CONST_VECTOR:
>> >> > +   /* Load using MOVI/MVNI.  */
>> >> > +   if (aarch64_simd_valid_immediate (x, NULL))
>> >> > + *cost += extra_cost->vect.movi;
>> >> > +   else /* Load using constant pool.  */
>> >> > + *cost += extra_cost->ldst.load;
>> >> > +   break;
>> >> > + /* Load using a DUP.  */
>> >> > + case VEC_DUPLICATE:
>> >> > +   *cost += extra_cost->vect.dup;
>> >> > +   break;
>> >>
>> >> Does this trigger in practice?  The new check==true path (rightly)
>> >> stops the duplicated element from being forced into a register, but
>> >> then I would have
>> >> expected:
>> >>
>> >> rtx
>> >> gen_vec_duplicate (machine_mode mode, rtx x) {
>> >>   if (valid_for_const_vector_p (mode, x))
>> >> return gen_const_vec_duplicate (mode, x);
>> >>   return gen_rtx_VEC_DUPLICATE (mode, x); }
>> >>
>> >> to generate the original CONST_VECTOR again.
>> >
>> > Yes, but CSE is trying to see whether using a DUP is cheaper than another
>> instruction.
>> > Normal code won't hit this but CSE is just costing all the different
>> > ways one can semantically construct a vector, which RTL actually comes out
>> of it depends on how it's folded as you say.
>> 
>> But what I mean is, you call:
>> 
>>rtx gen_insn = aarch64_simd_make_constant (x, true);
>>/* Not a valid const vector.  */
>>if (!gen_insn)
>>  break;
>> 
>> where aarch64_simd_make_constant does:
>> 
>>   if (CONST_VECTOR_P (vals))
>> const_vec = vals;
>>   else if (GET_CODE (vals) == PARALLEL)
>> {
>>   /* A CONST_VECTOR must contain only CONST_INTs and
>>   CONST_DOUBLEs, but CONSTANT_P allows more (e.g. SYMBOL_REF).
>>   Only store valid constants in a CONST_VECTOR.  */
>>   int n_elts = XVECLEN (vals, 0);
>>   for (i = 0; i < n_elts; ++i)
>>  {
>>rtx x = XVECEXP (vals, 0, i);
>>if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
>>  n_const++;
>>  }
>>   if (n_const == n_elts)
>>  const_vec = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0));
>> }
>>   else
>> gcc_unreachable ();
>> 
>>   if (const_vec != NULL_RTX
>>   && aarch64_simd_valid_immediate (const_vec, NULL))
>> /* Load using MOVI/MVNI.  */
>> return const_vec;
>>   else if ((const_dup = aarch64_simd_dup_constant (vals, check)) !=
>> NULL_RTX)
>> /* Loaded using DUP.  */
>> return const_dup;
>> 
>> and aarch64_simd_dup_constant does:
>> 
>>   machine_mode mode = GET_MODE (vals);
>>   machine_mode inner_mode = GET_MODE_INNER (mode);
>>   rtx x;
>> 
>>   if (!const_vec_duplicate_p (vals, &x))
>> return NULL_RTX;
>> 
>>   /* We can load this constant by using DUP and a constant in a
>>  single ARM register.  This will be cheaper than a vector
>>  load.  */
>>   if (!check)
>> x = copy_to_mode_reg (inner_mode, x);
>>   return gen_vec_duplicate (mode, x);
>> 
>> For the “check” case, “x” will be a constant, and so gen_vec_duplicate will 
>> call
>> gen_const_vec_duplicate, which will return a CONST_VECTOR.
>> It didn't seem to be possible for gen_insn to be a VEC_DUPLICATE.
>>
>
> Yes, but CSE can ask the cost of a VEC_DUPLICATE directly on a register 

[PATCH] PR fortran/56985 - gcc/fortran/resolve.c:920: "'%s' in cannot appear in COMMON ..."

2021-08-31 Thread Harald Anlauf via Gcc-patches
I intend to commit the fix to the error message using the patch below
within the next 24h unless there are objections or better suggestions.

The unchanged part of the error message is already covered by
gcc/testsuite/gfortran.dg/unlimited_polymorphic_2.f03 and does
not need to be adapted.

Thanks,
Harald


Fortran - improve wording of error message

gcc/fortran/ChangeLog:

PR fortran/56985
* resolve.c (resolve_common_vars): Fix grammar and improve wording
of error message rejecting an unlimited polymorphic in COMMON.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index f641d0d4dae..8e5ed1c032c 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -979,7 +979,7 @@ resolve_common_vars (gfc_common_head *common_block, bool named_common)
 	}

   if (UNLIMITED_POLY (csym))
-	gfc_error_now ("%qs in cannot appear in COMMON at %L "
+	gfc_error_now ("%qs at %L cannot appear in COMMON "
 		   "[F2008:C5100]", csym->name, &csym->declared_at);

   if (csym->ts.type != BT_DERIVED)


Re: [PATCH 2/4] libffi: Sync with libffi 3.4.2

2021-08-31 Thread H.J. Lu via Gcc-patches
On Tue, Aug 31, 2021 at 9:32 AM Xi Ruoyao  wrote:
>
> Hi hj,
>
> libffi-3.4.2's new static trampoline feature is known to break some
> downstream packages with some specific use (or misuse?) of libffi,
> unexpected by the libffi developers.  For example
> https://gitlab.gnome.org/GNOME/gjs/-/issues/428.

It looks like a gjs bug.

> I've not used gccgo recently, so I don't know if it might break something
> here.  Just a reminder, if the regtest on x86 and arm (32-bit and 64-
> bit, they are the only platforms where libffi enables static trampoline)
> is OK there should be no problem.

There are no Go regressions on Linux/x86-64 with -m64 and -m32.

>
> On Tue, 2021-08-31 at 08:36 -0700, H.J. Lu via Gcc-patches wrote:
> > Merged commit: f9ea41683444ebe11cfa45b05223899764df28fb
> > ---
> >  libffi/.gitattributes | 4 +
> >  libffi/ChangeLog.libffi   |  7743 +-
> >  libffi/LICENSE| 2 +-
> >  libffi/LICENSE-BUILDTOOLS |   353 +
> >  libffi/MERGE  | 4 +
> >  libffi/Makefile.am|   249 +-
> >  libffi/Makefile.in|  1944 --
> >  libffi/README |   450 -
> >  libffi/README.md  |   495 +
> >  libffi/acinclude.m4   |38 +-
> >  libffi/aclocal.m4 |  1202 -
> >  libffi/configure  | 19411 
> >  libffi/configure.ac   |   199 +-
> >  libffi/configure.host |97 +-
> >  libffi/doc/Makefile.am| 3 +
> >  libffi/doc/libffi.texi|   382 +-
> >  libffi/doc/version.texi   | 8 +-
> >  libffi/fficonfig.h.in |   208 -
> >  libffi/generate-darwin-source-and-headers.py  |   143 +-
> >  libffi/include/Makefile.am| 8 +-
> >  libffi/include/Makefile.in|   565 -
> >  libffi/include/ffi.h.in   |   213 +-
> >  libffi/include/ffi_cfi.h  |21 +
> >  libffi/include/ffi_common.h   |50 +-
> >  libffi/include/tramp.h|45 +
> >  libffi/libffi.map.in  |24 +-
> >  libffi/libffi.pc.in   | 2 +-
> >  libffi/libffi.xcodeproj/project.pbxproj   |   530 +-
> >  libffi/libtool-version|25 +-
> >  libffi/man/Makefile.in|   515 -
> >  libffi/mdate-sh   |   205 -
> >  libffi/msvcc.sh   |   134 +-
> >  libffi/src/aarch64/ffi.c  |   536 +-
> >  libffi/src/aarch64/ffitarget.h|35 +-
> >  libffi/src/aarch64/internal.h |33 +
> >  libffi/src/aarch64/sysv.S |   189 +-
> >  libffi/src/aarch64/win64_armasm.S |   506 +
> >  libffi/src/alpha/ffi.c| 6 +-
> >  libffi/src/arc/ffi.c  | 6 +-
> >  libffi/src/arm/ffi.c  |   380 +-
> >  libffi/src/arm/ffitarget.h|24 +-
> >  libffi/src/arm/internal.h |10 +
> >  libffi/src/arm/sysv.S |   304 +-
> >  libffi/src/arm/sysv_msvc_arm32.S  |   311 +
> >  libffi/src/closures.c |   489 +-
> >  libffi/src/cris/ffi.c | 4 +-
> >  libffi/src/csky/ffi.c |   395 +
> >  libffi/src/csky/ffitarget.h   |63 +
> >  libffi/src/csky/sysv.S|   371 +
> >  libffi/src/dlmalloc.c | 7 +-
> >  libffi/src/frv/ffi.c  | 4 +-
> >  libffi/src/ia64/ffi.c |30 +-
> >  libffi/src/ia64/ffitarget.h   | 3 +-
> >  libffi/src/ia64/unix.S| 9 +-
> >  libffi/src/java_raw_api.c | 6 +-
> >  libffi/src/kvx/asm.h  | 5 +
> >  libffi/src/kvx/ffi.c  |   273 +
> >  libffi/src/kvx/ffitarget.h|75 +
> >  libffi/src/kvx/sysv.S |   127 +
> >  libffi/src/m32r/ffi.c | 2 +-
> >  libffi/src/m68k/ffi.c | 4 +-
> >  libffi/src/m68k/sysv.S|29 +-
> >  libffi/src/m88k/ffi.c | 8 +-
> >  libffi/src/metag/ffi.c|14 +-
> >  libffi/src/microblaze/ffi.c   |10 +-
> >  libffi/src/mips/ffi.c |   146 +-
> >  libffi/src/mips/ffitarget.h   |23 +-
> >  libffi/src/mips/n32.S |   15

[committed] libstdc++: Add valid range checks to std::span constructors [PR98421]

2021-08-31 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/98421
* include/std/span (span(Iter, size_type), span(Iter, Iter)):
Add valid range checks.
* testsuite/23_containers/span/cons_1_assert_neg.cc: New test.
* testsuite/23_containers/span/cons_2_assert_neg.cc: New test.

Tested x86_64-linux. Committed to trunk.

commit ef7becc9c8a48804d3fd9dac032f7b33e561a612
Author: Jonathan Wakely 
Date:   Tue Aug 31 17:34:51 2021

libstdc++: Add valid range checks to std::span constructors [PR98421]

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/98421
* include/std/span (span(Iter, size_type), span(Iter, Iter)):
Add valid range checks.
* testsuite/23_containers/span/cons_1_assert_neg.cc: New test.
* testsuite/23_containers/span/cons_2_assert_neg.cc: New test.

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index 21d8f6a43a6..be053e8ef38 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -160,6 +160,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  __glibcxx_assert(__count == _Extent);
}
+ __glibcxx_requires_valid_range(__first, __first + __count);
}
 
   template _End>
@@ -175,6 +176,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  __glibcxx_assert((__last - __first) == _Extent);
}
+ __glibcxx_requires_valid_range(__first, __last);
}
 
   template
diff --git a/libstdc++-v3/testsuite/23_containers/span/cons_1_assert_neg.cc 
b/libstdc++-v3/testsuite/23_containers/span/cons_1_assert_neg.cc
new file mode 100644
index 000..2f555125453
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/span/cons_1_assert_neg.cc
@@ -0,0 +1,14 @@
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { xfail *-*-* } }
+// { dg-require-effective-target c++2a }
+
+#undef _GLIBCXX_DEBUG
+#define _GLIBCXX_DEBUG
+#include 
+#include 
+
+int main()
+{
+  std::vector v(2);
+  std::span s(v.begin(), 3);
+}
diff --git a/libstdc++-v3/testsuite/23_containers/span/cons_2_assert_neg.cc 
b/libstdc++-v3/testsuite/23_containers/span/cons_2_assert_neg.cc
new file mode 100644
index 000..efef0e608ba
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/span/cons_2_assert_neg.cc
@@ -0,0 +1,14 @@
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { xfail *-*-* } }
+// { dg-require-effective-target c++2a }
+
+#undef _GLIBCXX_DEBUG
+#define _GLIBCXX_DEBUG
+#include 
+#include 
+
+int main()
+{
+  std::vector v(2), w(1);
+  std::span s(v.begin(), w.end());
+}


Re: [Patch 1/5] OpenACC tile clause support, OMP_CLAUSE_TILE adjustments

2021-08-31 Thread Thomas Schwinge
Hi!

Given this:

On 2016-11-10T18:44:52+0800, Chung-Lin Tang  wrote:
> --- tree.c(revision 241809)
> +++ tree.c(working copy)
> @@ -327,7 +327,7 @@ unsigned const char omp_clause_num_ops[] =

> -  1, /* OMP_CLAUSE_TILE  */
> +  3, /* OMP_CLAUSE_TILE  */

... for this:

> --- tree.h(revision 241809)
> +++ tree.h(working copy)

>  #define OMP_CLAUSE_TILE_LIST(NODE) \
>OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
> +#define OMP_CLAUSE_TILE_ITERVAR(NODE) \
> +  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 1)
> +#define OMP_CLAUSE_TILE_COUNT(NODE) \
> +  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 2)

..., we also need to "Fix 'OMP_CLAUSE_TILE' operands handling in
'gcc/tree.c:walk_tree_1'".  In
commit 92dc5d844a2088db79bc4521be3ecb4e2f28 pushed to master branch,
cherry-picked in commit e6880aa976f962ecf78d20b58f7815b585791647 into
releases/gcc-11 branch, in
commit 82631dd97a3762e59bf5b9623f3c8c999aba7d80 into releases/gcc-10
branch, in commit 1514a668b96a9b66539646ec3d2a6ef9c6f39fb2 into
releases/gcc-9 branch, see attached.


Grüße
 Thomas


From 92dc5d844a2088db79bc4521be3ecb4e2f28 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 27 Aug 2021 07:49:35 +0200
Subject: [PATCH] Fix 'OMP_CLAUSE_TILE' operands handling in
 'gcc/tree.c:walk_tree_1'

In r245300 (commit 02889d23ee3b02854dff203dd87b9a25e30b61b4)
"OpenACC tile clause support" that one had changed to three operands,
similar to 'OMP_CLAUSE_COLLAPSE'.

There is no (existing) test case where this seems to matter (likewise
for 'OMP_CLAUSE_COLLAPSE'), but it's good to be consistent.

	gcc/
	* tree.c (walk_tree_1) : Handle three operands.
---
 gcc/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree.c b/gcc/tree.c
index cba3bca41b3..4c7e03b0f25 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -11166,7 +11166,6 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE_BIND:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
-	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE__SIMT_:
 	case OMP_CLAUSE_IF_PRESENT:
 	case OMP_CLAUSE_FINALIZE:
@@ -11179,6 +11178,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	  WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
 
 	case OMP_CLAUSE_COLLAPSE:
+	case OMP_CLAUSE_TILE:
 	  {
 	int i;
 	for (i = 0; i < 3; i++)
-- 
2.30.2

From e6880aa976f962ecf78d20b58f7815b585791647 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 27 Aug 2021 07:49:35 +0200
Subject: [PATCH] Fix 'OMP_CLAUSE_TILE' operands handling in
 'gcc/tree.c:walk_tree_1'

In r245300 (commit 02889d23ee3b02854dff203dd87b9a25e30b61b4)
"OpenACC tile clause support" that one had changed to three operands,
similar to 'OMP_CLAUSE_COLLAPSE'.

There is no (existing) test case where this seems to matter (likewise
for 'OMP_CLAUSE_COLLAPSE'), but it's good to be consistent.

	gcc/
	* tree.c (walk_tree_1) : Handle three operands.

(cherry picked from commit 92dc5d844a2088db79bc4521be3ecb4e2f28)
---
 gcc/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree.c b/gcc/tree.c
index e0183b73e31..8bc81d66821 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -12299,7 +12299,6 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE_BIND:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
-	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE__SIMT_:
 	case OMP_CLAUSE_IF_PRESENT:
 	case OMP_CLAUSE_FINALIZE:
@@ -12311,6 +12310,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	  WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
 
 	case OMP_CLAUSE_COLLAPSE:
+	case OMP_CLAUSE_TILE:
 	  {
 	int i;
 	for (i = 0; i < 3; i++)
-- 
2.30.2

From 82631dd97a3762e59bf5b9623f3c8c999aba7d80 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 27 Aug 2021 07:49:35 +0200
Subject: [PATCH] Fix 'OMP_CLAUSE_TILE' operands handling in
 'gcc/tree.c:walk_tree_1'

In r245300 (commit 02889d23ee3b02854dff203dd87b9a25e30b61b4)
"OpenACC tile clause support" that one had changed to three operands,
similar to 'OMP_CLAUSE_COLLAPSE'.

There is no (existing) test case where this seems to matter (likewise
for 'OMP_CLAUSE_COLLAPSE'), but it's good to be consistent.

	gcc/
	* tree.c (walk_tree_1) : Handle three operands.

(cherry picked from commit 92dc5d844a2088db79bc4521be3ecb4e2f28)
---
 gcc/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree.c b/gcc/tree.c
index b43bc809823..d82c308f14c 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -12191,7 +12191,6 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE_BIND:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
-	case OMP_CLAUSE_TILE:
 	case OMP_CL

[committed] avoid valid Coverity warning for comparing array to zero

2021-08-31 Thread Martin Sebor via Gcc-patches

A typo in maybe_warn_alloc_args_overflow() has it compare the address
of an array for equality to null when it actually means to compare
the value of the array's element.  This is apparently caught by
Coverity (and Jason who pointed it out to me -- thanks again).
In r12-3268 I've pushed the change below to fix this.  I'm testing
an enhancement to -Waddress to let GCC detect it as well.

Martin

commit r12-3268-gb3aa3288a958a75744df256d70e7f8e90ccab724
Author: Martin Sebor 
Date:   Tue Aug 31 11:16:37 2021 -0600

Avoid valid Coverity warning for comparing array to zero.

* gimple-ssa-warn-access.cc (maybe_warn_alloc_args_overflow): Test
pointer element for equality to zero, not that of the containing
array.

Diff:
---
 gcc/gimple-ssa-warn-access.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 5df97a6473a..5a359587ed3 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -2433,7 +2433,7 @@ maybe_warn_alloc_args_overflow (gimple *stmt, 
const tree args[2],

}
 }

-  if (!argrange[0])
+  if (!argrange[0][0])
 return;

   /* For a two-argument alloc_size, validate the product of the two




[committed] libstdc++: Fix broken autoconf check for O_NONBLOCK

2021-08-31 Thread Jonathan Wakely via Gcc-patches

On 26/08/21 12:46 +0100, Jonathan Wakely wrote:

PR libstdc++/100285
* configure.ac: Check for O_NONBLOCK.


This check was broken, oops. Fixed like so.

Tested x86_64-linux. Committed to trunk.


commit 1cacdef0d1a3f587691735d1822d584b68eba593
Author: Jonathan Wakely 
Date:   Tue Aug 31 17:08:00 2021

libstdc++: Fix broken autoconf check for O_NONBLOCK

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* configure.ac: Fix checks for F_GETFL, F_SETFL and O_NONBLOCK.
* configure: Regenerate.

diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index d29efa6cb5f..2d68b3672b9 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -481,10 +481,10 @@ GLIBCXX_CHECK_FILESYSTEM_DEPS
 
 # For Networking TS.
 AC_CHECK_HEADERS([fcntl.h sys/ioctl.h sys/socket.h sys/uio.h poll.h netdb.h arpa/inet.h netinet/in.h netinet/tcp.h])
-AC_CHECK_DECL(F_GETFL,[],[],[fcntl.h])
-AC_CHECK_DECL(F_SETFL,[],[],[fcntl.h])
-if [ "$ac_cv_have_decl_F_GETFL$ac_cv_have_decl_F_SETFL" = 11 ]; then
-  AC_CHECK_DECL(O_NONBLOCK,[],[],[fcntl.h])
+AC_CHECK_DECL(F_GETFL,,,[#include ])
+AC_CHECK_DECL(F_SETFL,,,[#include ])
+if test "$ac_cv_have_decl_F_GETFL$ac_cv_have_decl_F_SETFL" = yesyes ; then
+  AC_CHECK_DECL(O_NONBLOCK,,,[#include ])
 fi
 
 # For Transactional Memory TS


[committed] libstdc++: Remove redundant noexcept-specifier on definitions

2021-08-31 Thread Jonathan Wakely via Gcc-patches
These destructors are noexcept anyway. I removed the redundant noexcept
from the error_category destructor's declaration in r0-123475, but
didn't remove it from the defaulted definition in system_error.cc. That
causes warnings if the library is built with Clang.

This removes the redundant noexcept from ~error_category and
~system_error and adds tests to ensure they really are noexcept.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* src/c++11/system_error.cc (error_category::~error_category()):
Remove noexcept-specifier.
(system_error::~system_error()): Likewise.
* testsuite/19_diagnostics/error_category/noexcept.cc: New test.
* testsuite/19_diagnostics/system_error/noexcept.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit f63e86f797d82772c62a7475dbc6e881727b666f
Author: Jonathan Wakely 
Date:   Tue Aug 31 16:30:01 2021

libstdc++: Remove redundant noexcept-specifier on definitions

These destructors are noexcept anyway. I removed the redundant noexcept
from the error_category destructor's declaration in r0-123475, but
didn't remove it from the defaulted definition in system_error.cc. That
causes warnings if the library is built with Clang.

This removes the redundant noexcept from ~error_category and
~system_error and adds tests to ensure they really are noexcept.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* src/c++11/system_error.cc (error_category::~error_category()):
Remove noexcept-specifier.
(system_error::~system_error()): Likewise.
* testsuite/19_diagnostics/error_category/noexcept.cc: New test.
* testsuite/19_diagnostics/system_error/noexcept.cc: New test.

diff --git a/libstdc++-v3/src/c++11/system_error.cc 
b/libstdc++-v3/src/c++11/system_error.cc
index 23fb6182825..f12290adaee 100644
--- a/libstdc++-v3/src/c++11/system_error.cc
+++ b/libstdc++-v3/src/c++11/system_error.cc
@@ -338,7 +338,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_THROW_OR_ABORT(system_error(error_code(__i, generic_category(;
   }
 
-  error_category::~error_category() noexcept = default;
+  error_category::~error_category() = default;
 
   const error_category&
   _V2::system_category() noexcept { return system_category_instance; }
@@ -346,7 +346,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const error_category&
   _V2::generic_category() noexcept { return generic_category_instance; }
 
-  system_error::~system_error() noexcept = default;
+  system_error::~system_error() = default;
 
   error_condition
   error_category::default_error_condition(int __i) const noexcept
diff --git a/libstdc++-v3/testsuite/19_diagnostics/error_category/noexcept.cc 
b/libstdc++-v3/testsuite/19_diagnostics/error_category/noexcept.cc
new file mode 100644
index 000..210344c7607
--- /dev/null
+++ b/libstdc++-v3/testsuite/19_diagnostics/error_category/noexcept.cc
@@ -0,0 +1,13 @@
+// { dg-do compile { target c++11 } }
+#include 
+
+extern const std::error_category& cat;
+
+static_assert(std::is_nothrow_destructible::value, "");
+static_assert(noexcept(cat.name()), "");
+static_assert(noexcept(cat.default_error_condition(1)), "");
+static_assert(noexcept(cat.equivalent(1, {})), "");
+static_assert(noexcept(cat.equivalent({}, 1)), "");
+static_assert(noexcept(cat == cat), "");
+static_assert(noexcept(cat != cat), "");
+static_assert(noexcept(cat < cat), "");
diff --git a/libstdc++-v3/testsuite/19_diagnostics/system_error/noexcept.cc 
b/libstdc++-v3/testsuite/19_diagnostics/system_error/noexcept.cc
new file mode 100644
index 000..853b7f922b6
--- /dev/null
+++ b/libstdc++-v3/testsuite/19_diagnostics/system_error/noexcept.cc
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++11 } }
+#include 
+
+static_assert(std::is_nothrow_destructible::value, "");
+static_assert(noexcept(std::declval().code()), "");
+static_assert(noexcept(std::declval().what()), "");


[committed] libstdc++: Add missing return for atomic timed wait [PR102074]

2021-08-31 Thread Jonathan Wakely via Gcc-patches
This adds a missing return statement to the non-futex wait-until
operation.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102074
* include/bits/atomic_timed_wait.h (__timed_waiter_pool)
[!_GLIBCXX_HAVE_PLATFORM_TIMED_WAIT]: Add missing return.

Tested powerpc64le-linux and power-aix and sparc-solaris.

Committed to trunk.

We think there's another issue with return values for the futex case,
which we'll fix separately.

commit 763eb1f19239ebb19c0f87590a4f02300c02c52b
Author: Jonathan Wakely 
Date:   Tue Aug 31 16:50:17 2021

libstdc++: Add missing return for atomic timed wait [PR102074]

This adds a missing return statement to the non-futex wait-until
operation.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102074
* include/bits/atomic_timed_wait.h (__timed_waiter_pool)
[!_GLIBCXX_HAVE_PLATFORM_TIMED_WAIT]: Add missing return.

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h 
b/libstdc++-v3/include/bits/atomic_timed_wait.h
index ec7ff51cdbc..3db08f82707 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -213,6 +213,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  lock_guard __l(_M_mtx);
  return __cond_wait_until(_M_cv, _M_mtx, __atime);
}
+ else
+   return true;
 #endif // _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
}
 };


RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-08-31 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, August 31, 2021 5:07 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector constants
> and operations
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Tuesday, August 31, 2021 4:14 PM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> >> ; Marcus Shawcroft
> >> ; Kyrylo Tkachov
> 
> >> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector
> >> constants and operations
> >>
> >> Tamar Christina  writes:
> >> > @@ -13936,8 +13937,65 @@ cost_plus:
> >> >   mode, MULT, 1, speed);
> >> >return true;
> >> >  }
> >> > +break;
> >> > +case PARALLEL:
> >> > +  /* Fall through */
> >>
> >> Which code paths lead to getting a PARALLEL here?
> >
> > Hi,
> >
> > Thanks for the review!
> >
> > I added it for completeness because CSE treats a parallel and
> > CONST_VECTOR as equivalent when each entry in the parallel defines
> a constant.
> 
> Could you test whether it ever triggers in practice though?
> The code would be much simpler without it.

Will check 😊

> 
> >> > +case CONST_VECTOR:
> >> > +{
> >> > +  rtx gen_insn = aarch64_simd_make_constant (x, true);
> >> > +  /* Not a valid const vector.  */
> >> > +  if (!gen_insn)
> >> > +break;
> >> >
> >> > -  /* Fall through.  */
> >> > +  switch (GET_CODE (gen_insn))
> >> > +  {
> >> > +  case CONST_VECTOR:
> >> > +/* Load using MOVI/MVNI.  */
> >> > +if (aarch64_simd_valid_immediate (x, NULL))
> >> > +  *cost += extra_cost->vect.movi;
> >> > +else /* Load using constant pool.  */
> >> > +  *cost += extra_cost->ldst.load;
> >> > +break;
> >> > +  /* Load using a DUP.  */
> >> > +  case VEC_DUPLICATE:
> >> > +*cost += extra_cost->vect.dup;
> >> > +break;
> >>
> >> Does this trigger in practice?  The new check==true path (rightly)
> >> stops the duplicated element from being forced into a register, but
> >> then I would have
> >> expected:
> >>
> >> rtx
> >> gen_vec_duplicate (machine_mode mode, rtx x) {
> >>   if (valid_for_const_vector_p (mode, x))
> >> return gen_const_vec_duplicate (mode, x);
> >>   return gen_rtx_VEC_DUPLICATE (mode, x); }
> >>
> >> to generate the original CONST_VECTOR again.
> >
> > Yes, but CSE is trying to see whether using a DUP is cheaper than another
> instruction.
> > Normal code won't hit this but CSE is just costing all the different
> > ways one can semantically construct a vector, which RTL actually comes out
> of it depends on how it's folded as you say.
> 
> But what I mean is, you call:
> 
> rtx gen_insn = aarch64_simd_make_constant (x, true);
> /* Not a valid const vector.  */
> if (!gen_insn)
>   break;
> 
> where aarch64_simd_make_constant does:
> 
>   if (CONST_VECTOR_P (vals))
> const_vec = vals;
>   else if (GET_CODE (vals) == PARALLEL)
> {
>   /* A CONST_VECTOR must contain only CONST_INTs and
>CONST_DOUBLEs, but CONSTANT_P allows more (e.g. SYMBOL_REF).
>Only store valid constants in a CONST_VECTOR.  */
>   int n_elts = XVECLEN (vals, 0);
>   for (i = 0; i < n_elts; ++i)
>   {
> rtx x = XVECEXP (vals, 0, i);
> if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
>   n_const++;
>   }
>   if (n_const == n_elts)
>   const_vec = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0));
> }
>   else
> gcc_unreachable ();
> 
>   if (const_vec != NULL_RTX
>   && aarch64_simd_valid_immediate (const_vec, NULL))
> /* Load using MOVI/MVNI.  */
> return const_vec;
>   else if ((const_dup = aarch64_simd_dup_constant (vals, check)) !=
> NULL_RTX)
> /* Loaded using DUP.  */
> return const_dup;
> 
> and aarch64_simd_dup_constant does:
> 
>   machine_mode mode = GET_MODE (vals);
>   machine_mode inner_mode = GET_MODE_INNER (mode);
>   rtx x;
> 
>   if (!const_vec_duplicate_p (vals, &x))
> return NULL_RTX;
> 
>   /* We can load this constant by using DUP and a constant in a
>  single ARM register.  This will be cheaper than a vector
>  load.  */
>   if (!check)
> x = copy_to_mode_reg (inner_mode, x);
>   return gen_vec_duplicate (mode, x);
> 
> For the “check” case, “x” will be a constant, and so gen_vec_duplicate will 
> call
> gen_const_vec_duplicate, which will return a CONST_VECTOR.
> It didn't seem to be possible for gen_insn to be a VEC_DUPLICATE.
>

Yes, but CSE can ask the cost of a VEC_DUPLICATE directly on a register without 
going through gen_const_vec_duplicate
which is intended as the gen_ functions can have side effects (e.g. creating 
new pseudos etc)

[committed] libstdc++: Improve error handling in Net TS name resolution

2021-08-31 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/experimental/internet (__make_resolver_error_code):
Handle EAI_SYSTEM errors.
(basic_resolver_results): Use __make_resolver_error_code. Use
Glibc NI_MAXHOST and NI_MAXSERV values for buffer sizes.

Tested powerpc64le-linux. Committed to trunk.

commit feec7ef6672bf28d5c79950a21d435533a10710d
Author: Jonathan Wakely 
Date:   Tue Aug 31 13:09:26 2021

libstdc++: Improve error handling in Net TS name resolution

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/experimental/internet (__make_resolver_error_code):
Handle EAI_SYSTEM errors.
(basic_resolver_results): Use __make_resolver_error_code. Use
Glibc NI_MAXHOST and NI_MAXSERV values for buffer sizes.

diff --git a/libstdc++-v3/include/experimental/internet 
b/libstdc++-v3/include/experimental/internet
index 6ce070ae775..65c97de07d9 100644
--- a/libstdc++-v3/include/experimental/internet
+++ b/libstdc++-v3/include/experimental/internet
@@ -89,6 +89,12 @@ namespace ip
 host_not_found = EAI_NONAME,
 host_not_found_try_again = EAI_AGAIN,
 service_not_found = EAI_SERVICE
+// N.B. POSIX defines additional errors that have no enumerator here:
+// EAI_BADFLAGS, EAI_FAIL, EAI_FAMILY, EAI_MEMORY, EAI_SOCKTYPE, EAI_SYSTEM
+// Some C libraries define additional errors:
+// EAI_BADHINTS, EAI_OVERFLOW, EAI_PROTOCOL
+// Some C libraries define additional (obsolete?) errors:
+// EAI_ADDRFAMILY, EAI_NODATA
 #endif
   };
 
@@ -117,6 +123,19 @@ namespace ip
   inline error_condition make_error_condition(resolver_errc __e) noexcept
   { return error_condition(static_cast(__e), resolver_category()); }
 
+  /// @cond undocumented
+  inline error_code
+  __make_resolver_error_code(int __ai_err,
+[[__maybe_unused__]] int __sys_err) noexcept
+  {
+#ifdef EAI_SYSTEM
+if (__builtin_expect(__ai_err == EAI_SYSTEM, 0))
+  return error_code(__sys_err, std::generic_category());
+#endif
+return error_code(__ai_err, resolver_category());
+  }
+  /// @endcond
+
   /// @}
 
   using port_type = uint_least16_t;///< Type used for port numbers.
@@ -2011,7 +2030,7 @@ namespace ip
 
   if (int __err = ::getaddrinfo(__h, __s, &__hints, &__sai._M_p))
{
- __ec.assign(__err, resolver_category());
+ __ec = ip::__make_resolver_error_code(__err, errno);
  return;
}
   __ec.clear();
@@ -2040,8 +2059,8 @@ namespace ip
 basic_resolver_results(const endpoint_type& __ep, error_code& __ec)
 {
 #ifdef _GLIBCXX_HAVE_NETDB_H
-  char __host_name[256];
-  char __service_name[128];
+  char __host_name[1025];  // glibc NI_MAXHOST
+  char __service_name[32];  // glibc NI_MAXSERV
   int __flags = 0;
   if (__ep.protocol().type() == SOCK_DGRAM)
__flags |= NI_DGRAM;
@@ -2059,7 +2078,7 @@ namespace ip
__flags);
}
   if (__err)
-   __ec.assign(__err, resolver_category());
+   __ec = ip::__make_resolver_error_code(__err, errno);
   else
{
  __ec.clear();


[committed] libstdc++: Fix ip::tcp::resolver test failure on Solaris

2021-08-31 Thread Jonathan Wakely via Gcc-patches
Solaris 11 does not have "http" in /etc/services, which causes this test
to fail. Try some other services until we find one that works.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/experimental/net/internet/resolver/ops/lookup.cc:
Try other service if "http" fails.

Tested powerpc64le-linux and sparc-solaris2.11. Committed to trunk.

commit 48b20d46f9597a4b1e19e0e2d4a0c68d056d7662
Author: Jonathan Wakely 
Date:   Tue Aug 31 13:08:23 2021

libstdc++: Fix ip::tcp::resolver test failure on Solaris

Solaris 11 does not have "http" in /etc/services, which causes this test
to fail. Try some other services until we find one that works.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/experimental/net/internet/resolver/ops/lookup.cc:
Try other service if "http" fails.

diff --git 
a/libstdc++-v3/testsuite/experimental/net/internet/resolver/ops/lookup.cc 
b/libstdc++-v3/testsuite/experimental/net/internet/resolver/ops/lookup.cc
index ca8f0899ccd..69be194fa29 100644
--- a/libstdc++-v3/testsuite/experimental/net/internet/resolver/ops/lookup.cc
+++ b/libstdc++-v3/testsuite/experimental/net/internet/resolver/ops/lookup.cc
@@ -30,13 +30,27 @@ test01()
   std::error_code ec;
   io_context ctx;
   ip::tcp::resolver resolv(ctx);
-  auto addrs = resolv.resolve("localhost", "http", ec);
+  auto hostname = "localhost", service = "http";
+  auto addrs = resolv.resolve(hostname, service, ec);
+  if (ec == ip::resolver_errc::service_not_found)
+  {
+// Solaris doesn't have http in /etc/services, try some others.
+for (auto serv : {"ftp", "telnet", "smtp"})
+{
+  addrs = resolv.resolve(hostname, serv, ec);
+  if (!ec)
+  {
+   service = serv;
+   break;
+  }
+}
+  }
   VERIFY( !ec );
   VERIFY( addrs.size() > 0 );
   VERIFY( addrs.begin() != addrs.end() );
   VERIFY( ! addrs.empty() );
 
-  auto addrs2 = resolv.resolve("localhost", "http");
+  auto addrs2 = resolv.resolve(hostname, service);
   VERIFY( addrs == addrs2 );
 }
 
@@ -68,7 +82,7 @@ test02()
 #if __cpp_exceptions
   bool caught = false;
   try {
-resolv.resolve("localhost", "http", flags);
+resolv.resolve("localhost", "42", flags);
   } catch (const std::system_error& e) {
 caught = true;
 VERIFY( e.code() == ec );


[Committed] Fix subreg_promoted_mode breakage on various platforms

2021-08-31 Thread Roger Sayle
My apologies for the inconvenience.  My recent patch to preserve
SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))), and other
places in the middle-end, has broken the build on several targets.

The change to convert_modes inadvertently used the same
subreg_promoted_mode idiom for retrieving the mode of a SUBREG_REG
as the existing code just a few lines earlier.  Alas in the meantime,
the original SUBREG gets replaced by one without SUBREG_PROMOTED_VAR_P,
the whole raison-d'etre for my patch, and I'd not realized/noticed
that subreg_promoted_mode asserts for this.  Alas neither the bootstrap
and regression test on x86_64-pc-linux-gnu nor my testing on nvptx-none
must have hit this particular case.  The logic of this transformation
is sound, it's the implementation that's bitten me.
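
The is_a <scalar_int_mode> idiom mentioned in the ChangeLog below looks
roughly like this (a sketch of the idiom, not the exact committed hunk):

  scalar_int_mode inner_mode;
  if (is_a <scalar_int_mode> (GET_MODE (SUBREG_REG (x)), &inner_mode))
    {
      /* inner_mode is known to be a scalar integer mode, obtained without
         calling subreg_promoted_mode, which asserts SUBREG_PROMOTED_VAR_P.  */
    }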

This patch has been committed, after another "make bootstrap" on
x86_64-pc-linux-gnu (just in case), and confirmation/pre-approval
from Jeff Law that this indeed fixes the build failures seen on
several platforms.

My humble apologies again.

2021-08-31  Roger Sayle  

gcc/ChangeLog
* expr.c (convert_modes): Don't use subreg_promoted_mode on a
SUBREG if it can't be guaranteed to have SUBREG_PROMOTED_VAR_P set.
Instead use the standard (safer) is_a  idiom.

Roger
--




Re: [PATCH 2/4] libffi: Sync with libffi 3.4.2

2021-08-31 Thread Xi Ruoyao via Gcc-patches
Hi hj,

libffi-3.4.2's new static trampoline feature is known to break some
downstream packages with some specific use (or misuse?) of libffi,
unexpected by the libffi developers.  For example
https://gitlab.gnome.org/GNOME/gjs/-/issues/428.

I've not used gccgo recently, so I don't know if it might break something
here.  Just a reminder, if the regtest on x86 and arm (32-bit and 64-
bit, they are the only platforms where libffi enables static trampoline)
is OK there should be no problem.

On Tue, 2021-08-31 at 08:36 -0700, H.J. Lu via Gcc-patches wrote:
> Merged commit: f9ea41683444ebe11cfa45b05223899764df28fb
> ---
>  libffi/.gitattributes | 4 +
>  libffi/ChangeLog.libffi   |  7743 +-
>  libffi/LICENSE    | 2 +-
>  libffi/LICENSE-BUILDTOOLS |   353 +
>  libffi/MERGE  | 4 +
>  libffi/Makefile.am    |   249 +-
>  libffi/Makefile.in    |  1944 --
>  libffi/README |   450 -
>  libffi/README.md  |   495 +
>  libffi/acinclude.m4   |    38 +-
>  libffi/aclocal.m4 |  1202 -
>  libffi/configure  | 19411 
>  libffi/configure.ac   |   199 +-
>  libffi/configure.host |    97 +-
>  libffi/doc/Makefile.am    | 3 +
>  libffi/doc/libffi.texi    |   382 +-
>  libffi/doc/version.texi   | 8 +-
>  libffi/fficonfig.h.in |   208 -
>  libffi/generate-darwin-source-and-headers.py  |   143 +-
>  libffi/include/Makefile.am    | 8 +-
>  libffi/include/Makefile.in    |   565 -
>  libffi/include/ffi.h.in   |   213 +-
>  libffi/include/ffi_cfi.h  |    21 +
>  libffi/include/ffi_common.h   |    50 +-
>  libffi/include/tramp.h    |    45 +
>  libffi/libffi.map.in  |    24 +-
>  libffi/libffi.pc.in   | 2 +-
>  libffi/libffi.xcodeproj/project.pbxproj   |   530 +-
>  libffi/libtool-version    |    25 +-
>  libffi/man/Makefile.in    |   515 -
>  libffi/mdate-sh   |   205 -
>  libffi/msvcc.sh   |   134 +-
>  libffi/src/aarch64/ffi.c  |   536 +-
>  libffi/src/aarch64/ffitarget.h    |    35 +-
>  libffi/src/aarch64/internal.h |    33 +
>  libffi/src/aarch64/sysv.S |   189 +-
>  libffi/src/aarch64/win64_armasm.S |   506 +
>  libffi/src/alpha/ffi.c    | 6 +-
>  libffi/src/arc/ffi.c  | 6 +-
>  libffi/src/arm/ffi.c  |   380 +-
>  libffi/src/arm/ffitarget.h    |    24 +-
>  libffi/src/arm/internal.h |    10 +
>  libffi/src/arm/sysv.S |   304 +-
>  libffi/src/arm/sysv_msvc_arm32.S  |   311 +
>  libffi/src/closures.c |   489 +-
>  libffi/src/cris/ffi.c | 4 +-
>  libffi/src/csky/ffi.c |   395 +
>  libffi/src/csky/ffitarget.h   |    63 +
>  libffi/src/csky/sysv.S    |   371 +
>  libffi/src/dlmalloc.c | 7 +-
>  libffi/src/frv/ffi.c  | 4 +-
>  libffi/src/ia64/ffi.c |    30 +-
>  libffi/src/ia64/ffitarget.h   | 3 +-
>  libffi/src/ia64/unix.S    | 9 +-
>  libffi/src/java_raw_api.c | 6 +-
>  libffi/src/kvx/asm.h  | 5 +
>  libffi/src/kvx/ffi.c  |   273 +
>  libffi/src/kvx/ffitarget.h    |    75 +
>  libffi/src/kvx/sysv.S |   127 +
>  libffi/src/m32r/ffi.c | 2 +-
>  libffi/src/m68k/ffi.c | 4 +-
>  libffi/src/m68k/sysv.S    |    29 +-
>  libffi/src/m88k/ffi.c | 8 +-
>  libffi/src/metag/ffi.c    |    14 +-
>  libffi/src/microblaze/ffi.c   |    10 +-
>  libffi/src/mips/ffi.c |   146 +-
>  libffi/src/mips/ffitarget.h   |    23 +-
>  libffi/src/mips/n32.S |   151 +-
>  libffi/src/mips/o32.S |   177 +-
>  libffi/src/moxie/eabi.S   | 2 +-
>  libffi/src/moxie/ffi.c    |    27 +-
>  libffi/src/nios2/ffi.c    | 4 +-
>  libffi/src/pa/ffi.c   |   216 +-
>

[pushed] c++: Improve error recovery with constexpr [PR92193]

2021-08-31 Thread Jason Merrill via Gcc-patches
The compiler tries to limit error cascades in limit_bad_template_recursion
by avoiding triggering a new instantiation from one that has caused errors.
We were exempting constexpr functions from this because they can be needed
for constant evaluation, but as more and more functions get marked
constexpr, this becomes an over-broad category.  So as suggested on IRC,
this patch only exempts functions that are needed for mandatory constant
evaluation.

As noted in the comment, this flag doesn't particularly need to use a bit in
the FUNCTION_DECL, but there were still some free.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/92193

gcc/cp/ChangeLog:

* cp-tree.h (FNDECL_MANIFESTLY_CONST_EVALUATED): New.
* constexpr.c (cxx_eval_call_expression): Set it.
* pt.c (neglectable_inst_p): Check it.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/static_assert4.C: New test.
---
 gcc/cp/cp-tree.h  |  8 +
 gcc/cp/constexpr.c|  2 ++
 gcc/cp/pt.c   |  3 +-
 .../g++.dg/diagnostic/static_assert4.C| 30 +++
 4 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/static_assert4.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ce7ca53a113..f0a7bd24df7 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -500,6 +500,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   FUNCTION_REF_QUALIFIED (in FUNCTION_TYPE, METHOD_TYPE)
   OVL_LOOKUP_P (in OVERLOAD)
   LOOKUP_FOUND_P (in RECORD_TYPE, UNION_TYPE, ENUMERAL_TYPE, 
NAMESPACE_DECL)
+  FNDECL_MANIFESTLY_CONST_EVALUATED (in FUNCTION_DECL)
5: IDENTIFIER_VIRTUAL_P (in IDENTIFIER_NODE)
   FUNCTION_RVALUE_QUALIFIED (in FUNCTION_TYPE, METHOD_TYPE)
   CALL_EXPR_REVERSE_ARGS (in CALL_EXPR, AGGR_INIT_EXPR)
@@ -4213,6 +4214,13 @@ more_aggr_init_expr_args_p (const 
aggr_init_expr_arg_iterator *iter)
 #define FNDECL_USED_AUTO(NODE) \
   TREE_LANG_FLAG_2 (FUNCTION_DECL_CHECK (NODE))
 
+/* True if NODE is needed for a manifestly constant-evaluated expression.
+   This doesn't especially need to be a flag, since currently it's only
+   used for error recovery; if we run out of function flags it could move
+   to an attribute.  */
+#define FNDECL_MANIFESTLY_CONST_EVALUATED(NODE) \
+  TREE_LANG_FLAG_4 (FUNCTION_DECL_CHECK (NODE))
+
 /* True for artificial decls added for OpenMP privatized non-static
data members.  */
 #define DECL_OMP_PRIVATIZED_MEMBER(NODE) \
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index e78fdf021b2..8be88dcfc24 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2572,6 +2572,8 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
   location_t save_loc = input_location;
   input_location = loc;
   ++function_depth;
+  if (ctx->manifestly_const_eval)
+   FNDECL_MANIFESTLY_CONST_EVALUATED (fun) = true;
   instantiate_decl (fun, /*defer_ok*/false, /*expl_inst*/false);
   --function_depth;
   input_location = save_loc;
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fcf3ac31b25..72b22d8c487 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10873,7 +10873,8 @@ neglectable_inst_p (tree d)
 {
   return (d && DECL_P (d)
  && !undeduced_auto_decl (d)
- && !(TREE_CODE (d) == FUNCTION_DECL ? DECL_DECLARED_CONSTEXPR_P (d)
+ && !(TREE_CODE (d) == FUNCTION_DECL
+  ? FNDECL_MANIFESTLY_CONST_EVALUATED (d)
   : decl_maybe_constant_var_p (d)));
 }
 
diff --git a/gcc/testsuite/g++.dg/diagnostic/static_assert4.C 
b/gcc/testsuite/g++.dg/diagnostic/static_assert4.C
new file mode 100644
index 000..c539016e526
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/static_assert4.C
@@ -0,0 +1,30 @@
+// PR c++/92193
+// { dg-do compile { target c++11 } }
+
+template
+  struct has_foo
+  { static constexpr bool value = false; };
+
+template
+#ifndef NO_CONSTEXPR
+  constexpr
+#endif
+  bool
+  foo(T t) noexcept(noexcept(t.foo()))
+  { return t.foo(); }
+
+template
+  void
+  maybe_foo(T t)
+  {
+static_assert( has_foo::value, "has foo" ); // { dg-error "has foo" }
+foo(t);
+  }
+
+struct X { };
+
+int main()
+{
+  X x;
+  maybe_foo(x);
+}

base-commit: cad36f38576a6a781e3c62ab061c68f5b8dab13a
-- 
2.27.0



Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-08-31 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, August 31, 2021 4:14 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector constants
>> and operations
>> 
>> Tamar Christina  writes:
>> > @@ -13936,8 +13937,65 @@ cost_plus:
>> > mode, MULT, 1, speed);
>> >return true;
>> >  }
>> > +  break;
>> > +case PARALLEL:
>> > +  /* Fall through */
>> 
>> Which code paths lead to getting a PARALLEL here?
>
> Hi,
>
> Thanks for the review!
>
> I added it for completeness because CSE treats a parallel and CONST_VECTOR as
> equivalent when each entry in the parallel defines a constant.

Could you test whether it ever triggers in practice though?
The code would be much simpler without it.

>> > +case CONST_VECTOR:
>> > +  {
>> > +rtx gen_insn = aarch64_simd_make_constant (x, true);
>> > +/* Not a valid const vector.  */
>> > +if (!gen_insn)
>> > +  break;
>> >
>> > -  /* Fall through.  */
>> > +switch (GET_CODE (gen_insn))
>> > +{
>> > +case CONST_VECTOR:
>> > +  /* Load using MOVI/MVNI.  */
>> > +  if (aarch64_simd_valid_immediate (x, NULL))
>> > +*cost += extra_cost->vect.movi;
>> > +  else /* Load using constant pool.  */
>> > +*cost += extra_cost->ldst.load;
>> > +  break;
>> > +/* Load using a DUP.  */
>> > +case VEC_DUPLICATE:
>> > +  *cost += extra_cost->vect.dup;
>> > +  break;
>> 
>> Does this trigger in practice?  The new check==true path (rightly) stops the
>> duplicated element from being forced into a register, but then I would have
>> expected:
>> 
>> rtx
>> gen_vec_duplicate (machine_mode mode, rtx x) {
>>   if (valid_for_const_vector_p (mode, x))
>> return gen_const_vec_duplicate (mode, x);
>>   return gen_rtx_VEC_DUPLICATE (mode, x); }
>> 
>> to generate the original CONST_VECTOR again.
>
> Yes, but CSE is trying to see whether using a DUP is cheaper than another 
> instruction.
> Normal code won't hit this but CSE is just costing all the different ways one 
> can semantically
> construct a vector, which RTL actually comes out of it depends on how it's 
> folded as you say.

But what I mean is, you call:

  rtx gen_insn = aarch64_simd_make_constant (x, true);
  /* Not a valid const vector.  */
  if (!gen_insn)
break;

where aarch64_simd_make_constant does:

  if (CONST_VECTOR_P (vals))
const_vec = vals;
  else if (GET_CODE (vals) == PARALLEL)
{
  /* A CONST_VECTOR must contain only CONST_INTs and
 CONST_DOUBLEs, but CONSTANT_P allows more (e.g. SYMBOL_REF).
 Only store valid constants in a CONST_VECTOR.  */
  int n_elts = XVECLEN (vals, 0);
  for (i = 0; i < n_elts; ++i)
{
  rtx x = XVECEXP (vals, 0, i);
  if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
n_const++;
}
  if (n_const == n_elts)
const_vec = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0));
}
  else
gcc_unreachable ();

  if (const_vec != NULL_RTX
  && aarch64_simd_valid_immediate (const_vec, NULL))
/* Load using MOVI/MVNI.  */
return const_vec;
  else if ((const_dup = aarch64_simd_dup_constant (vals, check)) != NULL_RTX)
/* Loaded using DUP.  */
return const_dup;

and aarch64_simd_dup_constant does:

  machine_mode mode = GET_MODE (vals);
  machine_mode inner_mode = GET_MODE_INNER (mode);
  rtx x;

  if (!const_vec_duplicate_p (vals, &x))
return NULL_RTX;

  /* We can load this constant by using DUP and a constant in a
 single ARM register.  This will be cheaper than a vector
 load.  */
  if (!check)
x = copy_to_mode_reg (inner_mode, x);
  return gen_vec_duplicate (mode, x);

For the “check” case, “x” will be a constant, and so gen_vec_duplicate
will call gen_const_vec_duplicate, which will return a CONST_VECTOR.
It didn't seem to be possible for gen_insn to be a VEC_DUPLICATE.

This would be much simpler if we could call aarch64_simd_valid_immediate
and aarch64_simd_dup_constant directly from the rtx cost code, hence the
question about whether the PARALLEL stuff was really needed in practice.

>> > +default:
>> > +  *cost += extra_cost->ldst.load;
>> > +  break;
>> > +}
>> > +return true;
>> > +  }
>> > +case VEC_CONCAT:
>> > +  /* depending on the operation, either DUP or INS.
>> > + For now, keep default costing.  */
>> > +  break;
>> > +case VEC_DUPLICATE:
>> > +  *cost += extra_cost->vect.dup;
>> > +  return true;
>> > +case VEC_SELECT:
>> > +  {
>> > +/* cost subreg of 0 as free, otherwise as DUP */
>> > +rtx op1 = XEXP (x, 1);
>> > +int nelts;
>> > +if ((op1 == const0_rtx && !BYTES_BIG_ENDIAN)
>> > +|| (BYTES_BIG_ENDIAN
>> > +&& GET_MODE_NUNITS (mode).is_constant(&nelts)
>> > + 

Re: [PATCH] c++: check arity before deduction w/ explicit targs [PR12672]

2021-08-31 Thread Jason Merrill via Gcc-patches

On 8/30/21 9:26 PM, Patrick Palka wrote:

During overload resolution, when the arity of a function template
clearly disagrees with the arity of the call, no specialization of the
function template could yield a viable candidate.  The deduction routine
type_unification_real already notices this situation, but not before
it substitutes explicit template arguments into the template, a step
which could induce a hard error.  Although it's necessary to perform
this substitution first in order to check arity perfectly (since the
substitution can e.g. expand a non-trailing parameter pack), in most
cases we can determine ahead of time whether there's an arity
disagreement without needing to perform deduction.

To that end, this patch implements an (approximate) arity check in
add_template_candidate_real that guards actual deduction.  It's enabled
only when there are explicit template arguments since that's when
deduction can force otherwise avoidable template instantiations.  (I
experimented with enabling it unconditionally as an optimization and
saw some compile-time improvements of about 5% but also some regressions
by about the same magnitude, so kept it conditional.)
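
As a hypothetical illustration (not the new testcase), a call like the
following could previously force a hard error while substituting the
explicit arguments into #1, even though only #2 can ever be viable:

  template <class T>
  struct dangerous { using type = typename T::type; };  // hard error for T = int

  template <class T> void f (T, typename dangerous<T>::type);  // #1: two parameters
  template <class T> void f (T) {}                              // #2

  int main () { f<int> (42); }  // one argument: #1 now rejected on arity alone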

In passing, this adds a least_p parameter to arity_rejection for sake
of consistent diagnostics with unify_arity.

A couple of testcases needed to be adjusted so that deduction continues
to occur as intended after this change.  Except in unify6.C, where we
were expecting foo to be ill-formed due to substitution
yielding a function type with an added 'const', but I think this is
permitted by [dcl.fct]/7, so I changed the test accordingly.


Agreed.


Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK, thanks.


PR c++/12672

gcc/cp/ChangeLog:

* call.c (rejection_reason::call_varargs_p): Rename this
previously unused member to ...
(rejection_reason::least_p): ... this.
(arity_rejection): Add least_p parameter.
(add_template_candidate_real): When there are explicit
template arguments, check that the arity of the call agrees with
the arity of the function before attempting deduction.
(print_arity_information): Add least_p parameter.
(print_z_candidate): Adjust call to print_arity_information.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype29.C: Adjust.
* g++.dg/template/error56.C: Adjust.
* g++.old-deja/g++.pt/unify6.C: Adjust.
* g++.dg/template/explicit-args6.C: New test.
---
  gcc/cp/call.c | 67 ---
  gcc/testsuite/g++.dg/cpp0x/decltype29.C   |  4 +-
  gcc/testsuite/g++.dg/template/error56.C   |  4 +-
  .../g++.dg/template/explicit-args6.C  | 33 +
  gcc/testsuite/g++.old-deja/g++.pt/unify6.C|  4 +-
  5 files changed, 96 insertions(+), 16 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/explicit-args6.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index e4df72ec1a3..80e6121ce44 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -455,8 +455,8 @@ struct rejection_reason {
int expected;
/* The actual number of arguments in the call.  */
int actual;
-  /* Whether the call was a varargs call.  */
-  bool call_varargs_p;
+  /* Whether EXPECTED should be treated as a lower bound.  */
+  bool least_p;
  } arity;
  /* Information about an argument conversion mismatch.  */
  struct conversion_info conversion;
@@ -628,12 +628,13 @@ alloc_rejection (enum rejection_reason_code code)
  }
  
  static struct rejection_reason *

-arity_rejection (tree first_arg, int expected, int actual)
+arity_rejection (tree first_arg, int expected, int actual, bool least_p = 
false)
  {
struct rejection_reason *r = alloc_rejection (rr_arity);
int adjust = first_arg != NULL_TREE;
r->u.arity.expected = expected - adjust;
r->u.arity.actual = actual - adjust;
+  r->u.arity.least_p = least_p;
return r;
  }
  
@@ -3452,6 +3453,44 @@ add_template_candidate_real (struct z_candidate **candidates, tree tmpl,

  }
gcc_assert (ia == nargs_without_in_chrg);
  
+  if (!obj && explicit_targs)

+{
+  /* Check that there's no obvious arity mismatch before proceeding with
+deduction.  This avoids substituting explicit template arguments
+into the template (which could result in an error outside the
+immediate context) when the resulting candidate would be unviable
+anyway.  */
+  int min_arity = 0, max_arity = 0;
+  tree parms = TYPE_ARG_TYPES (TREE_TYPE (tmpl));
+  parms = skip_artificial_parms_for (tmpl, parms);
+  for (; parms != void_list_node; parms = TREE_CHAIN (parms))
+   {
+ if (!parms || PACK_EXPANSION_P (TREE_VALUE (parms)))
+   {
+ max_arity = -1;
+ break;
+   }
+ if (TREE_PURPOSE (parms))
+   /* A parameter with a default argument.  */
+   ++

RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-08-31 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, August 31, 2021 4:14 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector constants
> and operations
> 
> Tamar Christina  writes:
> > @@ -13936,8 +13937,65 @@ cost_plus:
> >  mode, MULT, 1, speed);
> >return true;
> >  }
> > +   break;
> > +case PARALLEL:
> > +  /* Fall through */
> 
> Which code paths lead to getting a PARALLEL here?

Hi,

Thanks for the review!

I added it for completeness because CSE treats a parallel and a CONST_VECTOR as
equivalent when each entry in the parallel defines a constant.

> 
> > +case CONST_VECTOR:
> > +   {
> > + rtx gen_insn = aarch64_simd_make_constant (x, true);
> > + /* Not a valid const vector.  */
> > + if (!gen_insn)
> > +   break;
> >
> > -  /* Fall through.  */
> > + switch (GET_CODE (gen_insn))
> > + {
> > + case CONST_VECTOR:
> > +   /* Load using MOVI/MVNI.  */
> > +   if (aarch64_simd_valid_immediate (x, NULL))
> > + *cost += extra_cost->vect.movi;
> > +   else /* Load using constant pool.  */
> > + *cost += extra_cost->ldst.load;
> > +   break;
> > + /* Load using a DUP.  */
> > + case VEC_DUPLICATE:
> > +   *cost += extra_cost->vect.dup;
> > +   break;
> 
> Does this trigger in practice?  The new check==true path (rightly) stops the
> duplicated element from being forced into a register, but then I would have
> expected:
> 
> rtx
> gen_vec_duplicate (machine_mode mode, rtx x) {
>   if (valid_for_const_vector_p (mode, x))
> return gen_const_vec_duplicate (mode, x);
>   return gen_rtx_VEC_DUPLICATE (mode, x); }
> 
> to generate the original CONST_VECTOR again.

Yes, but CSE is trying to see whether using a DUP is cheaper than another
instruction.  Normal code won't hit this, but CSE is just costing all the
different ways one can semantically construct a vector; which RTL actually
comes out of it depends on how it's folded, as you say.

> 
> > + default:
> > +   *cost += extra_cost->ldst.load;
> > +   break;
> > + }
> > + return true;
> > +   }
> > +case VEC_CONCAT:
> > +   /* depending on the operation, either DUP or INS.
> > +  For now, keep default costing.  */
> > +   break;
> > +case VEC_DUPLICATE:
> > +   *cost += extra_cost->vect.dup;
> > +   return true;
> > +case VEC_SELECT:
> > +   {
> > + /* cost subreg of 0 as free, otherwise as DUP */
> > + rtx op1 = XEXP (x, 1);
> > + int nelts;
> > + if ((op1 == const0_rtx && !BYTES_BIG_ENDIAN)
> > + || (BYTES_BIG_ENDIAN
> > + && GET_MODE_NUNITS (mode).is_constant(&nelts)
> > + && INTVAL (op1) == nelts - 1))
> > +   ;
> > + else if (vec_series_lowpart_p (mode, GET_MODE (op1), op1))
> > +   ;
> > + else if (vec_series_highpart_p (mode, GET_MODE (op1), op1))
> > + /* Selecting the high part is not technically free, but we lack
> > +enough information to decide that here.  For instance selecting
> > +the high-part of a vec_dup *is* free or to feed into any _high
> > +instruction.   Both of which we can't really tell.  That said
> > +have a better chance to optimize an dup vs multiple constants.  */
> > +   ;
> 
> Not sure about this.  We already try to detect the latter case (_high
> instructions) via aarch64_strip_extend_vec_half.  We might be missing some
> cases, but that still feels like the right way to go IMO.

That's a different problem from what I understand.  What this is trying to
say is that if you have a vector [x y a b] and you need the vector [x y],
you can use the top part of the original vector for this.

This is an approximation, because something that can be created with a movi
is probably cheaper to keep distinct if it's not going to be paired with a
_high operation (since you will have a dup then).

The problem is that the front end has already split the two vectors into
[x y a b] and [x y].  There's nothing else that tries to consolidate them
back up if both survive.

As a consequence of this, the testcase test0 is not handled optimally.  It
would instead create two vectors, both created with a movi of 0x3, just one
being 64 bits and the other 128 bits.

So if the cost of selecting it is cheaper than the movi, cse will not 
consolidate the vectors,
and because movi's are so cheap, the only cost that worked was 0.  But 
increasing the costs
of movi's requires the costs of everything to be increased (including loads).

I preferred to zero out the cost, because the worst that can happen is a dup
instead of a movi, and at best a dup instead of a load from a pool (if the
constant is complicated).
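
For reference, a hedged sketch of the kind of code being described (this is
not the test0 from the patch; it only shows two constants that differ in
width but share the same element value):

#include <arm_neon.h>

/* Instead of materialising a separate 64-bit movi for the narrow constant,
   one half of the 128-bit movi could be reused, i.e. a dup/subreg of the
   wide register rather than a second constant.  */
uint32x4_t
f (uint32x2_t a, uint32x4_t b)
{
  uint32x2_t c = vadd_u32 (a, vdup_n_u32 (3));
  uint32x4_t d = vaddq_u32 (b, vdupq_n_u32 (3));
  return vcombine_u32 (c, vget_low_u32 (d));
}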

> 
> Selecting the high part of a vec_dup should get folded into another vec_dup.
> 
> The lowpart bits look OK, but which paths call thi

Re: [PATCH 1/5]AArch64 sve: combine inverted masks into NOTs

2021-08-31 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> The following example
>
> void f10(double * restrict z, double * restrict w, double * restrict x,
>double * restrict y, int n)
> {
> for (int i = 0; i < n; i++) {
> z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
> }
> }
>
> generates currently:
>
> ld1dz1.d, p1/z, [x1, x5, lsl 3]
> fcmgt   p2.d, p1/z, z1.d, #0.0
> fcmgt   p0.d, p3/z, z1.d, #0.0
> ld1dz2.d, p2/z, [x2, x5, lsl 3]
> bic p0.b, p3/z, p1.b, p0.b
> ld1dz0.d, p0/z, [x3, x5, lsl 3]
>
> where a BIC is generated between p1 and p0 where a NOT would be better here
> since we won't require the use of p3 and opens the pattern up to being CSEd.
>
> After this patch using a 2 -> 2 split we generate:
>
> ld1dz1.d, p0/z, [x1, x5, lsl 3]
> fcmgt   p2.d, p0/z, z1.d, #0.0
> not p1.b, p0/z, p2.b
>
> The additional scratch is needed such that we can CSE the two operations.  If
> both statements wrote to the same register then CSE won't be able to CSE the
> values if there are other statements in between that use the register.
>
> Note: This patch series is working incrementally towards generating the most
>   efficient code for this and other loops in small steps.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve.md (*mask_inv_combine): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/pred-not-gen.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 
> 359fe0e457096cf4042a774789a5c241420703d3..2c23c6b12bafb038d82920e7141a418e078a2c65
>  100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -8126,6 +8126,42 @@ (define_insn_and_split "*fcmuo_and_combine"
> UNSPEC_COND_FCMUO))]
>  )
>  
> +;; Make sure that inversions of masked comparisons are always on the mask
> +;; instead of on the operation.
> +(define_insn_and_split "*mask_inv_combine"

I think it would be better to explain why this is an optimisation
in its own right, e.g.:

;; Similar to *fcm_and_combine, but for BIC rather than AND.
;; In this case, we still need a separate NOT/BIC operation, but predicating
;; the comparison on the BIC operand removes the need for a PTRUE.

For the same reason, calling it *fcm_bic_combine might
be more consistent with surrounding code.

It would be good to have a pattern for FCMUO as well, even though that
isn't a focus of the work.

> +  [(set (match_operand: 0 "register_operand" "=Upa")
> + (and:
> +   (and:
> + (not:
> +   (unspec:
> + [(match_operand: 1)
> +  (const_int SVE_KNOWN_PTRUE)
> +  (match_operand:SVE_FULL_F 2 "register_operand" "w")
> +  (match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero" "wDz")]
> + SVE_COND_FP_CMP_I0))
> + (match_operand: 4 "register_operand" "Upa"))
> +   (match_dup: 1)))

Indentation looks off here: this is a sibling of the inner “and”.

> +   (clobber (match_scratch: 5 "=&Upa"))]

This needs to be Upl, since it's the target of a comparison.

> +  "TARGET_SVE"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 5)
> + (unspec:
> +   [(match_dup 4)
> +(const_int SVE_MAYBE_NOT_PTRUE)
> +(match_dup 2)
> +(match_dup 3)]
> +   SVE_COND_FP_CMP_I0))
> +   (set (match_dup 0)
> + (and:
> +   (not:
> + (match_dup 5))
> +   (match_dup 4)))]
> +{
> +  operands[5] = gen_reg_rtx (mode);

This should be protected by:

  if (can_create_pseudo_p ())

since for post-reload splits we should use operand 5 unaltered.

It would be good to test the patch with the "&& 1" changed to
"&& reload_completed", to make sure that things still work for
post-RA splits.  I *think* the changes above are the only ones
needed to make that true, but nothing beats trying.

> +}
> +)
> +
>  ;; -
>  ;;  [FP] Absolute comparisons
>  ;; -
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c
> new file mode 100644
> index 
> ..a5bb616ef505a63075cf33203de8cf8e8c38b95d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c
> @@ -0,0 +1,56 @@
> +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> +/* { dg-options "-O3 --save-temps -fno-schedule-insns -fno-schedule-insns2" 
> } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +/*
> +** f10:
> +** ...
> +**   ld1dz1.d, p0/z, \[x1, x5, lsl 3\]
> +**   fcmgt   p2.d, p0/z, z1.d, #0.0
> +**   ld1dz2.d, p2/z, \[x2, x5, lsl 3\]
> +**   not p1.b, p0/z, p2.b
>

[PATCH 2/4] libffi: Sync with libffi 3.4.2

2021-08-31 Thread H.J. Lu via Gcc-patches
Merged commit: f9ea41683444ebe11cfa45b05223899764df28fb
---
 libffi/.gitattributes | 4 +
 libffi/ChangeLog.libffi   |  7743 +-
 libffi/LICENSE| 2 +-
 libffi/LICENSE-BUILDTOOLS |   353 +
 libffi/MERGE  | 4 +
 libffi/Makefile.am|   249 +-
 libffi/Makefile.in|  1944 --
 libffi/README |   450 -
 libffi/README.md  |   495 +
 libffi/acinclude.m4   |38 +-
 libffi/aclocal.m4 |  1202 -
 libffi/configure  | 19411 
 libffi/configure.ac   |   199 +-
 libffi/configure.host |97 +-
 libffi/doc/Makefile.am| 3 +
 libffi/doc/libffi.texi|   382 +-
 libffi/doc/version.texi   | 8 +-
 libffi/fficonfig.h.in |   208 -
 libffi/generate-darwin-source-and-headers.py  |   143 +-
 libffi/include/Makefile.am| 8 +-
 libffi/include/Makefile.in|   565 -
 libffi/include/ffi.h.in   |   213 +-
 libffi/include/ffi_cfi.h  |21 +
 libffi/include/ffi_common.h   |50 +-
 libffi/include/tramp.h|45 +
 libffi/libffi.map.in  |24 +-
 libffi/libffi.pc.in   | 2 +-
 libffi/libffi.xcodeproj/project.pbxproj   |   530 +-
 libffi/libtool-version|25 +-
 libffi/man/Makefile.in|   515 -
 libffi/mdate-sh   |   205 -
 libffi/msvcc.sh   |   134 +-
 libffi/src/aarch64/ffi.c  |   536 +-
 libffi/src/aarch64/ffitarget.h|35 +-
 libffi/src/aarch64/internal.h |33 +
 libffi/src/aarch64/sysv.S |   189 +-
 libffi/src/aarch64/win64_armasm.S |   506 +
 libffi/src/alpha/ffi.c| 6 +-
 libffi/src/arc/ffi.c  | 6 +-
 libffi/src/arm/ffi.c  |   380 +-
 libffi/src/arm/ffitarget.h|24 +-
 libffi/src/arm/internal.h |10 +
 libffi/src/arm/sysv.S |   304 +-
 libffi/src/arm/sysv_msvc_arm32.S  |   311 +
 libffi/src/closures.c |   489 +-
 libffi/src/cris/ffi.c | 4 +-
 libffi/src/csky/ffi.c |   395 +
 libffi/src/csky/ffitarget.h   |63 +
 libffi/src/csky/sysv.S|   371 +
 libffi/src/dlmalloc.c | 7 +-
 libffi/src/frv/ffi.c  | 4 +-
 libffi/src/ia64/ffi.c |30 +-
 libffi/src/ia64/ffitarget.h   | 3 +-
 libffi/src/ia64/unix.S| 9 +-
 libffi/src/java_raw_api.c | 6 +-
 libffi/src/kvx/asm.h  | 5 +
 libffi/src/kvx/ffi.c  |   273 +
 libffi/src/kvx/ffitarget.h|75 +
 libffi/src/kvx/sysv.S |   127 +
 libffi/src/m32r/ffi.c | 2 +-
 libffi/src/m68k/ffi.c | 4 +-
 libffi/src/m68k/sysv.S|29 +-
 libffi/src/m88k/ffi.c | 8 +-
 libffi/src/metag/ffi.c|14 +-
 libffi/src/microblaze/ffi.c   |10 +-
 libffi/src/mips/ffi.c |   146 +-
 libffi/src/mips/ffitarget.h   |23 +-
 libffi/src/mips/n32.S |   151 +-
 libffi/src/mips/o32.S |   177 +-
 libffi/src/moxie/eabi.S   | 2 +-
 libffi/src/moxie/ffi.c|27 +-
 libffi/src/nios2/ffi.c| 4 +-
 libffi/src/pa/ffi.c   |   216 +-
 libffi/src/pa/ffitarget.h |11 +-
 libffi/src/pa/hpux32.S|76 +-
 libffi/src/pa/linux.S |   160 +-
 libffi/src/powerpc/asm.h  | 4 +-
 libffi/src/powerpc/darwin_closure.S   | 6 +-
 libffi/src/powerpc/ffi.c  |10 +-
 libffi/src/powerpc/ffi_darwin.c   |48 +-
 libffi/src/powerpc/ffi_linux64.c  |   247 +-
 libffi/src/powerpc/ffi_powerpc.h  |25 +-
 libffi/src/powerpc/ffitarget.h|14 +-
 libffi/src/powerpc/linux64.S  |   111 +-
 libffi/src/powerpc/linux64_closure.S  |70 +-
 libffi/src/pow

[PATCH 0/4] libffi: Sync with upstream

2021-08-31 Thread H.J. Lu via Gcc-patches
GCC maintained a copy of libffi snapshot from 2009 and cherry-picked fixes 
from upstream over the last 10+ years.  In the meantime, libffi upstream
has been changed significantly with new features, bug fixes and new target
support.  Here is a set of patches to sync with libffi 3.4.2 release and
make it easier to sync with libffi upstream:

1. Document how to sync with upstream.
2. Add scripts to help sync with upstream.
3. Sync with libffi 3.4.2. This patch is quite big.  It is available at

https://gitlab.com/x86-gcc/gcc/-/commit/667397efc8307e45ca6ddec737b0caf8ca9d0fda
4. Integrate libffi build and testsuite with GCC.

H.J. Lu (4):
  libffi: Add HOWTO_MERGE, autogen.sh and merge.sh
  libffi: Sync with libffi 3.4.2
  libffi: Integrate with GCC
  libffi: Integrate testsuite with GCC testsuite

 libffi/.gitattributes |4 +
 libffi/ChangeLog.libffi   | 7743 -
 libffi/HOWTO_MERGE|   13 +
 libffi/LICENSE|2 +-
 libffi/LICENSE-BUILDTOOLS |  353 +
 libffi/MERGE  |4 +
 libffi/Makefile.am|  135 +-
 libffi/Makefile.in|  219 +-
 libffi/README |  450 -
 libffi/README.md  |  495 ++
 libffi/acinclude.m4   |   38 +-
 libffi/autogen.sh |   11 +
 libffi/configure  |  487 +-
 libffi/configure.ac   |   91 +-
 libffi/configure.host |   97 +-
 libffi/doc/Makefile.am|3 +
 libffi/doc/libffi.texi|  382 +-
 libffi/doc/version.texi   |8 +-
 libffi/fficonfig.h.in |   21 +-
 libffi/generate-darwin-source-and-headers.py  |  143 +-
 libffi/include/Makefile.am|2 +-
 libffi/include/Makefile.in|3 +-
 libffi/include/ffi.h.in   |  213 +-
 libffi/include/ffi_cfi.h  |   21 +
 libffi/include/ffi_common.h   |   50 +-
 libffi/include/tramp.h|   45 +
 libffi/libffi.map.in  |   24 +-
 libffi/libffi.pc.in   |2 +-
 libffi/libffi.xcodeproj/project.pbxproj   |  530 +-
 libffi/libtool-version|   25 +-
 libffi/man/Makefile.in|1 +
 libffi/mdate-sh   |   39 +-
 libffi/merge.sh   |   51 +
 libffi/msvcc.sh   |  134 +-
 libffi/src/aarch64/ffi.c  |  536 +-
 libffi/src/aarch64/ffitarget.h|   35 +-
 libffi/src/aarch64/internal.h |   33 +
 libffi/src/aarch64/sysv.S |  189 +-
 libffi/src/aarch64/win64_armasm.S |  506 ++
 libffi/src/alpha/ffi.c|6 +-
 libffi/src/arc/ffi.c  |6 +-
 libffi/src/arm/ffi.c  |  380 +-
 libffi/src/arm/ffitarget.h|   24 +-
 libffi/src/arm/internal.h |   10 +
 libffi/src/arm/sysv.S |  304 +-
 libffi/src/arm/sysv_msvc_arm32.S  |  311 +
 libffi/src/closures.c |  489 +-
 libffi/src/cris/ffi.c |4 +-
 libffi/src/csky/ffi.c |  395 +
 libffi/src/csky/ffitarget.h   |   63 +
 libffi/src/csky/sysv.S|  371 +
 libffi/src/dlmalloc.c |7 +-
 libffi/src/frv/ffi.c  |4 +-
 libffi/src/ia64/ffi.c |   30 +-
 libffi/src/ia64/ffitarget.h   |3 +-
 libffi/src/ia64/unix.S|9 +-
 libffi/src/java_raw_api.c |6 +-
 libffi/src/kvx/asm.h  |5 +
 libffi/src/kvx/ffi.c  |  273 +
 libffi/src/kvx/ffitarget.h|   75 +
 libffi/src/kvx/sysv.S |  127 +
 libffi/src/m32r/ffi.c |2 +-
 libffi/src/m68k/ffi.c |4 +-
 libffi/src/m68k/sysv.S|   29 +-
 libffi/src/m88k/ffi.c |8 +-
 libffi/src/metag/ffi.c|   14 +-
 libffi/src/microblaze/ffi.c   |   10 +-
 libffi/src/mips/ffi.c |  146 +-
 libffi/src/mips/ffitarget.h   |   23 +-
 libffi/src/mips/n32.S |  151 +-
 libffi/src/mips/o32.S |  177 +-
 libffi/src/moxie/eabi.S   |2 +-
 libffi/src/moxie/ffi.c|   27 +-
 libffi/src/nios2/ffi.c

[PATCH 3/4] libffi: Integrate with GCC

2021-08-31 Thread H.J. Lu via Gcc-patches
1. Integrate with GCC.
2. Support multilib.

* Makefile.am (AUTOMAKE_OPTIONS): Add info-in-builddir.
(ACLOCAL_AMFLAGS): Set to -I .. -I ../config.
(SUBDIRS): Don't add doc.
(TEXINFO_TEX): New.
(MAKEINFOFLAGS): Likewise.
(info_TEXINFOS): Likewise.
(STAMP_GENINSRC): Likewise.
(STAMP_BUILD_INFO): Likewise.
(all-local): Likewise.
(stamp-geninsrc): Likewise.
(doc/libffi.info): Likewise.
(stamp-build-info:): Likewise.
(CLEANFILES): Likewise.
(MAINTAINERCLEANFILES): Likewise.
(AM_MAKEFLAGS): Likewise.
(all-recursive): Likewise.
(install-recursive): Likewise.
(mostlyclean-recursive): Likewise.
(clean-recursive): Likewise.
(distclean-recursive): Likewise.
(maintainer-clean-recursive): Likewise.
(LTLDFLAGS): Replace libtool-ldflags with ../libtool-ldflags.
(AM_CFLAGS): Add -g -fexceptions.
(libffi.map-sun): Replace make_sunver.pl with
../contrib/make_sunver.pl.
(dist-hook): Removed.
Include $(top_srcdir)/../multilib.am.
* configure.ac: Add AM_ENABLE_MULTILIB.
Remove the frv*-elf check.
(AX_ENABLE_BUILDDIR): Removed.
(AM_INIT_AUTOMAKE): Add [no-dist].
Add --enable-generated-files-in-srcdir.
(C_CONFIG_MACRO_DIR): Removed.
(AX_COMPILER_VENDOR): Likewise.
(AX_CC_MAXOPT): Likewise.
(AX_CFLAGS_WARN_ALL): Likewise.
Remove the GCC check.
(SYMBOL_UNDERSCORE): Removed.
(AX_CHECK_COMPILE_FLAG): Likewise.
Remove --disable-docs.
(ACX_CHECK_PROG_VER): Check makeinfo.
(BUILD_DOCS): Updated.
Remove --disable-multi-os-directory.
(GCC_WITH_TOOLEXECLIBDIR): New.
Support cross host.
Support --enable-multilib.
* include/Makefile.am (nodist_include_HEADERS): Removed.
(gcc_version): New.
(toollibffidir): Likewise.
(toollibffi_HEADERS): Likewise.
* Makefile.in: Regenerate.
(GCC_BASE_VER): New.
(AC_CONFIG_FILES): Remove doc/Makefile.
(AC_CONFIG_LINKS): New.
* aclocal.m4: Likewise.
* configure: Likewise.
* fficonfig.h.in: Likewise.
* mdate-sh: Likewise.
* include/Makefile.in: Likewise.
* man/Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
---
 libffi/Makefile.am   |   116 +-
 libffi/Makefile.in   |  1963 
 libffi/aclocal.m4|  1202 ++
 libffi/configure | 19584 +
 libffi/configure.ac  |   126 +-
 libffi/fficonfig.h.in|   227 +
 libffi/include/Makefile.am   | 6 +-
 libffi/include/Makefile.in   |   566 +
 libffi/man/Makefile.in   |   516 +
 libffi/mdate-sh  |   224 +
 libffi/testsuite/Makefile.in |   606 +
 11 files changed, 25052 insertions(+), 84 deletions(-)
 create mode 100644 libffi/Makefile.in
 create mode 100644 libffi/aclocal.m4
 create mode 100755 libffi/configure
 create mode 100644 libffi/fficonfig.h.in
 create mode 100644 libffi/include/Makefile.in
 create mode 100644 libffi/man/Makefile.in
 create mode 100755 libffi/mdate-sh
 create mode 100644 libffi/testsuite/Makefile.in

diff --git a/libffi/Makefile.am b/libffi/Makefile.am
index 1b18198ad18..02e36176c67 100644
--- a/libffi/Makefile.am
+++ b/libffi/Makefile.am
@@ -1,18 +1,10 @@
 ## Process this with automake to create Makefile.in
 
-AUTOMAKE_OPTIONS = foreign subdir-objects
+AUTOMAKE_OPTIONS = foreign subdir-objects info-in-builddir
 
-ACLOCAL_AMFLAGS = -I m4
+ACLOCAL_AMFLAGS = -I .. -I ../config
 
 SUBDIRS = include testsuite man
-if BUILD_DOCS
-## This hack is needed because it doesn't seem possible to make a
-## conditional info_TEXINFOS in Automake.  At least Automake 1.14
-## either gives errors -- if this attempted in the most
-## straightforward way -- or simply unconditionally tries to build the
-## info file.
-SUBDIRS += doc
-endif
 
 EXTRA_DIST = LICENSE ChangeLog.old \
m4/libtool.m4 m4/lt~obsolete.m4 \
@@ -26,6 +18,90 @@ EXTRA_DIST = LICENSE ChangeLog.old   
\
 # local.exp is generated by configure
 DISTCLEANFILES = local.exp
 
+# Automake Documentation:
+# If your package has Texinfo files in many directories, you can use the
+# variable TEXINFO_TEX to tell Automake where to find the canonical
+# `texinfo.tex' for your package. The value of this variable should be
+# the relative path from the current `Makefile.am' to `texinfo.tex'.
+TEXINFO_TEX   = ../gcc/doc/include/texinfo.tex
+
+# Defines info, dvi, pdf and html targets
+MAKEINFOFLAGS = -I $(srcdir)/../gcc/doc/include
+info_TEXINFOS = doc/libffi.texi
+
+# AM_CONDITIONAL on configure option --generated-files-in-srcdir
+if GENINSRC
+STAMP_GENINSRC = stamp-geninsrc
+else
+STAMP_GENINSRC =
+endif
+
+# AM_CONDITIONA

[PATCH 1/4] libffi: Add HOWTO_MERGE, autogen.sh and merge.sh

2021-08-31 Thread H.J. Lu via Gcc-patches
Add scripts for syncing with libffi upstream:

1. Clone libffi repo.
2. Checkout the specific commit.
3. Remove the unused files.
4. Add new files if needed.

* HOWTO_MERGE: New file.
* autogen.sh: Likewise.
* merge.sh: Likewise.
---
 libffi/HOWTO_MERGE | 13 
 libffi/autogen.sh  | 11 ++
 libffi/merge.sh| 51 ++
 3 files changed, 75 insertions(+)
 create mode 100644 libffi/HOWTO_MERGE
 create mode 100755 libffi/autogen.sh
 create mode 100755 libffi/merge.sh

diff --git a/libffi/HOWTO_MERGE b/libffi/HOWTO_MERGE
new file mode 100644
index 000..6821c6dee3f
--- /dev/null
+++ b/libffi/HOWTO_MERGE
@@ -0,0 +1,13 @@
+In general, merging process should not be very difficult, but we need to
+track GCC-specific patches carefully.  Here is a general list of actions
+required to perform the merge:
+
+* Checkout recent GCC tree.
+* Run merge.sh script from the libffi directory.
+* Add new files if needed.
+* Apply all needed GCC-specific patches to libffi (note that some of
+  them might be already included to upstream).  The list of these patches
+  is stored into LOCAL_PATCHES file.  May need to re-run autogen.sh to
+  regenerate configure and Makefile.in files.
+* Send your patches for review to GCC Patches Mailing List 
(gcc-patches@gcc.gnu.org).
+* Update LOCAL_PATCHES file when you've committed the whole patch set with new 
revisions numbers.
diff --git a/libffi/autogen.sh b/libffi/autogen.sh
new file mode 100755
index 000..95bfc389faf
--- /dev/null
+++ b/libffi/autogen.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+#exec autoreconf -v -i
+
+rm -rf autom4te.cache
+aclocal  -I .. -I ../config
+autoheader -I .. -I ../config
+autoconf
+automake --foreign --add-missing --copy Makefile
+automake --foreign include/Makefile
+automake --foreign man/Makefile
+automake --foreign testsuite/Makefile
diff --git a/libffi/merge.sh b/libffi/merge.sh
new file mode 100755
index 000..b36fbb92185
--- /dev/null
+++ b/libffi/merge.sh
@@ -0,0 +1,51 @@
+#!/bin/bash
+
+# FIXME: do we need a license (or whatever else) header here?
+
+# This script merges libffi sources from upstream.
+
+# Default to the tip of master branch.
+commit=${1-master}
+
+fatal() {
+  echo "$1"
+  exit 1;
+}
+
+get_upstream() {
+  rm -rf upstream
+  git clone https://github.com/libffi/libffi.git upstream
+  pushd upstream
+  git checkout $commit || fatal "Failed to checkout $commit"
+  popd
+}
+
+get_current_rev() {
+  cd upstream
+  git rev-parse HEAD
+}
+
+pwd | grep 'libffi$' || \
+  fatal "Run this script from the libffi directory"
+get_upstream
+CUR_REV=$(get_current_rev)
+echo Current upstream revision: $CUR_REV
+
+# Remove the unused files.
+pushd upstream
+rm -rf ChangeLog.old .appveyor* .ci .github .gitignore .travis* \
+   config.guess config.sub libtool-ldflags m4 make_sunver.pl \
+   msvc_build
+rm -rf .git autogen.sh
+cp -a . ..
+popd
+
+rm -rf upstream
+
+# Update the MERGE file.
+cat << EOF > MERGE
+$CUR_REV
+
+The first line of this file holds the git revision number of the
+last merge done from the master library sources.
+EOF
-- 
2.31.1



[PATCH 4/4] libffi: Integrate testsuite with GCC testsuite

2021-08-31 Thread H.J. Lu via Gcc-patches
---
 libffi/testsuite/lib/libffi.exp | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/libffi/testsuite/lib/libffi.exp b/libffi/testsuite/lib/libffi.exp
index 4f4dd48d2c6..d8bf6269a36 100644
--- a/libffi/testsuite/lib/libffi.exp
+++ b/libffi/testsuite/lib/libffi.exp
@@ -15,12 +15,15 @@
 # .
 
 proc load_gcc_lib { filename } {
-global srcdir
-load_file $srcdir/lib/$filename
+global srcdir loaded_libs
+load_file $srcdir/../../gcc/testsuite/lib/$filename
+set loaded_libs($filename) ""
 }
 
 load_lib dg.exp
 load_lib libgloss.exp
+load_gcc_lib target-supports.exp
+load_gcc_lib target-supports-dg.exp
 load_gcc_lib target-libpath.exp
 load_gcc_lib wrapper.exp
 
@@ -277,6 +280,7 @@ proc libffi-init { args } {
 global srcdir
 global blddirffi
 global objdir
+global blddircxx
 global TOOL_OPTIONS
 global tool
 global libffi_include
@@ -285,13 +289,13 @@ proc libffi-init { args } {
 global ld_library_path
 global compiler_vendor
 
-if ![info exists blddirffi] {
-   set blddirffi [pwd]/..
-}
-
+set blddirffi [lookfor_file [get_multilibs] libffi]
 verbose "libffi $blddirffi"
+set blddircxx [lookfor_file [get_multilibs] libstdc++-v3]
+verbose "libstdc++ $blddircxx"
+
+set compiler_vendor "gnu"
 
-# Which compiler are we building with?
 if { [string match $compiler_vendor "gnu"] } {
 set gccdir [lookfor_file $tool_root_dir gcc/libgcc.a]
 if {$gccdir != ""} {
@@ -320,6 +324,8 @@ proc libffi-init { args } {
 
 # add the library path for libffi.
 append ld_library_path ":${blddirffi}/.libs"
+# add the library path for libstdc++ as well.
+append ld_library_path ":${blddircxx}/src/.libs"
 
 verbose "ld_library_path: $ld_library_path"
 
@@ -332,6 +338,7 @@ proc libffi-init { args } {
 if { $libffi_dir != "" } {
set libffi_dir [file dirname ${libffi_dir}]
set libffi_link_flags "-L${libffi_dir}/.libs"
+   lappend libffi_link_flags "-L${blddircxx}/src/.libs"
 }
 
 set_ld_library_path_env_vars
@@ -398,9 +405,8 @@ proc libffi_target_compile { source dest type options } {
lappend options "libs= -lpthread"
 }
 
-# this may be required for g++, but just confused clang.
 if { [string match "*.cc" $source] } {
-lappend options "c++"
+   lappend options "ldflags=-lstdc++"
 }
 
 if { [string match "arc*-*-linux*" $target_triplet] } {
-- 
2.31.1



Re: [PATCH] Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI)))

2021-08-31 Thread Jeff Law via Gcc-patches




On 8/31/2021 7:24 AM, Roger Sayle wrote:

Hi Christophe,
I'm testing the attached patch, but without an aarch64, it'll take a while to 
figure
out the toolchain to reproduce the failure.  Neither of the platforms I tested 
were
affected, but I can see it's unsafe to reuse the subreg_promoted_reg idiom from
just a few lines earlier.  Any help testing the attached patch on an affected 
target
would be much appreciated.

Sorry for the inconvenience.
Roger
--

-Original Message-
From: Christophe LYON 
Sent: 31 August 2021 13:32
To: Roger Sayle ; 'GCC Patches' 

Subject: Re: [PATCH] Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI 
(reg:SI)))


On 29/08/2021 09:46, Roger Sayle wrote:

SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial
subreg is correctly zero-extended or sign-extended in the parent
register.  For example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0)
indicates that the byte x is zero extended in reg:SI 23, which is useful for 
optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).

This patch addresses the oversight/missed optimization opportunity
that the new HImode subreg above should retain its
SUBREG_PROMOTED_VAR_P annotation as its value is guaranteed to be
correctly extended in the SImode parent.  The code below to preserve
SUBREG_PROMOTED_VAR_P is already present in the middle-end (e.g.
simplify-rtx.c:7232-7242) but missing from one or two (precisely three) places 
that (accidentally) strip it.

Whilst there I also added another optimization.  If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
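
A hedged illustration of the QImode-to-HImode case (assuming a target whose
ABI promotes narrow arguments to SImode registers; the function name is made
up):

/* Here x arrives zero-extended in an SImode register, so converting it to
   unsigned short needs no explicit extension instruction: a wider subreg of
   the promoted register suffices, and with this patch that subreg keeps its
   SUBREG_PROMOTED_VAR_P marking.  */
unsigned short widen (unsigned char x) { return x; }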

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures, and on a cross-compiler to
nvptx-none, where the function "long foo(char x) { return x; }" now
requires one less instruction.

OK for mainline?


Hi,

This patch causes an ICE when building an aarch64 toolchain:

during RTL pass: expand
In file included from
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/soft-fp.h:318,
   from
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/floatditf.c:32:
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/floatditf.c:
In function '__floatditf':
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/op-2.h:249:37:
internal compiler error: in subreg_promoted_mode, at rtl.h:3132
249 |   _FP_PACK_RAW_2_flo.bits.exp   = X##_e;\
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/quad.h:229:33:
note: in expansion of macro '_FP_PACK_RAW_2'
229 | # define FP_PACK_RAW_Q(val, X)  _FP_PACK_RAW_2 (Q, (val), X)
| ^~
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/floatditf.c:42:3:
note: in expansion of macro 'FP_PACK_RAW_Q'
 42 |   FP_PACK_RAW_Q (a, A);
|   ^
0xa0b53a subreg_promoted_mode

/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/rtl.h:3132
0xa0b53a convert_modes(machine_mode, machine_mode, rtx_def*, int)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:699
0xa003bc expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:9091
0xa0765c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:10497
0x9fcef6 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:9798
0xa0765c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:10497
0xa1099e expand_expr
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.h:301
0xa1099e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, 
rtx_def**, expand_modifier)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:8308
0x9fcdff expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:10288


Can you check?


Thanks,

Christophe



2021-08-29  Roger Sayle  

gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg.  Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.

Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-08-31 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> @@ -13936,8 +13937,65 @@ cost_plus:
>mode, MULT, 1, speed);
>return true;
>  }
> + break;
> +case PARALLEL:
> +  /* Fall through */

Which code paths lead to getting a PARALLEL here?

> +case CONST_VECTOR:
> + {
> +   rtx gen_insn = aarch64_simd_make_constant (x, true);
> +   /* Not a valid const vector.  */
> +   if (!gen_insn)
> + break;
>  
> -  /* Fall through.  */
> +   switch (GET_CODE (gen_insn))
> +   {
> +   case CONST_VECTOR:
> + /* Load using MOVI/MVNI.  */
> + if (aarch64_simd_valid_immediate (x, NULL))
> +   *cost += extra_cost->vect.movi;
> + else /* Load using constant pool.  */
> +   *cost += extra_cost->ldst.load;
> + break;
> +   /* Load using a DUP.  */
> +   case VEC_DUPLICATE:
> + *cost += extra_cost->vect.dup;
> + break;

Does this trigger in practice?  The new check==true path (rightly) stops
the duplicated element from being forced into a register, but then
I would have expected:

rtx
gen_vec_duplicate (machine_mode mode, rtx x)
{
  if (valid_for_const_vector_p (mode, x))
return gen_const_vec_duplicate (mode, x);
  return gen_rtx_VEC_DUPLICATE (mode, x);
}

to generate the original CONST_VECTOR again.

> +   default:
> + *cost += extra_cost->ldst.load;
> + break;
> +   }
> +   return true;
> + }
> +case VEC_CONCAT:
> + /* depending on the operation, either DUP or INS.
> +For now, keep default costing.  */
> + break;
> +case VEC_DUPLICATE:
> + *cost += extra_cost->vect.dup;
> + return true;
> +case VEC_SELECT:
> + {
> +   /* cost subreg of 0 as free, otherwise as DUP */
> +   rtx op1 = XEXP (x, 1);
> +   int nelts;
> +   if ((op1 == const0_rtx && !BYTES_BIG_ENDIAN)
> +   || (BYTES_BIG_ENDIAN
> +   && GET_MODE_NUNITS (mode).is_constant(&nelts)
> +   && INTVAL (op1) == nelts - 1))
> + ;
> +   else if (vec_series_lowpart_p (mode, GET_MODE (op1), op1))
> + ;
> +   else if (vec_series_highpart_p (mode, GET_MODE (op1), op1))
> +   /* Selecting the high part is not technically free, but we lack
> +  enough information to decide that here.  For instance selecting
> +  the high-part of a vec_dup *is* free or to feed into any _high
> +  instruction.   Both of which we can't really tell.  That said
> +  have a better chance to optimize an dup vs multiple constants.  */
> + ;

Not sure about this.  We already try to detect the latter case
(_high instructions) via aarch64_strip_extend_vec_half.  We might
be missing some cases, but that still feels like the right way
to go IMO.

Selecting the high part of a vec_dup should get folded into
another vec_dup.

The lowpart bits look OK, but which paths call this function
without first simplifying the select to a subreg?  The subreg
is now the canonical form (thanks to r12-2288).

Thanks,
Richard


[committed] Restore intent of data-sym-multi-pool test

2021-08-31 Thread Jeff Law via Gcc-patches
Recent improvements to Ranger have optimized away some of the code in 
mips/data-sym-multi-pool.c which in turn causes the test to fail as it's 
looking for specific bits in the assembly output.  The easiest fix here 
which preserves the intent of the test is to disable VRP as done by this 
patch.


Installed on the trunk,

Jeff

commit 18f0e57b9a2f1b108831fcfb25cbcc4e2de65e8e
Author: Jeff Law 
Date:   Tue Aug 31 11:08:50 2021 -0400

Restore intent of data-sym-multi-pool test

gcc/testsuite
* gcc.target/mips/mips.exp: Add tree-vrp to mips_option_group.
* gcc.target/mips/data-sym-multi-pool.c: Add -fno-tree-vrp.

diff --git a/gcc/testsuite/gcc.target/mips/data-sym-multi-pool.c 
b/gcc/testsuite/gcc.target/mips/data-sym-multi-pool.c
index 1936f5bf27e..26a622a44c9 100644
--- a/gcc/testsuite/gcc.target/mips/data-sym-multi-pool.c
+++ b/gcc/testsuite/gcc.target/mips/data-sym-multi-pool.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mips16 -mcode-readable=yes" } */
+/* { dg-options "-mips16 -mcode-readable=yes -fno-tree-vrp" } */
 /* { dg-skip-if "per-function expected output" { *-*-* } { "-flto" } { "" } } 
*/
 
 /* This testcase generates multiple constant pools within a function body.  */
diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index 580e7c0c8f9..d4d4b90d897 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -333,6 +333,7 @@ foreach option {
 schedule-insns2
 split-wide-types
 tree-vectorize
+tree-vrp
 unroll-all-loops
 unroll-loops
 ipa-ra


Ping: [PATCH] Generate XXSPLTIDP on power10.

2021-08-31 Thread Michael Meissner via Gcc-patches
Ping patch.

| Date: Wed, 25 Aug 2021 15:46:43 -0400
| Subject: [PATCH] Generate XXSPLTIDP on power10.
| Message-ID: 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH]AArch64 RFC: Don't cost all scalar operations during vectorization if scalar will fuse

2021-08-31 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> As the vectorizer has improved over time in capabilities it has started
> over-vectorizing.  This has caused regressions in the order of 1-7x on
> libraries that Arm produces.
>
> The vector costs actually do make a lot of sense and I don't think that they 
> are
> wrong.  I think that the costs for the scalar code are wrong.
>
> In particular the costing doesn't take into account that scalar operations
> can/will fuse as this happens in RTL.  Because of this the costs for the
> scalars end up always being higher.
>
> As an example the loop in PR 97984:
>
> void x (long * __restrict a, long * __restrict b)
> {
>   a[0] *= b[0];
>   a[1] *= b[1];
>   a[0] += b[0];
>   a[1] += b[1];
> }
>
> generates:
>
> x:
> ldp x2, x3, [x0]
> ldr x4, [x1]
> ldr q1, [x1]
> mul x2, x2, x4
> ldr x4, [x1, 8]
> fmovd0, x2
> ins v0.d[1], x3
> mul x1, x3, x4
> ins v0.d[1], x1
> add v0.2d, v0.2d, v1.2d
> str q0, [x0]
> ret
>
> On an actual loop the prologue costs would make the loop too expensive so we
> produce the scalar output, but with SLP there's no loop overhead costs so we 
> end
> up trying to vectorize this. Because SLP discovery is started from the stores 
> we
> will end up vectorizing and costing the add but not the MUL.
>
> To counter this the patch adjusts the costing when it finds an operation that
> can be fused and discounts the cost of the "other" operation being fused in.
>
> The attached testcase shows that even when we discount it we still get
> vectorized code when profitable to do so, e.g. SVE.
>
> This happens as well with other operations such as scalar operations where
> shifts can be fused in or for e.g. bfxil.  As such sending this for feedback.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? If the approach is acceptable I can add support for more.
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/97984
>   * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Check for fusing
>   madd.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/97984
>   * gcc.target/aarch64/pr97984-1.c: New test.
>   * gcc.target/aarch64/pr97984-2.c: New test.
>   * gcc.target/aarch64/pr97984-3.c: New test.
>   * gcc.target/aarch64/pr97984-4.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 4cd4b037f2606e515ad8f4669d2cd13a509dd0a4..329b556311310d86aaf546d7b395a3750a9d57d4
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -15536,6 +15536,39 @@ aarch64_add_stmt_cost (class vec_info *vinfo, void 
> *data, int count,
>   stmt_cost = aarch64_sve_adjust_stmt_cost (vinfo, kind, stmt_info,
> vectype, stmt_cost);
>  
> +  /* Scale costs if operation is fusing.  */
> +  if (stmt_info && kind == scalar_stmt)
> +  {
> + if (gassign *stmt = dyn_cast (STMT_VINFO_STMT (stmt_info)))
> +   {
> + switch (gimple_assign_rhs_code (stmt))
> + {
> + case PLUS_EXPR:
> + case MINUS_EXPR:
> +   {
> + /* Check if operation can fuse into MSUB or MADD.  */
> + tree rhs1 = gimple_assign_rhs1 (stmt);
> + if (gassign *stmt1 = dyn_cast (SSA_NAME_DEF_STMT 
> (rhs1)))
> +   if (gimple_assign_rhs_code (stmt1) == MULT_EXPR)
> + {
> +   stmt_cost = 0;
> +   break;
> +}
> + tree rhs2 = gimple_assign_rhs2 (stmt);
> + if (gassign *stmt2 = dyn_cast (SSA_NAME_DEF_STMT 
> (rhs2)))
> +   if (gimple_assign_rhs_code (stmt2) == MULT_EXPR)
> + {
> +   stmt_cost = 0;
> +   break;
> + }
> +   }
> +   break;
> + default:
> +   break;
> + }
> +   }
> +  }
> +

The difficulty with this is that we can also use MLA-type operations
for SVE, and for Advanced SIMD if the mode is not DI.  It's not just
a scalar thing.

We already take the combination into account (via aarch64_multiply_add_p)
when estimating issue rates.  But we don't take it into account for
latencies because of the reason above: if the multiplications are
vectorisable, then the combination applies to both the scalar and
the vector code, so the adjustments cancel out.  (Admittedly that
decision predates the special Advanced SIMD handling in
aarch64_multiply_add_p, so we might want to revisit it.)

I think the key feature for this testcase is that the multiplication is
not part of the vector code.  I think that's something we need to check
if we're going to cost the scalar code more cheaply.

But for this particular testcase, I think the main problem is that
we count the cost

Re: [gomp] Add langhook, so that Fortran can privatize variables by reference

2021-08-31 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 31, 2021 at 04:28:19PM +0200, Thomas Schwinge wrote:
> >From fb29fe81b4c8e880b32d68351385d8a42c97934b Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Wed, 29 May 2019 18:59:46 +0200
> Subject: [PATCH] [OMP] Standardize on 'omp_privatize_by_reference'
> 
> ... instead of 'omp_is_reference' vs.
> 'lang_hooks.decls.omp_privatize_by_reference'.
> 
>   gcc/
>   * omp-general.h (omp_is_reference): Rename to...
>   (omp_privatize_by_reference): ... this.  Adjust all users...
>   * omp-general.c: ... here, ...
>   * gimplify.c: ... here, ...
>   * omp-expand.c: ... here, ...
>   * omp-low.c: ... here.

Ok for trunk.

> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 070d0e4df45..cab4089192a 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1831,7 +1831,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
> gimplify_and_add (init, seq_p);
> ggc_free (init);
> /* Clear TREE_READONLY if we really have an initialization.  */
> -   if (!DECL_INITIAL (decl) && !omp_is_reference (decl))
> +   if (!DECL_INITIAL (decl)
> +   && !omp_privatize_by_reference (decl))
>   TREE_READONLY (decl) = 0;
>   }
> else
> @@ -7064,7 +7065,7 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree 
> decl, unsigned int flags)
>   omp_notice_variable (ctx, TYPE_SIZE_UNIT (TREE_TYPE (decl)), true);
>  }
>else if ((flags & (GOVD_MAP | GOVD_LOCAL)) == 0
> -&& lang_hooks.decls.omp_privatize_by_reference (decl))
> +&& omp_privatize_by_reference (decl))
>  {
>omp_firstprivatize_type_sizes (ctx, TREE_TYPE (decl));
>  
> @@ -7322,7 +7323,7 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree 
> decl, unsigned flags)
>bool declared = is_oacc_declared (decl);
>tree type = TREE_TYPE (decl);
>  
> -  if (lang_hooks.decls.omp_privatize_by_reference (decl))
> +  if (omp_privatize_by_reference (decl))
>  type = TREE_TYPE (type);
>  
>/* For Fortran COMMON blocks, only used variables in those blocks are
> @@ -7586,7 +7587,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
> decl, bool in_code)
> tree type = TREE_TYPE (decl);
>  
> if (gimplify_omp_ctxp->target_firstprivatize_array_bases
> -   && lang_hooks.decls.omp_privatize_by_reference (decl))
> +   && omp_privatize_by_reference (decl))
>   type = TREE_TYPE (type);
> if (!lang_hooks.types.omp_mappable_type (type))
>   {
> @@ -7660,7 +7661,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
> decl, bool in_code)
> n2 = splay_tree_lookup (ctx->variables, (splay_tree_key) t);
> n2->value |= GOVD_SEEN;
>   }
> -  else if (lang_hooks.decls.omp_privatize_by_reference (decl)
> +  else if (omp_privatize_by_reference (decl)
>  && TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl)))
>  && (TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl
>  != INTEGER_CST))
> @@ -7785,7 +7786,7 @@ omp_check_private (struct gimplify_omp_ctx *ctx, tree 
> decl, bool copyprivate)
> if (copyprivate)
>   return true;
>  
> -   if (lang_hooks.decls.omp_privatize_by_reference (decl))
> +   if (omp_privatize_by_reference (decl))
>   return false;
>  
> /* Treat C++ privatized non-static data members outside
> @@ -10373,7 +10374,7 @@ omp_shared_to_firstprivate_optimizable_decl_p (tree 
> decl)
>HOST_WIDE_INT len = int_size_in_bytes (type);
>if (len == -1 || len > 4 * POINTER_SIZE / BITS_PER_UNIT)
>  return false;
> -  if (lang_hooks.decls.omp_privatize_by_reference (decl))
> +  if (omp_privatize_by_reference (decl))
>  return false;
>return true;
>  }
> @@ -10698,7 +10699,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, 
> void *data)
> OMP_CLAUSE_CHAIN (clause) = nc;
>   }
>else if (gimplify_omp_ctxp->target_firstprivatize_array_bases
> -&& lang_hooks.decls.omp_privatize_by_reference (decl))
> +&& omp_privatize_by_reference (decl))
>   {
> OMP_CLAUSE_DECL (clause) = build_simple_mem_ref (decl);
> OMP_CLAUSE_SIZE (clause)
> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> index 66c64f5a37b..7ce0663ba70 100644
> --- a/gcc/omp-expand.c
> +++ b/gcc/omp-expand.c
> @@ -4232,9 +4232,8 @@ expand_omp_for_generic (struct omp_region *region,
> && !OMP_CLAUSE_LINEAR_NO_COPYIN (c))
>   {
> tree d = OMP_CLAUSE_DECL (c);
> -   bool is_ref = omp_is_reference (d);
> tree t = d, a, dest;
> -   if (is_ref)
> +   if (omp_privatize_by_reference (t))
>   t = build_simple_mem_ref_loc (OMP_CLAUSE_LOCATION (c), t);
> tree type = TREE_TYPE (t);
> if (POINTER_TYPE_P (type))
> @@ -5236,9 +5235,8 @@ expand_omp_for_static_nochunk (struct omp_region 
> *region,
> && !OMP_CLAUSE

Re:Re: [PATCH] libstdc++-v3: Check for TLS support on mingw

2021-08-31 Thread lhmouse via Gcc-patches
On 2021-08-31 17:02, Jonathan Wakely wrote:
> It looks like my questions about this patch never got an answer, and
> it never got applied.
> 
> Could somebody say whether TLS is enabled for native *-*-mingw*
> builds? If it is, then we definitely need to add GCC_CHECK_TLS to the
> cross-compiler config too.
> 
> For a linux-hosted x86_64-w64-mingw32 cross compiler I see TLS is not enabled:
> 
> /* Define to 1 if the target supports thread-local storage. */
> /* #undef _GLIBCXX_HAVE_TLS */
> 

I have been disabling it with `--disable-tls` for years, but... I couldn't 
remember why. Thread-local storage is implemented with emutls, no matter with 
or without this option.

Does 'thread-local storage' mean to access thread-local objects via the FS or 
GS register on x86? If so, then it is definitely not supported yet.
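
A minimal example of the distinction (assuming an x86-64 target; the function
name is made up):

/* With native TLS, the read of `counter' compiles to a single %fs-relative
   load; with emutls it becomes a call to __emutls_get_address followed by a
   load through the returned pointer.  */
__thread int counter;

int bump (void) { return ++counter; }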


-- 
Best regards,
LIU Hao






Re: [gomp] Add langhook, so that Fortran can privatize variables by reference

2021-08-31 Thread Thomas Schwinge
Hi Jakub!

We never finished this dicussion here -- but I ran into this again, last
week:

On 2019-05-29T18:59:46+0200, I wrote:
> On Mon, 27 May 2019 18:49:20 +0200, Jakub Jelinek  wrote:
>> On Sun, May 26, 2019 at 07:43:04PM +0200, Thomas Schwinge wrote:
>> > On Tue, 18 Oct 2005 03:01:40 -0400, Jakub Jelinek  wrote:
>> > > --- gcc/omp-low.c.jj 2005-10-15 12:00:06.0 +0200
>> > > +++ gcc/omp-low.c2005-10-18 08:46:23.0 +0200
>> > > @@ -126,7 +126,7 @@ is_variable_sized (tree expr)
>> > >  static inline bool
>> > >  is_reference (tree decl)
>> > >  {
>> > > -  return TREE_CODE (TREE_TYPE (decl)) == REFERENCE_TYPE;
>> > > +  return lang_hooks.decls.omp_privatize_by_reference (decl);
>> > >  }
>> >
>> > With the same implementation, this function nowadays is known as
>> > 'omp_is_reference' ('gcc/omp-general.c'), and is used in 'omp-*' files
>> > only.  The gimplifier directly calls
>> > 'lang_hooks.decls.omp_privatize_by_reference'.
>> >
>> > Will it be OK to commit the obvious patch to get rid of the
>> > 'omp_is_reference' function?  Whenever I see it used in 'omp-*' files, I
>>
>> No, omp_is_reference (something) is certainly more readable from
>> lang_hooks.decls.omp_privatize_by_reference (something)
>
> Yes, better readable because it's shorter, but you have to look up its
> meaning, whereas with 'lang_hooks.decls.omp_privatize_by_reference' you
> directly see what it's about.
>
>> which is quite
>> long and would cause major issues in formatting etc.

(Actually, my proposed change: 'omp_is_reference' ->
'lang_hooks.decls.omp_privatize_by_reference' would not "cause major
issues in formatting etc.": most of the affected source code lines
are not going to overflow.)

>> What advantage do you see in removing that?
>
> For me, it's confusing, when looking at, say, 'OMP_CLAUSE_FIRSTPRIVATE'
> code, that in 'gcc/gimplify.c' we call
> 'lang_hooks.decls.omp_privatize_by_reference', whereas in 'gcc/omp-*.c'
> files we call 'omp_is_reference' -- but both actually mean the same
> thing.
>
>> > wonder and have to look up what special things it might be doing -- but
>> > it actually isn't.
>> >
>> >gcc/
>> > * omp-general.c (omp_is_reference): Don't define.  Adjust all 
>> > users.
>
> Or, of course, the other way round:
>
>   gcc/
> * gimplify.c: Use omp_is_reference.
>
> Or, even more preferably:
>
>   gcc/
>   * omp-general.c (omp_is_reference): Rename to...
> (omp_privatize_by_reference): ... this.  Adjust all users.
> * gimplify.c: Use it.

The latter one is what I had implemented and now tested: is the attached
"[OMP] Standardize on 'omp_privatize_by_reference'" OK to push to master
branch?


Regards
 Thomas


>From fb29fe81b4c8e880b32d68351385d8a42c97934b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 29 May 2019 18:59:46 +0200
Subject: [PATCH] [OMP] Standardize on 'omp_privatize_by_reference'

... instead of 'omp_is_reference' vs.
'lang_hooks.decls.omp_privatize_by_reference'.

	gcc/
	* omp-general.h (omp_is_reference): Rename to...
	(omp_privatize_by_reference): ... this.  Adjust all users...
	* omp-general.c: ... here, ...
	* gimplify.c: ... here, ...
	* omp-expand.c: ... here, ...
	* omp-low.c: ... here.
---
 gcc/gimplify.c|  17 ++---
 gcc/omp-expand.c  |   9 +--
 gcc/omp-general.c |   5 +-
 gcc/omp-general.h |   2 +-
 gcc/omp-low.c | 154 --
 5 files changed, 98 insertions(+), 89 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 070d0e4df45..cab4089192a 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1831,7 +1831,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 	  gimplify_and_add (init, seq_p);
 	  ggc_free (init);
 	  /* Clear TREE_READONLY if we really have an initialization.  */
-	  if (!DECL_INITIAL (decl) && !omp_is_reference (decl))
+	  if (!DECL_INITIAL (decl)
+		  && !omp_privatize_by_reference (decl))
 		TREE_READONLY (decl) = 0;
 	}
 	  else
@@ -7064,7 +7065,7 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 	omp_notice_variable (ctx, TYPE_SIZE_UNIT (TREE_TYPE (decl)), true);
 }
   else if ((flags & (GOVD_MAP | GOVD_LOCAL)) == 0
-	   && lang_hooks.decls.omp_privatize_by_reference (decl))
+	   && omp_privatize_by_reference (decl))
 {
   omp_firstprivatize_type_sizes (ctx, TREE_TYPE (decl));
 
@@ -7322,7 +7323,7 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
   bool declared = is_oacc_declared (decl);
   tree type = TREE_TYPE (decl);
 
-  if (lang_hooks.decls.omp_privatize_by_reference (decl))
+  if (omp_privatize_by_reference (decl))
 type = TREE_TYPE (type);
 

[PATCH, V3 3/3] dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE usecase

2021-08-31 Thread Indu Bhagat via Gcc-patches
DWARF generation is split between early and late phases when LTO is in effect.
This poses challenges for CTF/BTF generation especially if late debug info
generation is desirable, as turns out to be the case for BPF CO-RE.

The approach taken here in this patch is:

1. LTO is disabled for BPF CO-RE
The reason to disable LTO for BPF CO-RE is that if LTO is in effect, BPF CO-RE
relocations need to be generated in the LTO link phase _after_ the optimizations
are done. This means we need to devise way to combine early and late BTF. At
this time, in absence of linker support for BTF sections, it makes sense to
steer clear of LTO for BPF CO-RE and bypass the issue.

2. The BPF backend updates write_symbols with BTF_WITH_CORE_DEBUG to convey
that BTF with CO-RE support needs to be generated.  This information
is used by the debug info emission routines to defer the emission of BTF/CO-RE
until dwarf2out_finish.

So, in other words,

dwarf2out_early_finish
  - Always emit CTF here.
  - if (BTF && !BTF_WITH_CORE), emit BTF now.

dwarf2out_finish
  - if (BTF_WITH_CORE) emit BTF now.

gcc/ChangeLog:

* dwarf2ctf.c (ctf_debug_finalize): Make it static.
(ctf_debug_early_finish): New definition.
(ctf_debug_finish): Likewise.
* dwarf2ctf.h (ctf_debug_finalize): Remove declaration.
(ctf_debug_early_finish): New declaration.
(ctf_debug_finish): Likewise.
* dwarf2out.c (dwarf2out_finish): Invoke ctf_debug_finish.
(dwarf2out_early_finish): Invoke ctf_debug_early_finish.
---
 gcc/dwarf2ctf.c | 54 +-
 gcc/dwarf2ctf.h |  4 +++-
 gcc/dwarf2out.c |  9 +++--
 3 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/gcc/dwarf2ctf.c b/gcc/dwarf2ctf.c
index 5e8a725..b686baf 100644
--- a/gcc/dwarf2ctf.c
+++ b/gcc/dwarf2ctf.c
@@ -917,6 +917,27 @@ gen_ctf_type (ctf_container_ref ctfc, dw_die_ref die)
   return type_id;
 }
 
+/* Prepare for output and write out the CTF debug information.  */
+
+static void
+ctf_debug_finalize (const char *filename, bool btf)
+{
+  if (btf)
+{
+  btf_output (filename);
+  btf_finalize ();
+}
+
+  else
+{
+  /* Emit the collected CTF information.  */
+  ctf_output (filename);
+
+  /* Reset the CTF state.  */
+  ctf_finalize ();
+}
+}
+
 bool
 ctf_do_die (dw_die_ref die)
 {
@@ -966,24 +987,31 @@ ctf_debug_init_postprocess (bool btf)
 btf_init_postprocess ();
 }
 
-/* Prepare for output and write out the CTF debug information.  */
+/* Early finish CTF/BTF debug info.  */
 
 void
-ctf_debug_finalize (const char *filename, bool btf)
+ctf_debug_early_finish (const char * filename)
 {
-  if (btf)
-{
-  btf_output (filename);
-  btf_finalize ();
-}
+  /* Emit CTF debug info early always.  */
+  if (ctf_debug_info_level > CTFINFO_LEVEL_NONE
+  /* Emit BTF debug info early if CO-RE relocations are not
+required.  */
+  || (btf_debuginfo_p () && !btf_with_core_debuginfo_p ()))
+ctf_debug_finalize (filename, btf_debuginfo_p ());
+}
 
-  else
-{
-  /* Emit the collected CTF information.  */
-  ctf_output (filename);
+/* Finish CTF/BTF debug info emission.  */
 
-  /* Reset the CTF state.  */
-  ctf_finalize ();
+void
+ctf_debug_finish (const char * filename)
+{
+  /* Emit BTF debug info here when CO-RE relocations need to be generated.
+ BTF with CO-RE relocations needs to be generated when CO-RE is in effect
+ for the BPF target.  */
+  if (btf_with_core_debuginfo_p ())
+{
+  gcc_assert (btf_debuginfo_p ());
+  ctf_debug_finalize (filename, btf_debuginfo_p ());
 }
 }
 
diff --git a/gcc/dwarf2ctf.h b/gcc/dwarf2ctf.h
index a3cf567..9edbde0 100644
--- a/gcc/dwarf2ctf.h
+++ b/gcc/dwarf2ctf.h
@@ -24,13 +24,15 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_DWARF2CTF_H 1
 
 #include "dwarf2out.h"
+#include "flags.h"
 
 /* Debug Format Interface.  Used in dwarf2out.c.  */
 
 extern void ctf_debug_init (void);
 extern void ctf_debug_init_postprocess (bool);
 extern bool ctf_do_die (dw_die_ref);
-extern void ctf_debug_finalize (const char *, bool);
+extern void ctf_debug_early_finish (const char *);
+extern void ctf_debug_finish (const char *);
 
 /* Wrappers for CTF/BTF to fetch information from GCC DWARF DIE.  Used in
ctfc.c.
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 07a479f..3615e68 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -31913,6 +31913,11 @@ dwarf2out_finish (const char *filename)
   unsigned char checksum[16];
   char dl_section_ref[MAX_ARTIFICIAL_LABEL_BYTES];
 
+  /* Generate CTF/BTF debug info.  */
+  if ((ctf_debug_info_level > CTFINFO_LEVEL_NONE
+   || btf_debuginfo_p ()) && lang_GNU_C ())
+ctf_debug_finish (filename);
+
   /* Skip emitting DWARF if not required.  */
   if (!dwarf_debuginfo_p ())
 return;
@@ -32817,8 +32822,8 @@ dwarf2out_early_finish (const char *filename)
ctf_debug_do_cu 

[PATCH,V3 2/3] bpf: Add new -mco-re option for BPF CO-RE

2021-08-31 Thread Indu Bhagat via Gcc-patches
-mco-re in the BPF backend enables code generation for the CO-RE usecase. LTO is
disabled for CO-RE compilations.

gcc/ChangeLog:

* config/bpf/bpf.c (bpf_option_override): For BPF backend, disable LTO
support when compiling for CO-RE.
* config/bpf/bpf.opt: Add new command line option -mco-re.

gcc/testsuite/ChangeLog:

* gcc.target/bpf/core-lto-1.c: New test.
---
 gcc/config/bpf/bpf.c  | 25 +
 gcc/config/bpf/bpf.opt|  4 
 gcc/testsuite/gcc.target/bpf/core-lto-1.c |  9 +
 3 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-lto-1.c

diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index e635f9e..7228978 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "predict.h"
 #include "langhooks.h"
+#include "flags.h"
 
 /* Per-function machine data.  */
 struct GTY(()) machine_function
@@ -158,6 +159,30 @@ bpf_option_override (void)
 {
   /* Set the initializer for the per-function status structure.  */
   init_machine_status = bpf_init_machine_status;
+
+  /* BPF CO-RE support requires BTF debug info generation.  */
+  if (TARGET_BPF_CORE && !btf_debuginfo_p ())
+error ("BPF CO-RE requires BTF debugging information, use %<-gbtf%>");
+
+  /* To support the portability needs of BPF CO-RE approach, BTF debug
+ information includes the BPF CO-RE relocations.  */
+  if (TARGET_BPF_CORE)
+write_symbols |= BTF_WITH_CORE_DEBUG;
+
+  /* Unlike much of the other BTF debug information, the information necessary
+ for CO-RE relocations is added to the CTF container by the BPF backend.
+ Enabling LTO adds some complications in the generation of the BPF CO-RE
+ relocations because if LTO is in effect, the relocations need to be
+ generated late in the LTO link phase.  This poses a new challenge for the
+ compiler to now provide means to combine the early BTF and late BTF CO-RE
+ debug info, similar to DWARF debug info.  BTF/CO-RE debug info is not
+ amenable to such a split generation and a later merging.
+
+ In any case, in absence of linker support for BTF sections at this time,
+ it is acceptable to simply disallow LTO for BPF CO-RE compilations.  */
+
+  if (flag_lto && TARGET_BPF_CORE)
+sorry ("BPF CO-RE does not support LTO");
 }
 
 #undef TARGET_OPTION_OVERRIDE
diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
index 916b53c..4493067 100644
--- a/gcc/config/bpf/bpf.opt
+++ b/gcc/config/bpf/bpf.opt
@@ -127,3 +127,7 @@ Generate little-endian eBPF.
 mframe-limit=
 Target Joined RejectNegative UInteger IntegerRange(0, 32767) 
Var(bpf_frame_limit) Init(512)
 Set a hard limit for the size of each stack frame, in bytes.
+
+mco-re
+Target Mask(BPF_CORE)
+Generate all necessary information for BPF Compile Once - Run Everywhere.
diff --git a/gcc/testsuite/gcc.target/bpf/core-lto-1.c 
b/gcc/testsuite/gcc.target/bpf/core-lto-1.c
new file mode 100644
index 000..927de23
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-lto-1.c
@@ -0,0 +1,9 @@
+/* Test -mco-re with -flto.
+  
+   -mco-re is used to generate information for BPF CO-RE usecase. To support
+   the generation of the .BTF and .BTF.ext sections in GCC, -flto is disabled
+   with -mco-re.  */
+
+/* { dg-do compile } */
+/* { dg-message "sorry, unimplemented: BPF CO-RE does not support LTO" "" { 
target bpf-*-* } 0 } */
+/* { dg-options "-gbtf -mco-re -flto" } */
-- 
1.8.3.1



[PATCH,V3 0/3] Allow means for late BTF generation for BPF CO-RE

2021-08-31 Thread Indu Bhagat via Gcc-patches
[Changes from V2]
- Instead of target hook, the patch set now adds a new debug format
BTF_WITH_CORE_DEBUG.
- Renamed the BPF option from -mcore to -mco-re.
- Adapted the commit logs a bit.
[End of Changes from V2]


Hello,

This patch series puts the framework in place for late BTF generation (in
dwarf2out_finish). This is needed for the landing of BPF CO-RE support in GCC,
patches for which were posted earlier -
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576719.html.

BPF's Compile Once - Run Everywhere (CO-RE) feature is used to make a compiled
BPF program portable across kernel versions, without the need to recompile the
BPF program.  A key part of the BPF CO-RE capability is the BTF debug info
generated for such programs.

A traditional (non CO-RE) BPF program will have a .BTF section which contains
the type information in the BTF debug format.  In the CO-RE case, however, an
additional .BTF.ext section is used.  The .BTF.ext section contains the CO-RE
relocations.  A BPF loader will use the .BTF.ext section along with the
associated .BTF section to adjust some references in the instructions of the
BPF program, to ensure it is compatible with the required kernel version and
headers.

A .BTF.ext section contains the CO-RE relocation records. Roughly, each CO-RE
relocation record will contain the following info:
 - offset of BPF instruction to be patched
 - the BTF ID of the data structure being accessed by the instruction, and 
 - an offset to the "access string" - a BTF string which encodes a series of 
   field accesses to retrieve the field of interest in the instruction.

The .BTF.ext section does not have a string table of its own, so these "access
strings" are placed in the .BTF section string table. The CO-RE relocation
records refer to them by offset into the .BTF string table.
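
For orientation, one such relocation record looks roughly like the sketch
below.  The struct and field names here are my own and purely illustrative,
not taken from this patch series; the authoritative layout is the BTF.ext
format already used by the kernel and LLVM.

struct core_reloc_record_sketch
{
  unsigned int insn_off;        /* offset of the BPF instruction to patch */
  unsigned int type_id;         /* BTF ID of the type being accessed */
  unsigned int access_str_off;  /* offset of the access string in the .BTF
                                   string table */
  unsigned int kind;            /* relocation kind, e.g. field byte offset */
};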

Example of access string encoding:

 struct S {
   int a;
   union {
     int _unused;
     int b;
     char c;
   } u[4];
 };

struct S *foo = ...;
int x  = foo->a;          /* encoded as "0:0" */
int y  = foo[4]->u[2].b;  /* encoded as "4:1:2:1" */
char z = foo->u[3].c;     /* encoded as "0:1:3:2" */
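
To make the encoding concrete, here is how I read the last string above (a
worked decomposition, not text from the patch):

  "0:1:3:2"  for  foo->u[3].c
    0  ->  index 0 through the pointer foo
    1  ->  member #1 of struct S, i.e. the array u
    3  ->  array index 3 within u
    2  ->  member #2 of the union, i.e. c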

This means that writing out of a .BTF section needs to be delayed until after
these "access strings" have been added by the BPF backend, when CO-RE is in
effect.

High-level design
-
- The CTF container (CTFC) is populated with the compiler-internal
representation for the "type information" at dwarf2out_early_finish time.  This
information is used for generation of the .BTF section.
- For CO-RE relocation records, the information needed to generate .BTF.ext
section is added by the BPF backend to the CTF container (CTFC) at expand time.
- A new debug format BTF_WITH_CORE_DEBUG is being added.
- The BPF backend updates the write_symbols variable with BTF_WITH_CORE_DEBUG
debug format signalling the rest of the compiler that BPF/CO-RE is in effect,
and hence the need to generate the BTF CO-RE relocation records.
- BTF debug information is emitted late in dwarf2out_finish when
BTF_WITH_CORE_DEBUG debug format is requested by the user (implicitly via the
command line option -mco-re for the BPF backend).
- Combining early BTF and late BTF/CO-RE debug information is not feasible due
to the way BTF CO-RE format is defined and lack of linker support for the BTF
debug format.
- Lastly, LTO is disallowed to be used together with CO-RE for the BPF target.

Testing Notes

- Bootstrapped and reg tested on x86_64
- make all-gcc for --target=bpf-unknown-none; tested ctf.exp, btf.exp and 
bpf.exp

Thanks,

Indu Bhagat (3):
  debug: add BTF_WITH_CORE_DEBUG debug format
  bpf: Add new -mco-re option for BPF CO-RE
  dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE usecase

 gcc/config/bpf/bpf.c  | 25 ++
 gcc/config/bpf/bpf.opt|  4 +++
 gcc/dwarf2ctf.c   | 54 +++
 gcc/dwarf2ctf.h   |  4 ++-
 gcc/dwarf2out.c   |  9 --
 gcc/flag-types.h  |  6 +++-
 gcc/flags.h   |  4 +++
 gcc/opts.c|  8 +
 gcc/testsuite/gcc.target/bpf/core-lto-1.c |  9 ++
 9 files changed, 106 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-lto-1.c

-- 
1.8.3.1



[PATCH,V3 1/3] debug: add BTF_WITH_CORE_DEBUG debug format

2021-08-31 Thread Indu Bhagat via Gcc-patches
To best handle BTF/CO-RE in GCC, a distinct BTF_WITH_CORE_DEBUG debug format is
being added.  This helps the compiler detect whether BTF with CO-RE relocations
needs to be emitted.
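
For context, the new predicate is consumed by the dwarf2ctf changes elsewhere
in this series (patch 3/3), along these lines:

  if (btf_with_core_debuginfo_p ())
    {
      gcc_assert (btf_debuginfo_p ());
      ctf_debug_finalize (filename, btf_debuginfo_p ());
    }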

gcc/ChangeLog:

* flag-types.h (enum debug_info_type): Add new enum
DINFO_TYPE_BTF_WITH_CORE.
(BTF_WITH_CORE_DEBUG): New bitmask.
* flags.h (btf_with_core_debuginfo_p): New declaration.
* opts.c (btf_with_core_debuginfo_p): New definition.
---
 gcc/flag-types.h | 6 +-
 gcc/flags.h  | 4 
 gcc/opts.c   | 8 
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 4fb1cb4..cc41b2a 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -31,7 +31,8 @@ enum debug_info_type
   DINFO_TYPE_VMS = 4,/* VMS debug info.  */
   DINFO_TYPE_CTF = 5,/* CTF debug info.  */
   DINFO_TYPE_BTF = 6,/* BTF debug info.  */
-  DINFO_TYPE_MAX = DINFO_TYPE_BTF /* Marker only.  */
+  DINFO_TYPE_BTF_WITH_CORE = 7,  /* BTF debug info with CO-RE 
relocations.  */
+  DINFO_TYPE_MAX = DINFO_TYPE_BTF_WITH_CORE /* Marker only.  */
 };
 
 #define NO_DEBUG  (0U)
@@ -47,6 +48,9 @@ enum debug_info_type
 #define CTF_DEBUG (1U << DINFO_TYPE_CTF)
 /* Write BTF debug info (using btfout.c).  */
 #define BTF_DEBUG (1U << DINFO_TYPE_BTF)
+/* Write BTF debug info for BPF CO-RE usecase (using btfout.c).  */
+#define BTF_WITH_CORE_DEBUG (1U << DINFO_TYPE_BTF_WITH_CORE)
+
 /* Note: Adding new definitions to handle -combination- of debug formats,
like VMS_AND_DWARF2_DEBUG is not recommended.  This definition remains
here for historical reasons.  */
diff --git a/gcc/flags.h b/gcc/flags.h
index afedef0..af61bcd 100644
--- a/gcc/flags.h
+++ b/gcc/flags.h
@@ -44,6 +44,10 @@ const char * debug_set_names (uint32_t w_symbols);
 
 extern bool btf_debuginfo_p ();
 
+/* Return true iff BTF with CO-RE debug info is enabled.  */
+
+extern bool btf_with_core_debuginfo_p ();
+
 /* Return true iff CTF debug info is enabled.  */
 
 extern bool ctf_debuginfo_p ();
diff --git a/gcc/opts.c b/gcc/opts.c
index e050155..1d2d22d 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -135,6 +135,14 @@ btf_debuginfo_p ()
   return (write_symbols & BTF_DEBUG);
 }
 
+/* Return TRUE iff BTF with CO-RE debug info is enabled.  */
+
+bool
+btf_with_core_debuginfo_p ()
+{
+  return (write_symbols & BTF_WITH_CORE_DEBUG);
+}
+
 /* Return TRUE iff CTF debug info is enabled.  */
 
 bool
-- 
1.8.3.1



Re: [wwwdocs] gcc-12/changes.html: nvptx - new __PTX_SM__ macro

2021-08-31 Thread Tom de Vries via Gcc-patches
On 8/30/21 12:54 PM, Tobias Burnus wrote:
> Document Roger's patch
> https://gcc.gnu.org/g:3c496e92d795a8fe5c527e3c5b5a6606669ae50d
> 
> OK? Suggestions?
> 

LGTM.

Thanks,
- Tom



[committed] More stabs removal

2021-08-31 Thread Jeff Law via Gcc-patches
Dropping stabs from cris, m32r, mn103/am33, xtensa & m32c.  No 
particular rhyme or reason to those ports.


Committed to the trunk,
Jeff


commit d158c3f77738e1d44aa117c1674e9ec8dee38661
Author: Jeff Law 
Date:   Tue Aug 31 09:48:02 2021 -0400

More stabs removal.

gcc/

* config.gcc (cris-*-elf, cris-*-none): Remove dbxelf.h from
tm_file.
(m32r-*-elf, m32rle-*-elf, m32r-*-linux): Likewise.
(mn10300-*-*, am33_2.0-*-linux*): Likewise.
(xtensa*-*-elf, xtensa*-*-linux, xtensa*-*-uclinux): Likewise.
(m32c-*-elf*, m32c-*-rtems*): Likewise.
* config/cris/cris.h (DBX_NO_XREFS): Remove.
(DBX_CONTIN_LENGTH, DBX_CONTIN_CHAR): Likewise.
* config/m32r/m32r.h (DBXOUT_SOURCE_LINE): Likewise.
(DBX_DEBUGGING_INFO, DBX_CONTIN_LENGTH): Likewise.
* config/mn10300/mn10300.h (DEFAULT_GDB_EXTENSIONS): Likewise.
* config/mn10300/linux.h (DBX_REGISTER_NAMES): Likewise.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 0eba332bd45..e553ef34bc7 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1532,7 +1532,7 @@ cr16-*-elf)
 use_collect2=no
 ;;
 cris-*-elf | cris-*-none)
-   tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
+   tm_file="elfos.h newlib-stdint.h ${tm_file}"
tmake_file="cris/t-cris cris/t-elfmulti"
gas=yes
extra_options="${extra_options} cris/elf.opt"
@@ -2293,13 +2293,13 @@ lm32-*-uclinux*)
tmake_file="${tmake_file} lm32/t-lm32"
 ;;
 m32r-*-elf*)
-   tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
+   tm_file="elfos.h newlib-stdint.h ${tm_file}"
;;
 m32rle-*-elf*)
-   tm_file="dbxelf.h elfos.h newlib-stdint.h m32r/little.h ${tm_file}"
+   tm_file="elfos.h newlib-stdint.h m32r/little.h ${tm_file}"
;;
 m32r-*-linux*)
-   tm_file="dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ${tm_file} 
m32r/linux.h"
+   tm_file="elfos.h gnu-user.h linux.h glibc-stdint.h ${tm_file} 
m32r/linux.h"
tmake_file="${tmake_file} m32r/t-linux t-slibgcc"
gnu_ld=yes
if test x$enable_threads = xyes; then
@@ -2307,7 +2307,7 @@ m32r-*-linux*)
fi
;;
 m32rle-*-linux*)
-   tm_file="dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h 
m32r/little.h ${tm_file} m32r/linux.h"
+   tm_file="elfos.h gnu-user.h linux.h glibc-stdint.h m32r/little.h 
${tm_file} m32r/linux.h"
tmake_file="${tmake_file} m32r/t-linux t-slibgcc"
gnu_ld=yes
if test x$enable_threads = xyes; then
@@ -2736,11 +2736,7 @@ mmix-knuth-mmixware)
use_gcc_stdint=wrap
;;
 mn10300-*-*)
-   tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
-   if test x$stabs = xyes
-   then
-   tm_file="${tm_file} dbx.h"
-   fi
+   tm_file="elfos.h newlib-stdint.h ${tm_file}"
use_collect2=no
use_gcc_stdint=wrap
;;
@@ -3555,30 +3551,30 @@ xstormy16-*-elf)
tmake_file="stormy16/t-stormy16"
;;
 xtensa*-*-elf*)
-   tm_file="${tm_file} dbxelf.h elfos.h newlib-stdint.h xtensa/elf.h"
+   tm_file="${tm_file} elfos.h newlib-stdint.h xtensa/elf.h"
extra_options="${extra_options} xtensa/elf.opt"
;;
 xtensa*-*-linux*)
-   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h 
xtensa/linux.h"
+   tm_file="${tm_file} elfos.h gnu-user.h linux.h glibc-stdint.h 
xtensa/linux.h"
tmake_file="${tmake_file} xtensa/t-xtensa"
;;
 xtensa*-*-uclinux*)
-   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h 
xtensa/uclinux.h"
+   tm_file="${tm_file} elfos.h gnu-user.h linux.h glibc-stdint.h 
xtensa/uclinux.h"
tmake_file="${tmake_file} xtensa/t-xtensa"
extra_options="${extra_options} xtensa/uclinux.opt"
;;
 am33_2.0-*-linux*)
-   tm_file="mn10300/mn10300.h dbxelf.h elfos.h gnu-user.h linux.h 
glibc-stdint.h mn10300/linux.h"
+   tm_file="mn10300/mn10300.h elfos.h gnu-user.h linux.h glibc-stdint.h 
mn10300/linux.h"
gas=yes gnu_ld=yes
use_collect2=no
;;
 m32c-*-rtems*)
-   tm_file="dbxelf.h elfos.h ${tm_file} m32c/rtems.h rtems.h 
newlib-stdint.h"
+   tm_file="elfos.h ${tm_file} m32c/rtems.h rtems.h newlib-stdint.h"
c_target_objs="m32c-pragma.o"
cxx_target_objs="m32c-pragma.o"
;;
 m32c-*-elf*)
-   tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
+   tm_file="elfos.h newlib-stdint.h ${tm_file}"
c_target_objs="m32c-pragma.o"
cxx_target_objs="m32c-pragma.o"
;;
diff --git a/gcc/config/cris/cris.h b/gcc/config/cris/cris.h
index 1ab830e4d75..4276b6a76bf 100644
--- a/gcc/config/cris/cris.h
+++ b/gcc/config/cris/cris.h
@@ -901,24 +901,6 @@ struct cum_args {int regs;};
 /* FIXME: Investigate DEBUGGER_AUTO_OFFSET, DEBUGGER_ARG_OFFSET.  */
 
 
-/* Node: DBX Options */
-
-/* Is this correct? Check later

[PATCH] c++: fix cases of core1001/1322 by not dropping cv-qualifier of function parameter of type of typename or decltype[PR101402,PR102033,PR102034,PR102039,PR102044]

2021-08-31 Thread nick huang via Gcc-patches
These bugs are considered duplicates of PR51851, which has been suspended
since 2012 and is known as "core 1001/1322".  Given this background, the fix
deserves a long explanation.

Many people believe the root cause of this family of bugs is the way, and the
point at which, an array type is converted to a pointer type while the
function signature is calculated.  This is true, but we need to go into detail
to understand the exact reason.

There is a common pattern to these bugs (PR101402, PR102033, PR102034,
PR102039).  In the template function declaration, the function parameter
consists of a "const" followed by a typename-type which is actually an array
type.  According to the standard, the function signature is calculated by
dropping the so-called top-level cv-qualifiers.  As a result, the template
specialization complains that no matching declaration can be found, because
the specialization keeps the const while the template function declaration has
dropped it as described.  Obviously the template function declaration should
NOT drop the const.  But why?  Let's review the procedure in the standard
first.
(https://timsong-cpp.github.io/cppwp/dcl.fct#5.sentence-3)

"After determining the type of each parameter, any parameter of type “array of 
T” 
or of function type T is adjusted to be “pointer to T”. After producing the 
list 
of parameter types, any top-level cv-qualifiers modifying a parameter type are 
deleted when forming the function type."

Please note that the deletion of top-level cv-qualifiers happens at the last
stage, after the array type has been converted to a pointer type.  More
importantly, there are two conditions:
a) Each type must be determinable.
b) The cv-qualifier must be top-level.
Let's analyze whether these two conditions can be met, one by one.
1) Keyword "typename" indicates inside template it involves dependent name
 (https://timsong-cpp.github.io/cppwp/n4659/temp.res#2) for which the name 
lookup 
can be postponed until template instantiation. Clearly the type of dependent 
name cannot be determined without name lookup. Then we can NOT proceed to next 
step until concrete template argument type is determined during specialization. 
2) After “array of T” is converted to “pointer to T”, the cv-qualifiers are no
longer top-level!  Unfortunately the standard has no definition of
"top-level".  Dan Saks's articles (https://www.dansaks.com/articles.shtml) are
a tremendous help, especially this paper
(https://www.dansaks.com/articles/2000-02%20Top-Level%20cv-Qualifiers%20in%20Function%20Parameters.pdf)
which discusses the topic in detail.  In one short sentence: the "const"
before an array type is NOT a top-level cv-qualifier and should NOT be
dropped.

So, understanding the root cause makes the fix very clear: do NOT drop the
cv-qualifier for a typename-type inside a template.  Leave this task for
template substitution later, when template specialization fixes the template
argument types.

Similarly, inside a template "decltype" may also involve a dependent name, and
the best strategy for the parser is to preserve the original declaration and
postpone the task until template substitution.
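
A reduced example of the shape of code affected (my own sketch, not one of the
testcases added by the patch):

struct A { typedef int T[4]; };

/* Inside the template, X::T is a dependent name, so whether it is an array
   type (and hence whether the 'const' is top-level) cannot be known yet; the
   'const' must therefore be preserved here.  */
template <typename X>
void f (const typename X::T);

/* At specialization time X::T is int[4]; the parameter decays to
   'const int *', so the 'const' is part of the signature.  If the primary
   template dropped it too early, this specialization is rejected as not
   matching any declaration.  */
template <>
void f<A> (const A::T) { }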

Here is an interesting observation to share.  Originally my fix tried to use
the function "resolve_typename_type" to check whether the typename-type is
indeed an array type, so as to decide whether the const should be dropped.
That works for PR101402 and PR102033 (with a small fix to the function), but
cannot handle PR102034 and PR102039.  PR102039 in particular is impossible to
resolve this way because it depends on the template argument.  This made me
realize that the parser should not do any work it cannot complete reliably;
all of it can wait until substitution.

Finally, I want to acknowledge the other effort to tackle core 1001/1322 in
PR92010, which takes a different, complementary approach: rebuilding the
template function signature during the template substitution stage.  After
all, this fix can only deal with dependent types introduced by "typename" or
"decltype", which is not the case in PR92010.
 

gcc/cp/ChangeLog:

2021-08-30  qingzhe huang  

* decl.c (grokparms):

gcc/testsuite/ChangeLog:

2021-08-30  qingzhe huang  

* g++.dg/parse/pr101402.C: New test.
* g++.dg/parse/pr102033.C: New test.
* g++.dg/parse/pr102034.C: New test.
* g++.dg/parse/pr102039.C: New test.
* g++.dg/parse/pr102044.C: New test.


diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index e0c603aaab6..940c43ce707 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -14384,7 +14384,16 @@ grokparms (tree parmlist, tree *parms)
 
  /* Top-level qualifiers on the parameters are
 ignored for function types.  */
- type = cp_build_qualified_type (type, 0);
+
+ int type_quals = 0;
+ /* Inside template declaration, typename and decltype indicating
+dependent name and cv-qualifier are preserved until
+template instantiation.
+PR101402/PR102033/PR10

Re: [PATCH] Set bound/cmp/control for until wrap loop.

2021-08-31 Thread Richard Biener via Gcc-patches
On Tue, 31 Aug 2021, guojiufu wrote:

> On 2021-08-30 20:02, Richard Biener wrote:
> > On Mon, 30 Aug 2021, guojiufu wrote:
> > 
> >> On 2021-08-30 14:15, Jiufu Guo wrote:
> >> > Hi,
> >> >
> >> > In patch r12-3136, niter->control, niter->bound and niter->cmp are
> >> > derived from number_of_iterations_lt.  While for 'until wrap condition',
> >> > the calculation in number_of_iterations_lt is not align the requirements
> >> > on the define of them and requirements in determine_exit_conditions.
> >> >
> >> > This patch calculate niter->control, niter->bound and niter->cmp in
> >> > number_of_iterations_until_wrap.
> >> >
> >> > The ICEs in the PR are pass with this patch.
> >> > Bootstrap and reg-tests pass on ppc64/ppc64le and x86.
> >> > Is this ok for trunk?
> >> >
> >> > BR.
> >> > Jiufu Guo
> >> >
> >> Add ChangeLog:
> >> gcc/ChangeLog:
> >> 
> >> 2021-08-30  Jiufu Guo  
> >> 
> >> PR tree-optimization/102087
> >> * tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
> >> Set bound/cmp/control for niter.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >> 2021-08-30  Jiufu Guo  
> >> 
> >> PR tree-optimization/102087
> >> * gcc.dg/vect/pr101145_3.c: Update tests.
> >> * gcc.dg/pr102087.c: New test.
> >> 
> >> > ---
> >> >  gcc/tree-ssa-loop-niter.c  | 14 +-
> >> >  gcc/testsuite/gcc.dg/pr102087.c| 25 +
> >> >  gcc/testsuite/gcc.dg/vect/pr101145_3.c |  4 +++-
> >> >  3 files changed, 41 insertions(+), 2 deletions(-)
> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr102087.c
> >> >
> >> > diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> >> > index 7af92d1c893..747f04d3ce0 100644
> >> > --- a/gcc/tree-ssa-loop-niter.c
> >> > +++ b/gcc/tree-ssa-loop-niter.c
> >> > @@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap (class loop *,
> >> > tree type, affine_iv *iv0,
> >> >   affine_iv *iv1, class tree_niter_desc 
> >> > *niter)
> >> >  {
> >> >tree niter_type = unsigned_type_for (type);
> >> > -  tree step, num, assumptions, may_be_zero;
> >> > +  tree step, num, assumptions, may_be_zero, span;
> >> >wide_int high, low, max, min;
> >> >
> >> >may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, iv1->base,
> >> > iv0->base);
> >> > @@ -1513,6 +1513,8 @@ number_of_iterations_until_wrap (class loop *,
> >> > tree type, affine_iv *iv0,
> >> >   low = wi::to_wide (iv0->base);
> >> >  else
> >> >  low = min;
> >> > +
> >> > +  niter->control = *iv1;
> >> >  }
> >> >/* {base, -C} < n.  */
> >> >else if (tree_int_cst_sign_bit (iv0->step) && integer_zerop
> >> > (iv1->step))
> >> > @@ -1533,6 +1535,8 @@ number_of_iterations_until_wrap (class loop *,
> >> > tree type, affine_iv *iv0,
> >> >   high = wi::to_wide (iv1->base);
> >> >  else
> >> >  high = max;
> >> > +
> >> > +  niter->control = *iv0;
> >> >  }
> >> >else
> >> >  return false;
> > 
> > it looks like the above two should already be in effect from the
> > caller (guarding with integer_nozerop)?
> 
> I add them just because set these fields in one function.
> Yes, they have been set in caller already,  I could remove them here.
> 
> > 
> >> > @@ -1556,6 +1560,14 @@ number_of_iterations_until_wrap (class loop *,
> >> > tree type, affine_iv *iv0,
> >> >niter->assumptions, assumptions);
> >> >
> >> >niter->control.no_overflow = false;
> >> > +  niter->control.base = fold_build2 (MINUS_EXPR, niter_type,
> >> > + niter->control.base,
> >> > niter->control.step);
> > 
> > how do we know IVn - STEP doesn't already wrap?
> 
> The last IV value is just cross the max/min value of the type
> at the last iteration,  then IVn - STEP is the nearest value
> to max(or min) and not wrap.
> 
> > A comment might be
> > good to explain you're turning the simplified exit condition into
> > 
> >{ IVbase - STEP, +, STEP } != niter * STEP + (IVbase - STEP)
> > 
> > which, when mathematically looking at it makes me wonder why there's
> > the seemingly redundant '- STEP' term?  Also is NE_EXPR really
> > correct since STEP might be not 1?  Only for non equality compares
> > the '- STEP' should matter?
> 
> I need to add comments for this.  This is a little tricky.
> The last value of the original IV just cross max/min at most one STEP,
> at there wrapping already happen.
> Using "{IVbase, +, STEP} != niter * STEP + IVbase" is not wrong
> in the aspect of exit condition.
> 
> But this would not work well with existing code:
> like determine_exit_conditions, which will convert NE_EXP to
> LT_EXPR/GT_EXPR.  And so, the '- STEP' is added to adjust the
> IV.base and bound, with '- STEP' the bound will be the last value
> just before wrap.

Hmm.  The control IV is documented as

  /* The simplified shape of the exit condition.  The loop exits if
 CONTROL CMP BOUND is false, where CMP is one of NE_EXPR,
 LT_EXPR

[PATCH 4/5]AArch64 sve: optimize add reduction patterns

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All,

The following loop does a conditional reduction using an add:

#include <stdint.h>

int32_t f (int32_t *restrict array, int len, int min)
{
  int32_t iSum = 0;

  for (int i=0; i<len; i++) {
   if (array[i] >= min)
    iSum += array[i];
  }
  return iSum;
}

for this we currently generate:

mov z1.b, #0
mov z2.s, w2
mov z3.d, z1.d
ptrue   p2.b, all
ld1w    z0.s, p0/z, [x0, x3, lsl 2]
cmpge   p1.s, p2/z, z0.s, z2.s
add x3, x3, x4
sel z0.s, p1, z0.s, z3.s
add z1.s, p0/m, z1.s, z0.s
whilelo p0.s, w3, w1

where the SEL is unneeded as it's selecting between 0 or a value.  This can be
optimized to just doing the conditional add on p1 instead of p0.  After this
patch we generate:

mov z2.s, w2
mov z0.b, #0
ptrue   p1.b, all
ld1w    z1.s, p0/z, [x0, x3, lsl 2]
cmpge   p0.s, p0/z, z1.s, z2.s
add x3, x3, x4
add z0.s, p0/m, z0.s, z1.s
whilelo p0.s, w3, w1

and so we drop the SEL and the 0 move.
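
At the gimple level the new simplification rule (added to match.pd in this
patch) covers roughly the following shape; this is a sketch only, with the
statement form and SSA names made up for illustration:

  vect_a = VEC_COND_EXPR <mask1, vect_b, { 0, ... }>;
  vect_c = .COND_ADD (mask2, vect_d, vect_a, vect_d);

which is rewritten into

  mask3  = mask1 & mask2;
  vect_c = .COND_ADD (mask3, vect_d, vect_b, vect_d);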

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: New rule.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pred-cond-reduc.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/match.pd b/gcc/match.pd
index 
19cbad7592787a568d4a7cfd62746d5844c0be5f..ec98a302ac773647413f776fba15930ad247c747
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6978,6 +6978,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 && element_precision (type) == element_precision (op_type))
 (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))
 
+/* Detect simplification for a conditional reduction where
+
+   a = mask1 ? b : 0
+   c = mask2 ? d + a : d
+
+   is turned into
+
+   c = mask1 && mask2 ? d + b : d.  */
+(simplify
+  (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
+   (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
+
 /* For pointers @0 and @2 and nonnegative constant offset @1, look for
expressions like:
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-cond-reduc.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pred-cond-reduc.c
new file mode 100644
index 
..bd53025d3f17224004244dadc88e0c68ded23f12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-cond-reduc.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O3 --save-temps" } */
+
+#include <stdint.h>
+
+int32_t f (int32_t *restrict array, int len, int min)
+{
+  int32_t iSum = 0;
+
+  for (int i=0; i<len; i++) {
+   if (array[i] >= min)
+    iSum += array[i];
+  }
+  return iSum;
+}
+
+
+/* { dg-final { scan-assembler-not {\tsel\tz[0-9]+\.s, p1, z[0-9]+\.s, 
z[0-9]+\.s} } } */


-- 
diff --git a/gcc/match.pd b/gcc/match.pd
index 19cbad7592787a568d4a7cfd62746d5844c0be5f..ec98a302ac773647413f776fba15930ad247c747 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6978,6 +6978,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 && element_precision (type) == element_precision (op_type))
 (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))
 
+/* Detect simplification for a conditional reduction where
+
+   a = mask1 ? b : 0
+   c = mask2 ? d + a : d
+
+   is turned into
+
+   c = mask1 && mask2 ? d + b : d.  */
+(simplify
+  (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
+   (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
+
 /* For pointers @0 and @2 and nonnegative constant offset @1, look for
expressions like:
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-cond-reduc.c b/gcc/testsuite/gcc.target/aarch64/sve/pred-cond-reduc.c
new file mode 100644
index ..bd53025d3f17224004244dadc88e0c68ded23f12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-cond-reduc.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O3 --save-temps" } */
+
+#include <stdint.h>
+
+int32_t f (int32_t *restrict array, int len, int min)
+{
+  int32_t iSum = 0;
+
+  for (int i=0; i<len; i++) {
+   if (array[i] >= min)
+    iSum += array[i];
+  }
+  return iSum;
+}
+
+
+/* { dg-final { scan-assembler-not {\tsel\tz[0-9]+\.s, p1, z[0-9]+\.s, z[0-9]+\.s} } } */



[PATCH 3/5]AArch64 sve: do not keep negated mask and inverse mask live at the same time

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All,

The following example:

void f11(double * restrict z, double * restrict w, double * restrict x,
 double * restrict y, int n)
{
for (int i = 0; i < n; i++) {
z[i] = (w[i] > 0) ? w[i] : y[i];
}
}

Generates currently:

ptrue   p2.b, all
ld1d    z0.d, p0/z, [x1, x2, lsl 3]
fcmgt   p1.d, p2/z, z0.d, #0.0
bic p3.b, p2/z, p0.b, p1.b
ld1d    z1.d, p3/z, [x3, x2, lsl 3]

and after the previous patches generates:

ptrue   p3.b, all
ld1d    z0.d, p0/z, [x1, x2, lsl 3]
fcmgt   p1.d, p0/z, z0.d, #0.0
fcmgt   p2.d, p3/z, z0.d, #0.0
not p1.b, p0/z, p1.b
ld1d    z1.d, p1/z, [x3, x2, lsl 3]

where a duplicate comparison is performed for w[i] > 0.

This is because in the vectorizer we're emitting a comparison for both a and ~a
where we just need to emit one of them and invert the other.  After this patch
we generate:

ld1d    z0.d, p0/z, [x1, x2, lsl 3]
fcmgt   p1.d, p0/z, z0.d, #0.0
mov p2.b, p1.b
not p1.b, p0/z, p1.b
ld1d    z1.d, p1/z, [x3, x2, lsl 3]

In order to perform the check I have to fully expand the NOT stmts when
recording them as the SSA names for the top level expressions differ but
their arguments don't. e.g. in _31 = ~_34 the value of _34 differs but not
the operands in _34.

But we only do this when the operation is an ordered one because mixing
ordered and unordered expressions can lead to de-optimized code.
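
As a small reminder of why the ordered restriction matters (my own example,
not taken from the patch): with NaNs in play, negating a comparison does not
give the opposite ordered comparison.

int cmp_gt (double a, double b) { return a > b; }   /* ordered GT */
int cmp_le (double a, double b) { return a <= b; }  /* ordered LE */

/* If a is a NaN, both cmp_gt and cmp_le return 0: the inverse of the ordered
   (a > b) is the unordered-or-LE comparison, not the ordered (a <= b).
   Inverting across the ordered/unordered boundary when recording masks can
   therefore pessimize the generated code, hence the restriction above.  */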

Note: This patch series is working incrementally towards generating the most
  efficient code for this and other loops in small steps. The mov is
  created by postreload when it does a late CSE.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* fold-const.c (tree_comparison_ordered_p): New.
* fold-const.h (tree_comparison_ordered_p): New.
* tree-vect-stmts.c (vectorizable_condition): Check if inverse of mask
is live.
* tree-vectorizer.c (scalar_cond_masked_key::get_cond_ops_from_tree):
Register mask inverses.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pred-not-gen.c: Update testcase.

--- inline copy of patch -- 
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 
7bac84ba33145c17d1dac9afe70bbd1c89a4b3fa..852fc37b25023a108410fcf375604d082357efa2
 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -144,6 +144,7 @@ extern enum tree_code swap_tree_comparison (enum tree_code);
 
 extern bool ptr_difference_const (tree, tree, poly_int64_pod *);
 extern enum tree_code invert_tree_comparison (enum tree_code, bool);
+extern bool tree_comparison_ordered_p (enum tree_code);
 extern bool inverse_conditions_p (const_tree, const_tree);
 
 extern bool tree_unary_nonzero_warnv_p (enum tree_code, tree, tree, bool *);
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 
7dcecc9a5c08d56703075229f762f750ed6c5d93..04991457db7e5166e8ce17d4bfa3b107f619dbc1
 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2669,6 +2669,37 @@ invert_tree_comparison (enum tree_code code, bool 
honor_nans)
 }
 }
 
+/* Given a tree comparison code return whether the comparison is for an
+   ordered expression or not.  */
+
+bool
+tree_comparison_ordered_p (enum tree_code code)
+{
+  switch (code)
+{
+case EQ_EXPR:
+case NE_EXPR:
+case GT_EXPR:
+case GE_EXPR:
+case LT_EXPR:
+case LE_EXPR:
+case LTGT_EXPR:
+  return true;
+case UNEQ_EXPR:
+case UNGT_EXPR:
+case UNGE_EXPR:
+case UNLT_EXPR:
+case UNLE_EXPR:
+case ORDERED_EXPR:
+case UNORDERED_EXPR:
+  return false;
+default:
+  gcc_unreachable ();
+}
+}
+
+
+
 /* Similar, but return the comparison that results if the operands are
swapped.  This is safe for floating-point.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c
index 
18d5cf8dcb46e227aecfcbacb833670427ed0586..e4251de32fe347d6193d6f964a74d30e28f5d128
 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c
@@ -24,7 +24,6 @@ void f10(double * restrict z, double * restrict w, double * 
restrict x, double *
 ** f11:
 ** ...
 ** ld1dz0.d, p0/z, \[x1, x2, lsl 3\]
-** fcmgt   p2.d, p3/z, z0.d, #0.0
 ** fcmgt   p1.d, p0/z, z0.d, #0.0
 ** not p1.b, p0/z, p1.b
 ** ld1dz1.d, p1/z, \[x3, x2, lsl 3\]
@@ -55,5 +54,3 @@ void f12(int * restrict z, int * restrict w, int * restrict 
x, int * restrict y,
 }
 }
 
-/* { dg-final { scan-assembler-not {\tbic\t} } } */
-/* { dg-final { scan-assembler-times {\tnot\tp[0-9]+\.b, p[0-9]+/z, 
p[0-9]+\.b\n} 2 } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 
074dfdcf385f31f2ba753012131985544dfd69f8..54cce92066c058d85ad010091c0c0eb6716f8979
 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c

[PATCH 2/5]AArch64 sve: combine nested if predicates

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All,

The following example

void f5(float * restrict z0, float * restrict z1, float *restrict x,
float * restrict y, float c, int n)
{
for (int i = 0; i < n; i++) {
float a = x[i];
float b = y[i];
if (a > b) {
z0[i] = a + b;
if (a > c) {
z1[i] = a - b;
}
}
}
}

generates currently:

ptrue   p3.b, all
ld1w    z1.s, p1/z, [x2, x5, lsl 2]
ld1w    z2.s, p1/z, [x3, x5, lsl 2]
fcmgt   p0.s, p3/z, z1.s, z0.s
fcmgt   p2.s, p1/z, z1.s, z2.s
fcmgt   p0.s, p0/z, z1.s, z2.s

The conditions for a > b and a > c become separate comparisons.

After this patch using a 2 -> 2 split we generate:

ld1w    z1.s, p0/z, [x2, x5, lsl 2]
ld1w    z2.s, p0/z, [x3, x5, lsl 2]
fcmgt   p1.s, p0/z, z1.s, z2.s
fcmgt   p1.s, p1/z, z1.s, z0.s

Here the conditions a > b && a > c are folded by using the predicate result of
the previous compare, which allows the removal of one of the compares.

Note: This patch series is working incrementally towards generating the most
  efficient code for this and other loops in small steps.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (*mask_cmp_and_combine): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pred-combine-and.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
2c23c6b12bafb038d82920e7141a418e078a2c65..ee9d32c0a5534209689d9d3abaa560ee5b66347d
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8162,6 +8162,48 @@ (define_insn_and_split "*mask_inv_combine"
 }
 )
 
+;; Combine multiple masks where the comparison operators are the same and
+;; each comparison has one parameter shared. e.g. combine a > b && a > c
+(define_insn_and_split "*mask_cmp_and_combine"
+  [(set (match_operand: 0 "register_operand" "=Upa")
+   (and:
+ (and:
+   (unspec:
+ [(match_operand: 1)
+  (const_int SVE_KNOWN_PTRUE)
+  (match_operand:SVE_FULL_F 2 "register_operand" "w")
+  (match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero" "wDz")]
+ SVE_COND_FP_CMP_I0)
+   (unspec:
+ [(match_dup 1)
+  (const_int SVE_KNOWN_PTRUE)
+  (match_dup 2)
+  (match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero" "wDz")]
+ SVE_COND_FP_CMP_I0))
+   (match_operand: 5 "register_operand" "Upa")))
+   (clobber (match_scratch: 6 "=&Upa"))]
+  "TARGET_SVE"
+  "#"
+  "&& 1"
+  [(set (match_dup 6)
+   (unspec:
+ [(match_dup 5)
+  (const_int SVE_MAYBE_NOT_PTRUE)
+  (match_dup 2)
+  (match_dup 3)]
+ SVE_COND_FP_CMP_I0))
+   (set (match_dup 0)
+   (unspec:
+ [(match_dup 6)
+  (const_int SVE_MAYBE_NOT_PTRUE)
+  (match_dup 2)
+  (match_dup 4)]
+ SVE_COND_FP_CMP_I0))]
+{
+  operands[6] = gen_reg_rtx (mode);
+}
+)
+
 ;; -
 ;;  [FP] Absolute comparisons
 ;; -
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-combine-and.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pred-combine-and.c
new file mode 100644
index 
..d395b7f84bb15b588493611df5a47549726ac24a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-combine-and.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O3 --save-temps" } */
+
+void f5(float * restrict z0, float * restrict z1, float *restrict x, float * 
restrict y, float c, int n)
+{
+for (int i = 0; i < n; i++) {
+float a = x[i];
+float b = y[i];
+if (a > b) {
+z0[i] = a + b;
+if (a > c) {
+z1[i] = a - b;
+}
+}
+}
+}
+
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-9]+\.s, p[0-9]+/z, 
z[0-9]+\.s, z[0-9]+\.s} 2 } } */


-- 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 2c23c6b12bafb038d82920e7141a418e078a2c65..ee9d32c0a5534209689d9d3abaa560ee5b66347d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8162,6 +8162,48 @@ (define_insn_and_split "*mask_inv_combine"
 }
 )
 
+;; Combine multiple masks where the comparison operators are the same and
+;; each comparison has one parameter shared. e.g. combine a > b && a > c
+(define_insn_and_split "*mask_cmp_and_combine"
+  [(set (match_operand: 0 "register_operand" "=Upa")
+	(and:
+	  (and:
+	(unspec:
+	  [(match_operand: 1)
+	   (const_int SVE_KNOWN_PTRUE)
+	   (match_operand:SVE_FULL_F 2 "register_oper

[PATCH 1/5]AArch64 sve: combine inverted masks into NOTs

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All,

The following example

void f10(double * restrict z, double * restrict w, double * restrict x,
 double * restrict y, int n)
{
for (int i = 0; i < n; i++) {
z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
}
}

generates currently:

ld1d    z1.d, p1/z, [x1, x5, lsl 3]
fcmgt   p2.d, p1/z, z1.d, #0.0
fcmgt   p0.d, p3/z, z1.d, #0.0
ld1d    z2.d, p2/z, [x2, x5, lsl 3]
bic p0.b, p3/z, p1.b, p0.b
ld1d    z0.d, p0/z, [x3, x5, lsl 3]

where a BIC is generated between p1 and p0; a NOT would be better here since it
does not require the use of p3 and opens the pattern up to being CSEd.

After this patch using a 2 -> 2 split we generate:

ld1d    z1.d, p0/z, [x1, x5, lsl 3]
fcmgt   p2.d, p0/z, z1.d, #0.0
not p1.b, p0/z, p2.b

The additional scratch is needed such that we can CSE the two operations.  If
both statements wrote to the same register then CSE won't be able to CSE the
values if there are other statements in between that use the register.

Note: This patch series is working incrementally towards generating the most
  efficient code for this and other loops in small steps.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (*mask_inv_combine): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pred-not-gen.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
359fe0e457096cf4042a774789a5c241420703d3..2c23c6b12bafb038d82920e7141a418e078a2c65
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8126,6 +8126,42 @@ (define_insn_and_split "*fcmuo_and_combine"
  UNSPEC_COND_FCMUO))]
 )
 
+;; Make sure that inversions of masked comparisons are always on the mask
+;; instead of on the operation.
+(define_insn_and_split "*mask_inv_combine"
+  [(set (match_operand: 0 "register_operand" "=Upa")
+   (and:
+ (and:
+   (not:
+ (unspec:
+   [(match_operand: 1)
+(const_int SVE_KNOWN_PTRUE)
+(match_operand:SVE_FULL_F 2 "register_operand" "w")
+(match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero" "wDz")]
+   SVE_COND_FP_CMP_I0))
+   (match_operand: 4 "register_operand" "Upa"))
+ (match_dup: 1)))
+   (clobber (match_scratch: 5 "=&Upa"))]
+  "TARGET_SVE"
+  "#"
+  "&& 1"
+  [(set (match_dup 5)
+   (unspec:
+ [(match_dup 4)
+  (const_int SVE_MAYBE_NOT_PTRUE)
+  (match_dup 2)
+  (match_dup 3)]
+ SVE_COND_FP_CMP_I0))
+   (set (match_dup 0)
+   (and:
+ (not:
+   (match_dup 5))
+ (match_dup 4)))]
+{
+  operands[5] = gen_reg_rtx (mode);
+}
+)
+
 ;; -
 ;;  [FP] Absolute comparisons
 ;; -
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c
new file mode 100644
index 
..a5bb616ef505a63075cf33203de8cf8e8c38b95d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen.c
@@ -0,0 +1,56 @@
+/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O3 --save-temps -fno-schedule-insns -fno-schedule-insns2" } 
*/
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** f10:
+** ...
+** ld1dz1.d, p0/z, \[x1, x5, lsl 3\]
+** fcmgt   p2.d, p0/z, z1.d, #0.0
+** ld1dz2.d, p2/z, \[x2, x5, lsl 3\]
+** not p1.b, p0/z, p2.b
+** ld1dz0.d, p1/z, \[x3, x5, lsl 3\]
+** ...
+*/
+
+void f10(double * restrict z, double * restrict w, double * restrict x, double 
* restrict y, int n)
+{
+for (int i = 0; i < n; i++) {
+z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
+}
+}
+
+/*
+** f11:
+** ...
+** ld1dz0.d, p0/z, \[x1, x2, lsl 3\]
+** fcmgt   p2.d, p3/z, z0.d, #0.0
+** fcmgt   p1.d, p0/z, z0.d, #0.0
+** not p1.b, p0/z, p1.b
+** ld1dz1.d, p1/z, \[x3, x2, lsl 3\]
+** ...
+*/
+
+void f11(double * restrict z, double * restrict w, double * restrict x, double 
* restrict y, int n)
+{
+for (int i = 0; i < n; i++) {
+z[i] = (w[i] > 0) ? w[i] : y[i];
+}
+}
+
+
+/*
+** f12:
+** ...
+** ld1wz1.s, p0/z, \[x1, x2, lsl 2\]
+** cmple   p1.s, p0/z, z1.s, #0
+** ld1wz0.s, p1/z, \[x3, x2, lsl 2\]
+** ...
+*/
+
+void f12(int * restrict z, int * restrict w, int * restrict x, int * restrict 
y, int n)
+{
+for (int i = 0; i < n; i++) {
+z[i] = (w[i] > 0) ? w[i] : y[i];
+}
+}


-- 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 359fe0e457096cf4042a774789a5c241420703d3..2c23c6b1

[PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All,

This patch adds extended costing to cost the creation of constants and the
manipulation of constants.  The default values provided are based on
architectural expectations and each cost models can be individually tweaked as
needed.

The changes in this patch covers:

* Construction of PARALLEL or CONST_VECTOR:
  Adds better costing for vector of constants which is based on the constant
  being created and the instruction that can be used to create it.  i.e. a movi
  is cheaper than a literal load etc.
* Construction of a vector through a vec_dup.
* Extraction of part of a vector using a vec_select.  In this part we had to
  make some opportunistic assumptions.  In particular we had to model
  extracting the high half of a register as being "free" in order to make
  fusion with the NEON high-part ("2") instructions possible (see the sketch
  below).  In the event that there is no "2" variant of the instruction, the
  select would still be cheaper than the load.
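
For illustration, this is the kind of fusion the "free" high-half extract is
meant to enable.  The intrinsics are the usual arm_neon.h ones, but the claim
that this compiles to a single saddl2 is my assumption, not something stated
in the patch:

#include <arm_neon.h>

int32x4_t
widen_add_high (int16x8_t a, int16x8_t b)
{
  /* The two high-half extracts can fuse with the widening add into one
     saddl2 instruction, so costing the extracts as free is what lets the
     combined form win against a separate extract plus add.  */
  return vaddl_s16 (vget_high_s16 (a), vget_high_s16 (b));
}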

Unfortunately on AArch64 you need -O3 when using intrinsics for this to kick
in until we fix vld1/2/3 to be gimple instead of RTL intrinsics.

This should also fix the stack allocations.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/aarch-common-protos.h (struct vector_cost_table): Add
movi, dup and extract costing fields.
* config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs,
thunderx_extra_costs, thunderx2t99_extra_costs,
thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs): Use
them.
* config/arm/aarch-cost-tables.h (generic_extra_costs,
cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs,
exynosm1_extra_costs, xgene1_extra_costs): Likewise
* config/aarch64/aarch64-simd.md (aarch64_simd_dup): Add r->w dup.
* config/aarch64/aarch64.c (aarch64_simd_make_constant): Expose.
(aarch64_rtx_costs): Add extra costs.
(aarch64_simd_dup_constant): Support check only mode.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-cse-codegen.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 
dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29..bb499a1eae62a145f1665d521f57c98b49ac5389
 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -124,7 +124,10 @@ const struct cpu_cost_table qdf24xx_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1),  /* alu.  */
-COSTS_N_INSNS (4)   /* mult.  */
+COSTS_N_INSNS (4),  /* mult.  */
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */
   }
 };
 
@@ -229,7 +232,10 @@ const struct cpu_cost_table thunderx_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* Alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -333,7 +339,10 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* Alu.  */
-COSTS_N_INSNS (4)  /* Mult.  */
+COSTS_N_INSNS (4), /* Mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -437,7 +446,10 @@ const struct cpu_cost_table thunderx3t110_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* Alu.  */
-COSTS_N_INSNS (4)  /* Mult.  */
+COSTS_N_INSNS (4), /* Mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -542,7 +554,10 @@ const struct cpu_cost_table tsv110_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1),  /* alu.  */
-COSTS_N_INSNS (4)   /* mult.  */
+COSTS_N_INSNS (4),  /* mult.  */
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */
   }
 };
 
@@ -646,7 +661,10 @@ const struct cpu_cost_table a64fx_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1),  /* alu.  */
-COSTS_N_INSNS (4)   /* mult.  */
+COSTS_N_INSNS (4),  /* mult.  */
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */
   }
 };
 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
c5638d096fa84a27b4ea397f62cd0d05a28e7c8c..6814dae079c9ff40aaa2bb625432bf9eb8906b73
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -74,12 +74,14 @@ (define_insn "aarch64_simd_dup"
 )
 
 (define_insn "aarch64_simd_dup"
-  [(set (match_operand:VDQF_F16 0 "register_operand" "=w")
+  [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
(vec_duplicate:VDQF_F16
- (match_operand: 1 "register_operand" "w")))]

[PATCH 1/2]middle-end Teach CSE to be able to do vector extracts.

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All,

This patch gets CSE to re-use constants already inside a vector rather than
re-materializing the constant again.

Basically consider the following case:

#include <stdint.h>
#include <arm_neon.h>

uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
  uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
  uint64_t res = a | arr[0];
  uint64x2_t val = vld1q_u64 (arr);
  *rt = vaddq_u64 (val, b);
  return res;
}

The actual behavior is inconsequential; however, notice that the same constant
is used both in the vector (arr and later val) and in the calculation of res.

The code we generate for this however is quite sub-optimal:

test:
adrp    x2, .LC0
sub sp, sp, #16
ldr q1, [x2, #:lo12:.LC0]
mov x2, 16502
movk    x2, 0x1023, lsl 16
movk    x2, 0x4308, lsl 32
add v1.2d, v1.2d, v0.2d
movk    x2, 0x942, lsl 48
orr x0, x0, x2
str q1, [x1]
add sp, sp, 16
ret
.LC0:
.xword  667169396713799798
.xword  667169396713799798

Essentially we materialize the same constant twice.  The reason is that the
front-end lowers the constant extracted from arr[0] quite early on.  If you
look at the result of the fre pass you'll find

   :
  arr[0] = 667169396713799798;
  arr[1] = 667169396713799798;
  res_7 = a_6(D) | 667169396713799798;
  _16 = __builtin_aarch64_ld1v2di (&arr);
  _17 = VIEW_CONVERT_EXPR(_16);
  _11 = b_10(D) + _17;
  *rt_12(D) = _11;
  arr ={v} {CLOBBER};
  return res_7;

This makes sense for further optimization.  However, come expand time, if the
constant isn't representable on the target architecture it will be assigned to
a register again.

(insn 8 5 9 2 (set (reg:V2DI 99)
(const_vector:V2DI [
(const_int 667169396713799798 [0x942430810234076]) repeated x2
])) "cse.c":7:12 -1
 (nil))
...
(insn 14 13 15 2 (set (reg:DI 103)
(const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
 (nil))
(insn 15 14 16 2 (set (reg:DI 102 [ res ])
(ior:DI (reg/v:DI 96 [ a ])
(reg:DI 103))) "cse.c":8:12 -1
 (nil))

And since it's out of the immediate range of the scalar instruction used
combine won't be able to do anything here.

This will then trigger the re-materialization of the constant twice.

To fix this, the patch extends CSE to be able to generate an extract of a
constant from another vector, or to build a vector for a constant by
duplicating another constant.

Whether this transformation is done or not depends entirely on the costing for
the target for the different constants and operations.
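
Concretely, for the V2DI constant above, find_sets_in_insn now registers,
besides the set of the whole vector, per-element "template" sets of roughly
the following shape (an illustrative sketch of the RTL; the register number is
made up):

  (set (reg:V2DI 99)
       (const_vector:V2DI [(const_int 667169396713799798) repeated x2]))

  ;; Per-lane templates telling CSE how to reach the scalar constant through
  ;; a vec_select of the vector register.
  (set (vec_select:DI (reg:V2DI 99) (parallel [(const_int 0)]))
       (const_int 667169396713799798))
  (set (vec_select:DI (reg:V2DI 99) (parallel [(const_int 1)]))
       (const_int 667169396713799798))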

I initially also investigated doing this in PRE, but PRE requires at least two
basic blocks to work, does not currently have any way to remove redundancies
within a single BB, and did not look easy to extend for this.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* cse.c (find_sets_in_insn): Register constants in sets.
(cse_insn): Try materializing using vec_dup.

--- inline copy of patch -- 
diff --git a/gcc/cse.c b/gcc/cse.c
index 
330c1e90ce05b8f95b58f24576ec93e10ec55d89..d76e01b6478e22e9dd5760b7c78cecb536d7daef
 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "regs.h"
 #include "function-abi.h"
 #include "rtlanal.h"
+#include "expr.h"
 
 /* The basic idea of common subexpression elimination is to go
through the code, keeping a record of expressions that would
@@ -4274,6 +4275,25 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets)
 someplace else, so it isn't worth cse'ing.  */
   else if (GET_CODE (SET_SRC (x)) == CALL)
;
+  else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR
+  && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL)
+   {
+ /* First register the vector itself.  */
+ sets[n_sets++].rtl = x;
+ rtx src = SET_SRC (x);
+ machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src));
+ /* Go over the constants of the CONST_VECTOR in forward order, to
+put them in the same order in the SETS array.  */
+ for (unsigned i = 0; i < const_vector_encoded_nelts (src) ; i++)
+   {
+ /* These are templates and don't actually get emitted but are
+used to tell CSE how to get to a particular constant.  */
+ rtx tmp = gen_rtx_PARALLEL (VOIDmode,
+ gen_rtvec (1, GEN_INT (i)));
+ rtx y = gen_rtx_VEC_SELECT (elem_mode, SET_DEST (x), tmp);
+ sets[n_sets++].rtl = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
+   }
+   }
   else
sets[n_sets++].rtl = x;
 }
@@ -4513,7 +4533,14 @@ cse_insn (rtx_insn *insn)
   struct set *sets = (struct set *) 0;
 
   if (GET_CODE (x) == SET)
-sets = XALLOCA (struct set);
+{
+  /* F

[PATCH]AArch64 RFC: Don't cost all scalar operations during vectorization if scalar will fuse

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All,

As the vectorizer has improved in capabilities over time, it has started
over-vectorizing.  This has caused regressions on the order of 1-7x in
libraries that Arm produces.

The vector costs actually do make a lot of sense and I don't think that they are
wrong.  I think that the costs for the scalar code are wrong.

In particular the costing doesn't take into account that scalar operations
can/will fuse, since this happens in RTL.  Because of this the scalar costs
always end up higher.

As an example the loop in PR 97984:

void x (long * __restrict a, long * __restrict b)
{
  a[0] *= b[0];
  a[1] *= b[1];
  a[0] += b[0];
  a[1] += b[1];
}

generates:

x:
ldp x2, x3, [x0]
ldr x4, [x1]
ldr q1, [x1]
mul x2, x2, x4
ldr x4, [x1, 8]
fmov    d0, x2
ins v0.d[1], x3
mul x1, x3, x4
ins v0.d[1], x1
add v0.2d, v0.2d, v1.2d
str q0, [x0]
ret

On an actual loop the prologue costs would make the loop too expensive, so we
produce the scalar output; but with SLP there is no loop overhead cost, so we
end up trying to vectorize this.  Because SLP discovery is started from the
stores we will end up vectorizing and costing the add but not the MUL.

To counter this the patch adjusts the costing when it finds an operation that
can be fused and discounts the cost of the "other" operation being fused in.
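
For reference, the scalar code the discount is trying to account for would
look roughly like this once the multiplies fuse into madd (a hand-written
sketch, not compiler output):

x:
        ldp     x2, x3, [x0]
        ldp     x4, x5, [x1]
        madd    x2, x2, x4, x4    /* a[0]*b[0] + b[0] in one instruction */
        madd    x3, x3, x5, x5    /* a[1]*b[1] + b[1] */
        stp     x2, x3, [x0]
        ret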

The attached testcase shows that even with the discount we still get
vectorized code when it is profitable to do so, e.g. for SVE.

This happens with other operations as well, such as scalar operations where
shifts can be fused in, or e.g. bfxil.  As such, I am sending this for
feedback.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? If the approach is acceptable I can add support for more.

Thanks,
Tamar

gcc/ChangeLog:

PR target/97984
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Check for fusing
madd.

gcc/testsuite/ChangeLog:

PR target/97984
* gcc.target/aarch64/pr97984-1.c: New test.
* gcc.target/aarch64/pr97984-2.c: New test.
* gcc.target/aarch64/pr97984-3.c: New test.
* gcc.target/aarch64/pr97984-4.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
4cd4b037f2606e515ad8f4669d2cd13a509dd0a4..329b556311310d86aaf546d7b395a3750a9d57d4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15536,6 +15536,39 @@ aarch64_add_stmt_cost (class vec_info *vinfo, void 
*data, int count,
stmt_cost = aarch64_sve_adjust_stmt_cost (vinfo, kind, stmt_info,
  vectype, stmt_cost);
 
+  /* Scale costs if operation is fusing.  */
+  if (stmt_info && kind == scalar_stmt)
+  {
+   if (gassign *stmt = dyn_cast (STMT_VINFO_STMT (stmt_info)))
+ {
+   switch (gimple_assign_rhs_code (stmt))
+   {
+   case PLUS_EXPR:
+   case MINUS_EXPR:
+ {
+   /* Check if operation can fuse into MSUB or MADD.  */
+   tree rhs1 = gimple_assign_rhs1 (stmt);
+   if (gassign *stmt1 = dyn_cast (SSA_NAME_DEF_STMT 
(rhs1)))
+ if (gimple_assign_rhs_code (stmt1) == MULT_EXPR)
+   {
+ stmt_cost = 0;
+ break;
+  }
+   tree rhs2 = gimple_assign_rhs2 (stmt);
+   if (gassign *stmt2 = dyn_cast (SSA_NAME_DEF_STMT 
(rhs2)))
+ if (gimple_assign_rhs_code (stmt2) == MULT_EXPR)
+   {
+ stmt_cost = 0;
+ break;
+   }
+ }
+ break;
+   default:
+ break;
+   }
+ }
+  }
+
   if (stmt_info && aarch64_use_new_vector_costs_p ())
{
  /* Account for any extra "embedded" costs that apply additively
diff --git a/gcc/testsuite/gcc.target/aarch64/pr97984-1.c 
b/gcc/testsuite/gcc.target/aarch64/pr97984-1.c
new file mode 100644
index 
..9d403eb76ec3a72747f47e718a88ed6b062643f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr97984-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -fdump-tree-slp-all" } */
+
+void x (long * __restrict a, long * __restrict b)
+{
+  a[0] *= b[0];
+  a[1] *= b[1];
+  a[0] += b[0];
+  a[1] += b[1];
+}
+
+/* { dg-final { scan-tree-dump-times "not vectorized: vectorization is not 
profitable" 1 "slp2" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr97984-2.c 
b/gcc/testsuite/gcc.target/aarch64/pr97984-2.c
new file mode 100644
index 
..a4086380fd613035f7ce3e8e8c89e853efa1304e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr97984-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } *

[PATCH]AArch64[RFC] Force complicated constant to memory when beneficial

2021-08-31 Thread Tamar Christina via Gcc-patches
Hi All,

Consider the following case

#include 

uint64_t
test4 (uint8x16_t input)
{
uint8x16_t bool_input = vshrq_n_u8(input, 7);
poly64x2_t mask = vdupq_n_p64(0x0102040810204080UL);
poly64_t prodL = vmull_p64((poly64_t)vgetq_lane_p64((poly64x2_t)bool_input, 
0),
vgetq_lane_p64(mask, 0));
poly64_t prodH = vmull_high_p64((poly64x2_t)bool_input, mask);
uint8x8_t res = vtrn2_u8((uint8x8_t)prodL, (uint8x8_t)prodH);
return vget_lane_u16((uint16x4_t)res, 3);
}

which generates (after my CSE patches):

test4:
ushr    v0.16b, v0.16b, 7
mov     x0, 16512
movk    x0, 0x1023, lsl 16
movk    x0, 0x4308, lsl 32
movk    x0, 0x102, lsl 48
fmov    d1, x0
pmull   v2.1q, v0.1d, v1.1d
dup     v1.2d, v1.d[0]
pmull2  v0.1q, v0.2d, v1.2d
trn2    v2.8b, v2.8b, v0.8b
umov    w0, v2.h[3]
ret

which is suboptimal, since the constant is never needed on the genreg side and
should have been materialized on the SIMD side: the constant is so big that
creating it otherwise requires 5 instructions, 4 mov/movk and one fmov.
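
What we would like reload to pick instead is a literal-pool load straight into
the SIMD register, along the lines of the sketch below (desired output, not
generated code; the label and the address register are made up, and note it
needs a spare general register for the address):

        adrp    x1, .LC1
        ldr     d1, [x1, #:lo12:.LC1]   /* materialize on the SIMD side */
        ...
.LC1:
        .xword  0x0102040810204080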

The problem is that the choice of which side to materialize the constant on
can only be made during reload.  We may need an extra register (to hold the
address), so it can't be done after reload.

I have tried to support this with a pattern during reload, but the problem is
that I can't seem to find a way to tell reload it should spill a constant
under condition x.  Instead I tried a split which reload selects when the
condition holds.

This has a couple of issues:

1. The pattern can be expanded late (could be fixed with !reload_completed).
2. Because it's split so late, we can't seem to share the anchors for
   the ADRP.
3. Because it's split so late, reload basically doesn't know about the spill,
   and so the ADD lo12 isn't pushed into the addressing mode of the LDR.

I don't know how to properly fix these since I think the only way is for reload
to do the spill properly itself, but in this case not having the pattern makes it
avoid the mem pattern and pick r <- n instead followed by r -> w.

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.md (*movdi_aarch64): Add Dx -> W.
* config/aarch64/constraints.md (Dx): New.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
eb8ccd4b97bbd4f0c3ff5791e48cfcfb42ec6c2e..a18886cb65c86daa16baa1691b1718f2d3a1be6c
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1298,8 +1298,8 @@ (define_insn_and_split "*movsi_aarch64"
 )
 
 (define_insn_and_split "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r, r,w, m,m,  
r,  r, w,r,w, w")
-   (match_operand:DI 1 "aarch64_mov_operand"  " 
r,r,k,N,M,n,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,w  ,r  ,r,w, 
m,m,  r,  r, w,r,w,w")
+   (match_operand:DI 1 "aarch64_mov_operand"  " 
r,r,k,N,M,n,Dx,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
   "(register_operand (operands[0], DImode)
 || aarch64_reg_or_zero (operands[1], DImode))"
   "@
@@ -1309,6 +1309,7 @@ (define_insn_and_split "*movdi_aarch64"
mov\\t%x0, %1
mov\\t%w0, %1
#
+   #
* return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
ldr\\t%x0, %1
ldr\\t%d0, %1
@@ -1321,17 +1322,27 @@ (define_insn_and_split "*movdi_aarch64"
fmov\\t%d0, %d1
* return aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);"
"(CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), 
DImode))
-&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
+&& REG_P (operands[0])
+&& (GP_REGNUM_P (REGNO (operands[0]))
+   || (can_create_pseudo_p ()
+   && !aarch64_can_const_movi_rtx_p (operands[1], DImode)))"
[(const_int 0)]
"{
-   aarch64_expand_mov_immediate (operands[0], operands[1]);
+   if (GP_REGNUM_P (REGNO (operands[0])))
+aarch64_expand_mov_immediate (operands[0], operands[1]);
+   else
+{
+  rtx mem = force_const_mem (DImode, operands[1]);
+  gcc_assert (mem);
+  emit_move_insn (operands[0], mem);
+}
DONE;
 }"
   ;; The "mov_imm" type for CNTD is just a placeholder.
-  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,mov_imm,
+  [(set_attr "type" 
"mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,mov_imm,mov_imm,
 load_8,load_8,store_8,store_8,adr,adr,f_mcr,f_mrc,fmov,
 neon_move")
-   (set_attr "arch" "*,*,*,*,*,*,sve,*,fp,*,fp,*,*,fp,fp,fp,simd")]
+   (set_attr "arch" "*,*,*,*,*,*,simd,sve,*,fp,*,fp,*,*,fp,fp,fp,simd")]
 )
 
 (define_insn "insv_imm"
diff --git a/gcc/config/aarch64/constraints.md 
b/gcc/config/aarch64/constraints.md
index 
3b49b452119c49320020fa9183314d9a25b92491..422d95b50a8e9608b57f

Re: libgo patch committed: Update to Go1.17rc2 release

2021-08-31 Thread H.J. Lu via Gcc-patches
On Thu, Aug 12, 2021 at 8:24 PM Ian Lance Taylor via Gcc-patches
 wrote:
>
> This patch updates libgo from the Go1.16.5 release to the Go 1.17rc2
> release.  As usual with these version updates, the patch itself is too
> large to attach to this e-mail message.  I've attached the changes to
> files that are specific to gccgo.  Bootstraped and ran Go testsuite on
> x86_64-pc-linux-gnu.  Committed to mainline.
>
> Ian

This breaks build with x32:

/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:35:30:
error: integer constant overflow
   35 | seed ^= hashkey[0] ^ m1
  |  ^
/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:61:50:
error: integer constant overflow
   61 | seed = mix(r8(p)^m2, r8(add(p, 8))^seed)
  |  ^
/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:62:60:
error: integer constant overflow
   62 | seed1 = mix(r8(add(p, 16))^m3,
r8(add(p, 24))^seed1)
  |^
/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:63:60:
error: integer constant overflow
   63 | seed2 = mix(r8(add(p, 32))^m4,
r8(add(p, 40))^seed2)
  |^
/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:69:42:
error: integer constant overflow
   69 | seed = mix(r8(p)^m2, r8(add(p, 8))^seed)
  |  ^
/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:76:20:
error: integer constant overflow
   76 | return mix(m5^s, mix(a^m2, b^seed))
  |^
/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:76:32:
error: integer constant overflow
   76 | return mix(m5^s, mix(a^m2, b^seed))
  |^
/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:81:22:
error: integer constant overflow
   81 | return mix(m5^4, mix(a^m2, a^seed^hashkey[0]^m1))
  |  ^
/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:81:32:
error: integer constant overflow
   81 | return mix(m5^4, mix(a^m2, a^seed^hashkey[0]^m1))
  |^
/export/gnu/import/git/gitlab/x86-gcc/libgo/go/runtime/hash64.go:81:54:
error: integer constant overflow
   81 | return mix(m5^4, mix(a^m2, a^seed^hashkey[0]^m1))

The problem is that hashkey is an array of uintptr, but hash64.go is enabled for
many targets with 32-bit uintptr:

commit c5b21c3f4c17b0649155035d2f9aa97b2da8a813
Author: Ian Lance Taylor 
Date:   Fri Jul 30 14:28:58 2021 -0700

libgo: update to Go1.17rc2

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/341629

diff --git a/libgo/go/runtime/hash64.go b/libgo/go/runtime/hash64.go
index 704bbe6f62b..4b32d515c4b 100644
--- a/libgo/go/runtime/hash64.go
+++ b/libgo/go/runtime/hash64.go
@@ -3,113 +3,98 @@
 // license that can be found in the LICENSE file.

 // Hashing algorithm inspired by
-//   xxhash: https://code.google.com/p/xxhash/
-// cityhash: https://code.google.com/p/cityhash/
+// wyhash: https://github.com/wangyi-fudan/wyhash

+//go:build amd64 || arm64 || mips64 || mips64le || ppc64 || ppc64le
|| riscv64 || s390x || wasm || alpha || amd64p32 || arm64be || ia64 ||
mips64p32 || mips64p32le || sparc64
 // +build amd64 arm64 mips64 mips64le ppc64 ppc64le riscv64 s390x
wasm alpha amd64p32 arm64be ia64 mips64p32 mips64p32le sparc64


-- 
H.J.


RE: [PATCH] Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI)))

2021-08-31 Thread Roger Sayle

Hi Christophe,
I'm testing the attached patch, but without an aarch64 machine it'll take a
while to figure out the toolchain to reproduce the failure.  Neither of the
platforms I tested was affected, but I can see it's unsafe to reuse the
subreg_promoted_reg idiom from just a few lines earlier.  Any help testing
the attached patch on an affected target would be much appreciated.

Sorry for the inconvenience.
Roger
--

-Original Message-
From: Christophe LYON  
Sent: 31 August 2021 13:32
To: Roger Sayle ; 'GCC Patches' 

Subject: Re: [PATCH] Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI 
(reg:SI)))


On 29/08/2021 09:46, Roger Sayle wrote:
> SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial 
> subreg is correctly zero-extended or sign-extended in the parent 
> register.  For example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) 
> indicates that the byte x is zero extended in reg:SI 23, which is useful for 
> optimization.
> An example is that zero extending the above QImode value to HImode can 
> simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
>
> This patch addresses the oversight/missed optimization opportunity 
> that the new HImode subreg above should retain its 
> SUBREG_PROMOTED_VAR_P annotation as its value is guaranteed to be 
> correctly extended in the SImode parent.  The code below to preserve 
> SUBREG_PROMOTED_VAR_P is already present in the middle-end (e.g. 
> simplify-rtx.c:7232-7242) but missing from one or two (precisely three) 
> places that (accidentally) strip it.
>
> Whilst there I also added another optimization.  If we need to extend 
> the above QImode value beyond the SImode register holding it, say to 
> DImode, we can eliminate the SUBREG and simply extend from the SImode 
> register to DImode.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures, and on a cross-compiler to 
> nvptx-none, where the function "long foo(char x) { return x; }" now 
> requires one less instruction.
>
> OK for mainline?


Hi,

This patch causes an ICE when building an aarch64 toolchain:

during RTL pass: expand
In file included from
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/soft-fp.h:318,
  from
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/floatditf.c:32:
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/floatditf.c:
 
In function '__floatditf':
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/op-2.h:249:37:
 
internal compiler error: in subreg_promoted_mode, at rtl.h:3132
   249 |   _FP_PACK_RAW_2_flo.bits.exp   = X##_e;\
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/quad.h:229:33:
 
note: in expansion of macro '_FP_PACK_RAW_2'
   229 | # define FP_PACK_RAW_Q(val, X)  _FP_PACK_RAW_2 (Q, (val), X)
   | ^~
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/floatditf.c:42:3:
 
note: in expansion of macro 'FP_PACK_RAW_Q'
42 |   FP_PACK_RAW_Q (a, A);
   |   ^
0xa0b53a subreg_promoted_mode

/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/rtl.h:3132
0xa0b53a convert_modes(machine_mode, machine_mode, rtx_def*, int)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:699
0xa003bc expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:9091
0xa0765c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:10497
0x9fcef6 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:9798
0xa0765c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:10497
0xa1099e expand_expr
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.h:301
0xa1099e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, 
rtx_def**, expand_modifier)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:8308
0x9fcdff expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:10288


Can you check?


Thanks,

Christophe


>
> 2021-08-29  Roger Sayle  
>
> gcc/ChangeLog
>   * expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
>   creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
>   subreg.
>   * simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
>   Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
>   partial subreg from a SUBREG_PROMOTED_VAR_P subreg.  Generate
>   SIGN_EXTEND of the SUBREG_REG when a su

[committed] libstdc++: Fix 17_intro/names.cc failures on Solaris

2021-08-31 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Undefine some more names used
by Solaris system headers.

Tested x86_64-linux and sparc-solaris2.11. Committed to trunk.

commit 69b09c5599b201ac039db564c303f7b20d87e0df
Author: Jonathan Wakely 
Date:   Tue Aug 31 10:25:53 2021

libstdc++: Fix 17_intro/names.cc failures on Solaris

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Undefine some more names used
by Solaris system headers.

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index b945511e088..b5e926fb09f 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -243,6 +243,12 @@
 #endif
 
 #ifdef __sun__
// <fenv.h> defines these as members of fex_numeric_t
+#undef l
+#undef f
+#undef d
+#undef q
+#undef p
 // See https://gcc.gnu.org/ml/libstdc++/2019-05/msg00175.html
 #undef ptr
 #endif


Re: [PATCH] testsuite: Fix gcc.dg/vect/pr101145* tests [PR101145]

2021-08-31 Thread Richard Biener via Gcc-patches
On Tue, 31 Aug 2021, Jakub Jelinek wrote:

> Hi!
> 
> I'm getting:
> FAIL: gcc.dg/vect/pr101145.c scan-tree-dump-times vect "vectorized 1 loops" 7
> FAIL: gcc.dg/vect/pr101145_1.c scan-tree-dump-times vect "vectorized 1 loops" 
> 2
> FAIL: gcc.dg/vect/pr101145_2.c scan-tree-dump-times vect "vectorized 1 loops" 
> 2
> FAIL: gcc.dg/vect/pr101145_3.c scan-tree-dump-times vect "vectorized 1 loops" 
> 2
> FAIL: gcc.dg/vect/pr101145.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 1 loops" 7
> FAIL: gcc.dg/vect/pr101145_1.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 1 loops" 2
> FAIL: gcc.dg/vect/pr101145_2.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 1 loops" 2
> FAIL: gcc.dg/vect/pr101145_3.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 1 loops" 2
> on i686-linux (or x86_64-linux with -m32/-mno-sse).
> The problem is that those tests use dg-options, which in */vect/ testsuite
> throws away all the carefully added default options to enable vectorization
> on each target (and which e.g. vect_int etc. effective targets rely on).
> The old way would be to name those tests gcc.dg/vect/O3-pr101145*,
> but we can also use dg-additional-options (which doesn't throw the default
> options, just appends to them) which is IMO better so that we don't have to
> rename the tests.
> 
> Tested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2021-08-31  Jakub Jelinek  
> 
>   PR tree-optimization/102072
>   * gcc.dg/vect/pr101145.c: Use dg-additional-options with just -O3
>   instead of dg-options with -O3 -fdump-tree-vect-details.
>   * gcc.dg/vect/pr101145_1.c: Likewise.
>   * gcc.dg/vect/pr101145_2.c: Likewise.
>   * gcc.dg/vect/pr101145_3.c: Likewise.
> 
> --- gcc/testsuite/gcc.dg/vect/pr101145.c.jj   2021-08-30 08:36:11.295515537 
> +0200
> +++ gcc/testsuite/gcc.dg/vect/pr101145.c  2021-08-31 14:04:35.691964573 
> +0200
> @@ -1,5 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> -/* { dg-options "-O3 -fdump-tree-vect-details" } */
> +/* { dg-additional-options "-O3" } */
>  #include 
>  
>  unsigned __attribute__ ((noinline))
> --- gcc/testsuite/gcc.dg/vect/pr101145_1.c.jj 2021-08-30 08:36:11.295515537 
> +0200
> +++ gcc/testsuite/gcc.dg/vect/pr101145_1.c2021-08-31 14:04:55.083691474 
> +0200
> @@ -1,5 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> -/* { dg-options "-O3 -fdump-tree-vect-details" } */
> +/* { dg-additional-options "-O3" } */
>  #define TYPE signed char
>  #define MIN -128
>  #define MAX 127
> --- gcc/testsuite/gcc.dg/vect/pr101145_2.c.jj 2021-08-30 08:36:11.295515537 
> +0200
> +++ gcc/testsuite/gcc.dg/vect/pr101145_2.c2021-08-31 14:05:05.868539591 
> +0200
> @@ -1,5 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> -/* { dg-options "-O3 -fdump-tree-vect-details" } */
> +/* { dg-additional-options "-O3" } */
>  #define TYPE unsigned char
>  #define MIN 0
>  #define MAX 255
> --- gcc/testsuite/gcc.dg/vect/pr101145_3.c.jj 2021-08-30 08:36:11.295515537 
> +0200
> +++ gcc/testsuite/gcc.dg/vect/pr101145_3.c2021-08-31 14:05:17.903370103 
> +0200
> @@ -1,5 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> -/* { dg-options "-O3 -fdump-tree-vect-details" } */
> +/* { dg-additional-options "-O3" } */
>  #define TYPE int *
>  #define MIN ((TYPE)0)
>  #define MAX ((TYPE)((long long)-1))
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] PR middle-end/100810: Penalize IV candidates with undefined value bases

2021-08-31 Thread Richard Biener via Gcc-patches
On Tue, Aug 31, 2021 at 8:22 AM Roger Sayle  wrote:
>
>
> Time to hopefully earn some goodwill from the team; this patch fixes
> a P1 wrong-code-on-valid regression in ivopts.  Many thanks to Andrew
> Pinski for help with the analysis.
>
> Consider the code fragment below:
>
> int i;
> for (j=0; j<10; j++)
>   i++;
>
> This results in a loop containing two induction variables, i and j,
> where j is initialized, but i isn't (typically indicated in tree dumps
> by qualified ssa names like i(D)).  In PR 100810, the loop optimizers
> end up selecting i as the "best" candidate (perhaps because having no
> initialization it's cheaper) which leads to problems in later passes
> when (the equivalent of) j is considered not to have the value 10 after
> the loop, as its definition is now computed from an undefined/uninitialized
> value.
>
> The fix below is to add a field to track whether any "iv_group" contains
> a use of an iv based on a undefined value, and then prohibit IVs that are
> based on undefined values from being candidates for groups that don't use
> undefined values.  This may seem lenient, but it allows an IV with an
> undefined base to be a candidate for itself, and is a sufficient condition
> to avoid the above bug/regression.  A stricter condition might be to only
> allow "undefined_value iv"s as candidates for iv_groups where *all*
> uses are of "undefined_value ivs"?  My concern was that this might lead
> to cases/loops that no longer have suitable candidates (i.e. a possible
> performance regression).
>
> Hopefully, the tree-loop optimization experts agree with my analysis/fix.

So the reason why the generated code is "wrong" is that we end up with

  # i_16 = PHI 
  _40 = (unsigned int) b.7_13;
  _47 = (unsigned int) i_24(D);
  _49 = _40 + _47;
  _39 = (unsigned int) i_16;
  _22 = -_39;
  _33 = _22 + _49;

where we relate i_24(D) and i_16 = PHI  but two
evaluations of the same undefined SSA name do not necessarily
yield the same value.  The undefined SSA names pop in via
SCEV and niter analysis so trying to fix this in IVOPTs is plugging
the hole only in a single place.  Not only do the SSA names not
evaluate to the same value in the end, but here CCP assumes
that UNDEFINED - UNDEFINED is UNDEFINED but
i_24(D) - i_24(D) is _not_ UNDEFINED.

So

Visiting statement:
_40 = (unsigned int) b.7_13;
which is likely CONSTANT
Lattice value changed to VARYING.  Adding SSA edges to worklist.

Visiting statement:
_47 = (unsigned int) i_24(D);
which is likely UNDEFINED
Lattice value changed to UNDEFINED.  Adding SSA edges to worklist.
marking stmt to be not simulated again

Visiting statement:
_49 = _47 + _40;
which is likely UNDEFINED

this last conclusion looks wrong to me, in fact most of the UNDEFINED
propagation handling in CCP seems to assume that i_24(D) - i_24(D)
is not necessarily zero which is in conflict with us happily pulling
such not "stabilized" operands into expressions and propagating them
at will.

ISTR a paper about this very issue from the clang/llvm folks and them
inventing some special notion for not-quite-undefined or so ...

I don't have a good answer yet to the problem at hand (also given
CCP isn't the only place that doesn't treat i_24(D) as having the
same value in all contexts).  The classical testcase for this would be

int __attribute__((noipa)) foo (int i)
{
  int j, k, m;
  int val = j;
  if (i)
k = val;
  else
k = 3;
  if (i)
m = val;
  else
m = 2;
  return m - k;
}
int main()
{
  if (foo (1) != 0)
__builtin_abort ();
}

where 'k' is optimistically 3 from merging UNDEFINED and 3
from two executable edges.  The reasoning is that eventually
only initialized values may contribute to something meaningful, which
fails to consider X - X.  We have not considered this to be a bug
but we now have to work towards making the compiler itself
avoid creating such cancellations ...

Richard.

> This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
> and "make -k check" with no new failures.  Ok for mainline?
>
> 2021-08-31  Roger Sayle  
> Andrew Pinski  
>
> gcc/ChangeLog
> PR middle-end/100810
> * tree-ssa-loop-ivopts.c (struct iv_group): Add a new
> uses_undefined_value_p field, indicating this group has
> uses of iv's whose base is ssa_undefined_value_p.
> (record_use): Update uses_undefined_value_p as required.
> (record_group): Initialize uses_undefined_value_p to false.
> (determine_group_iv_cost_generic): Consider a candidate with
> a ssa_undefined_value_p base to have infinite_cost for a
> group where uses_undefined_value_p is false.
>
> gcc/testsuite/ChangeLog
> PR middle-end/100810
> * gcc.dg/pr100810.c: New test case.
>
> Roger
> --
>


Re: [PATCH] testsuite: Fix gcc.dg/vect/pr101145* tests [PR101145]

2021-08-31 Thread guojiufu via Gcc-patches

On 2021-08-31 20:12, Jakub Jelinek wrote:

Hi!

I'm getting:
FAIL: gcc.dg/vect/pr101145.c scan-tree-dump-times vect "vectorized 1 
loops" 7
FAIL: gcc.dg/vect/pr101145_1.c scan-tree-dump-times vect "vectorized 1 
loops" 2
FAIL: gcc.dg/vect/pr101145_2.c scan-tree-dump-times vect "vectorized 1 
loops" 2
FAIL: gcc.dg/vect/pr101145_3.c scan-tree-dump-times vect "vectorized 1 
loops" 2

FAIL: gcc.dg/vect/pr101145.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 7
FAIL: gcc.dg/vect/pr101145_1.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/pr101145_2.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/pr101145_3.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 2
on i686-linux (or x86_64-linux with -m32/-mno-sse).
The problem is that those tests use dg-options, which in */vect/ 
testsuite
throws away all the carefully added default options to enable 
vectorization
on each target (and which e.g. vect_int etc. effective targets rely 
on).

The old way would be to name those tests gcc.dg/vect/O3-pr101145*,
but we can also use dg-additional-options (which doesn't throw the 
default
options, just appends to them) which is IMO better so that we don't 
have to

rename the tests.

Tested on x86_64-linux and i686-linux, ok for trunk?

2021-08-31  Jakub Jelinek  

PR tree-optimization/102072
* gcc.dg/vect/pr101145.c: Use dg-additional-options with just -O3
instead of dg-options with -O3 -fdump-tree-vect-details.
* gcc.dg/vect/pr101145_1.c: Likewise.
* gcc.dg/vect/pr101145_2.c: Likewise.
* gcc.dg/vect/pr101145_3.c: Likewise.

--- gcc/testsuite/gcc.dg/vect/pr101145.c.jj	2021-08-30 
08:36:11.295515537 +0200
+++ gcc/testsuite/gcc.dg/vect/pr101145.c	2021-08-31 14:04:35.691964573 
+0200

@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O3 -fdump-tree-vect-details" } */
+/* { dg-additional-options "-O3" } */
 #include 

 unsigned __attribute__ ((noinline))
--- gcc/testsuite/gcc.dg/vect/pr101145_1.c.jj   2021-08-30
08:36:11.295515537 +0200
+++ gcc/testsuite/gcc.dg/vect/pr101145_1.c	2021-08-31 
14:04:55.083691474 +0200

@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O3 -fdump-tree-vect-details" } */
+/* { dg-additional-options "-O3" } */
 #define TYPE signed char
 #define MIN -128
 #define MAX 127
--- gcc/testsuite/gcc.dg/vect/pr101145_2.c.jj   2021-08-30
08:36:11.295515537 +0200
+++ gcc/testsuite/gcc.dg/vect/pr101145_2.c	2021-08-31 
14:05:05.868539591 +0200

@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O3 -fdump-tree-vect-details" } */
+/* { dg-additional-options "-O3" } */
 #define TYPE unsigned char
 #define MIN 0
 #define MAX 255
--- gcc/testsuite/gcc.dg/vect/pr101145_3.c.jj   2021-08-30
08:36:11.295515537 +0200
+++ gcc/testsuite/gcc.dg/vect/pr101145_3.c	2021-08-31 
14:05:17.903370103 +0200

@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O3 -fdump-tree-vect-details" } */
+/* { dg-additional-options "-O3" } */
 #define TYPE int *
 #define MIN ((TYPE)0)
 #define MAX ((TYPE)((long long)-1))

Jakub


Hi Jakub,

Thanks for pointing this out!
I just found that most of the cases in /vect/ are using dg-additional-options
instead of dg-options.


BR.
Jiufu Guo


Re: [PATCH] libgccjit: add some reflection functions in the jit C api

2021-08-31 Thread Antoni Boucher via Gcc-patches
David: PING

Le jeudi 29 juillet 2021 à 08:59 -0400, Antoni Boucher a écrit :
> David: PING
> 
> Le lundi 19 juillet 2021 à 12:10 -0400, Antoni Boucher a écrit :
> > I'm sending the patch once again for review/approval.
> > 
> > I fixed the doc to use the new function names.
> > 
> > Le vendredi 18 juin 2021 à 16:37 -0400, David Malcolm a écrit :
> > > On Fri, 2021-06-18 at 15:41 -0400, Antoni Boucher wrote:
> > > > I have write access now.
> > > 
> > > Great.
> > > 
> > > > I'm not sure how I'm supposed to send my patches:
> > > > should I put it in personal branches and you'll merge them?
> > > 
> > > Please send them to this mailing list for review; once they're
> > > approved
> > > you can merge them.
> > > 
> > > > 
> > > > And for the MAINTAINERS file, should I just push to master
> > > > right
> > > > away,
> > > > after sending it to the mailing list?
> > > 
> > > I think people just push the MAINTAINERS change and then let the
> > > list
> > > know, since it makes a good test that write access is working
> > > correctly.
> > > 
> > > Dave
> > > 
> > > > 
> > > > Thanks for your help!
> > > > 
> > > > Le vendredi 18 juin 2021 à 12:09 -0400, David Malcolm a écrit :
> > > > > On Fri, 2021-06-18 at 11:55 -0400, Antoni Boucher wrote:
> > > > > > Le vendredi 11 juin 2021 à 14:00 -0400, David Malcolm a
> > > > > > écrit :
> > > > > > > On Fri, 2021-06-11 at 08:15 -0400, Antoni Boucher wrote:
> > > > > > > > Thank you for your answer.
> > > > > > > > I attached the updated patch.
> > > > > > > 
> > > > > > > BTW you (or possibly me) dropped the mailing lists; was
> > > > > > > that
> > > > > > > deliberate?
> > > > > > 
> > > > > > Oh, my bad.
> > > > > > 
> > > > > 
> > > > > [...]
> > > > > 
> > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > I have signed the FSF copyright attribution.
> > > > > > > 
> > > > > > > I can push changes on your behalf, but I'd prefer it if
> > > > > > > you
> > > > > > > did
> > > > > > > it,
> > > > > > > especially given that you have various other patches you
> > > > > > > want
> > > > > > > to
> > > > > > > get
> > > > > > > in.
> > > > > > > 
> > > > > > > Instructions on how to get push rights to the git repo
> > > > > > > are
> > > > > > > here:
> > > > > > >   https://gcc.gnu.org/gitwrite.html
> > > > > > > 
> > > > > > > I can sponsor you.
> > > > > > 
> > > > > > Thanks.
> > > > > > I did sign up to get push rights.
> > > > > > Have you accepted my request to get those?
> > > > > 
> > > > > I did, but I didn't see any kind of notification.  Did you
> > > > > get
> > > > > an
> > > > > email
> > > > > about it?
> > > > > 
> > > > > 
> > > > > Dave
> > > > > 
> > > > 
> > > > 
> > > 
> > > 
> > 
> 
> 




Re: [PATCH] Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI)))

2021-08-31 Thread Christophe LYON via Gcc-patches



On 29/08/2021 09:46, Roger Sayle wrote:

SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register.  For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).

This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent.  The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.

Whilst there I also added another optimization.  If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures, and on a cross-compiler to
nvptx-none, where the function "long foo(char x) { return x; }" now
requires one less instruction.

OK for mainline?



Hi,

This patch causes an ICE when building an aarch64 toolchain:

during RTL pass: expand
In file included from 
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/soft-fp.h:318,
 from 
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/floatditf.c:32:
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/floatditf.c: 
In function '__floatditf':
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/op-2.h:249:37: 
internal compiler error: in subreg_promoted_mode, at rtl.h:3132

  249 |   _FP_PACK_RAW_2_flo.bits.exp   = X##_e;    \
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/quad.h:229:33: 
note: in expansion of macro '_FP_PACK_RAW_2'

  229 | # define FP_PACK_RAW_Q(val, X)  _FP_PACK_RAW_2 (Q, (val), X)
  | ^~
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/soft-fp/floatditf.c:42:3: 
note: in expansion of macro 'FP_PACK_RAW_Q'

   42 |   FP_PACK_RAW_Q (a, A);
  |   ^
0xa0b53a subreg_promoted_mode

/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/rtl.h:3132
0xa0b53a convert_modes(machine_mode, machine_mode, rtx_def*, int)
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:699
0xa003bc expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
expand_modifier)

/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:9091
0xa0765c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)

/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:10497
0x9fcef6 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
expand_modifier)

/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:9798
0xa0765c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)

/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:10497
0xa1099e expand_expr
/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.h:301
0xa1099e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, 
rtx_def**, expand_modifier)

/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:8308
0x9fcdff expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
expand_modifier)

/tmp/5987050_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:10288


Can you check?


Thanks,

Christophe




2021-08-29  Roger Sayle  

gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg.  Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.  Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.

Roger
--



[PATCH] testsuite: Fix gcc.dg/vect/pr101145* tests [PR101145]

2021-08-31 Thread Jakub Jelinek via Gcc-patches
Hi!

I'm getting:
FAIL: gcc.dg/vect/pr101145.c scan-tree-dump-times vect "vectorized 1 loops" 7
FAIL: gcc.dg/vect/pr101145_1.c scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/pr101145_2.c scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/pr101145_3.c scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/pr101145.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 7
FAIL: gcc.dg/vect/pr101145_1.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/pr101145_2.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/pr101145_3.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 2
on i686-linux (or x86_64-linux with -m32/-mno-sse).
The problem is that those tests use dg-options, which in */vect/ testsuite
throws away all the carefully added default options to enable vectorization
on each target (and which e.g. vect_int etc. effective targets rely on).
The old way would be to name those tests gcc.dg/vect/O3-pr101145*,
but we can also use dg-additional-options (which doesn't throw the default
options, just appends to them) which is IMO better so that we don't have to
rename the tests.
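
To illustrate the difference (a sketch of the mechanics, not part of the patch;
the exact default flags live in gcc.dg/vect/vect.exp and vary by target):

/* { dg-options "-O3 -fdump-tree-vect-details" } */
/* Replaces the harness defaults, so flags like -ftree-vectorize and the
   per-target SIMD options that vect_int etc. rely on are lost.  */

/* { dg-additional-options "-O3" } */
/* Keeps the harness defaults and appends -O3; the vect details dump is
   already enabled by those defaults, which is why the explicit dump flag
   can be dropped in the hunks below.  */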

Tested on x86_64-linux and i686-linux, ok for trunk?

2021-08-31  Jakub Jelinek  

PR tree-optimization/102072
* gcc.dg/vect/pr101145.c: Use dg-additional-options with just -O3
instead of dg-options with -O3 -fdump-tree-vect-details.
* gcc.dg/vect/pr101145_1.c: Likewise.
* gcc.dg/vect/pr101145_2.c: Likewise.
* gcc.dg/vect/pr101145_3.c: Likewise.

--- gcc/testsuite/gcc.dg/vect/pr101145.c.jj 2021-08-30 08:36:11.295515537 
+0200
+++ gcc/testsuite/gcc.dg/vect/pr101145.c2021-08-31 14:04:35.691964573 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O3 -fdump-tree-vect-details" } */
+/* { dg-additional-options "-O3" } */
 #include 
 
 unsigned __attribute__ ((noinline))
--- gcc/testsuite/gcc.dg/vect/pr101145_1.c.jj   2021-08-30 08:36:11.295515537 
+0200
+++ gcc/testsuite/gcc.dg/vect/pr101145_1.c  2021-08-31 14:04:55.083691474 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O3 -fdump-tree-vect-details" } */
+/* { dg-additional-options "-O3" } */
 #define TYPE signed char
 #define MIN -128
 #define MAX 127
--- gcc/testsuite/gcc.dg/vect/pr101145_2.c.jj   2021-08-30 08:36:11.295515537 
+0200
+++ gcc/testsuite/gcc.dg/vect/pr101145_2.c  2021-08-31 14:05:05.868539591 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O3 -fdump-tree-vect-details" } */
+/* { dg-additional-options "-O3" } */
 #define TYPE unsigned char
 #define MIN 0
 #define MAX 255
--- gcc/testsuite/gcc.dg/vect/pr101145_3.c.jj   2021-08-30 08:36:11.295515537 
+0200
+++ gcc/testsuite/gcc.dg/vect/pr101145_3.c  2021-08-31 14:05:17.903370103 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O3 -fdump-tree-vect-details" } */
+/* { dg-additional-options "-O3" } */
 #define TYPE int *
 #define MIN ((TYPE)0)
 #define MAX ((TYPE)((long long)-1))

Jakub



[PATCH] vectorizer: Fix up vectorization using WIDEN_MINUS_EXPR [PR102124]

2021-08-31 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled on aarch64-linux at -O3 since the
introduction of WIDEN_MINUS_EXPR.
The problem is if the inner type (half_type) is unsigned and the result
type in which the subtraction is performed (type) has precision more than
twice as larger as the inner type's precision.
For other widening operations like WIDEN_{PLUS,MULT}_EXPR, if half_type
is unsigned, the addition/multiplication result in itype is also unsigned
and needs to be zero-extended to type.
But subtraction is special, even when half_type is unsigned, the subtraction
behaves as signed (also regardless of whether the result type is signed or
unsigned), 0xfeU - 0xffU is -1 or 0xffffffffU, not 0x0000ffff.

I think it is better not to use mixed signedness of types in
WIDEN_MINUS_EXPR (have unsigned vector of operands and signed result
vector), so this patch instead adds another cast to make sure we always
sign-extend the result from itype to type if type is wider than itype.
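
As a scalar illustration of the required extension (a hand-written sketch,
not part of the patch or of the testcase below):

/* half_type = unsigned char, itype = unsigned short, type = unsigned int.  */
unsigned int
scalar_widen_minus (unsigned char a, unsigned char b)  /* e.g. a = 0xfe, b = 0xff */
{
  unsigned short w = (unsigned short) (a - b);  /* widened itype result: 0xffff */
  return (unsigned int) (short) w;              /* sign-extend: 0xffffffff, not 0x0000ffff */
}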

Bootstrapped/regtested on aarch64-linux, x86_64-linux and i686-linux, ok
for trunk/11.3?

2021-08-31  Jakub Jelinek  

PR tree-optimization/102124
* tree-vect-patterns.c (vect_recog_widen_op_pattern): For ORIG_CODE
MINUS_EXPR, if itype is unsigned with smaller precision than type,
add an extra cast to signed variant of itype to ensure sign-extension.

* gcc.dg/torture/pr102124.c: New test.

--- gcc/tree-vect-patterns.c.jj 2021-08-17 21:05:07.0 +0200
+++ gcc/tree-vect-patterns.c2021-08-30 11:54:03.651474845 +0200
@@ -1268,11 +1268,31 @@ vect_recog_widen_op_pattern (vec_info *v
   /* Check target support  */
   tree vectype = get_vectype_for_scalar_type (vinfo, half_type);
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
+  tree ctype = itype;
+  tree vecctype = vecitype;
+  if (orig_code == MINUS_EXPR
+  && TYPE_UNSIGNED (itype)
+  && TYPE_PRECISION (type) > TYPE_PRECISION (itype))
+{
+  /* Subtraction is special, even if half_type is unsigned and no matter
+whether type is signed or unsigned, if type is wider than itype,
+we need to sign-extend from the widening operation result to the
+result type.
+Consider half_type unsigned char, operand 1 0xfe, operand 2 0xff,
+itype unsigned short and type either int or unsigned int.
+Widened (unsigned short) 0xfe - (unsigned short) 0xff is
+(unsigned short) 0xffff, but for type int we want the result -1
+and for type unsigned int 0xffffffff rather than 0xffff.  */
+  ctype = build_nonstandard_integer_type (TYPE_PRECISION (itype), 0);
+  vecctype = get_vectype_for_scalar_type (vinfo, ctype);
+}
+
   enum tree_code dummy_code;
   int dummy_int;
   auto_vec dummy_vec;
   if (!vectype
   || !vecitype
+  || !vecctype
   || !supportable_widening_operation (vinfo, wide_code, last_stmt_info,
  vecitype, vectype,
  &dummy_code, &dummy_code,
@@ -1291,8 +1311,12 @@ vect_recog_widen_op_pattern (vec_info *v
   gimple *pattern_stmt = gimple_build_assign (var, wide_code,
  oprnd[0], oprnd[1]);
 
+  if (vecctype != vecitype)
+pattern_stmt = vect_convert_output (vinfo, last_stmt_info, ctype,
+   pattern_stmt, vecitype);
+
   return vect_convert_output (vinfo, last_stmt_info,
- type, pattern_stmt, vecitype);
+ type, pattern_stmt, vecctype);
 }
 
 /* Try to detect multiplication on widened inputs, converting MULT_EXPR
--- gcc/testsuite/gcc.dg/torture/pr102124.c.jj  2021-08-30 12:08:05.838649133 
+0200
+++ gcc/testsuite/gcc.dg/torture/pr102124.c 2021-08-30 12:07:52.669834031 
+0200
@@ -0,0 +1,27 @@
+/* PR tree-optimization/102124 */
+
+int
+foo (const unsigned char *a, const unsigned char *b, unsigned long len)
+{
+  int ab, ba; 
+  unsigned long i;
+  for (i = 0, ab = 0, ba = 0; i < len; i++)
+{
+  ab |= a[i] - b[i];
+  ba |= b[i] - a[i];
+}   
+  return (ab | ba) >= 0;
+}
+
+int
+main ()
+{
+  unsigned char a[32] = { 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 
'a', 'a', 'a', 'a', 'a', 'a' };
+  unsigned char b[32] = { 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 
'a', 'a', 'a', 'a', 'a', 'a' };
+  unsigned char c[32] = { 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 
'b', 'b', 'b', 'b', 'b', 'b' };
+  if (!foo (a, b, 16))
+__builtin_abort ();
+  if (foo (a, c, 16))
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.

2021-08-31 Thread Richard Biener via Gcc-patches
On Tue, Aug 31, 2021 at 1:17 PM liuhongt  wrote:
>
> gcc/ChangeLog:

OK.

Thanks,
Richard.

> * emit-rtl.c (validate_subreg): Get rid of all float-int
> special cases.
> ---
>  gcc/emit-rtl.c | 40 
>  1 file changed, 40 deletions(-)
>
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index ff3b4449b37..77ea8948ee8 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -922,46 +922,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
>
>poly_uint64 regsize = REGMODE_NATURAL_SIZE (imode);
>
> -  /* ??? This should not be here.  Temporarily continue to allow word_mode
> - subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
> - Generally, backends are doing something sketchy but it'll take time to
> - fix them all.  */
> -  if (omode == word_mode)
> -;
> -  /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> - is the culprit here, and not the backends.  */
> -  else if (known_ge (osize, regsize) && known_ge (isize, osize))
> -;
> -  /* Allow component subregs of complex and vector.  Though given the below
> - extraction rules, it's not always clear what that means.  */
> -  else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
> -  && GET_MODE_INNER (imode) == omode)
> -;
> -  /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
> - i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
> - surely isn't the cleanest way to represent this.  It's questionable
> - if this ought to be represented at all -- why can't this all be hidden
> - in post-reload splitters that make arbitrarily mode changes to the
> - registers themselves.  */
> -  else if (VECTOR_MODE_P (omode)
> -  && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
> -;
> -  /* Subregs involving floating point modes are not allowed to
> - change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> - (subreg:SI (reg:DF) 0) isn't.  */
> -  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> -{
> -  if (! (known_eq (isize, osize)
> -/* LRA can use subreg to store a floating point value in
> -   an integer mode.  Although the floating point and the
> -   integer modes need the same number of hard registers,
> -   the size of floating point mode can be less than the
> -   integer mode.  LRA also uses subregs for a register
> -   should be used in different mode in on insn.  */
> -|| lra_in_progress))
> -   return false;
> -}
> -
>/* Paradoxical subregs must have offset zero.  */
>if (maybe_gt (osize, isize))
>  return known_eq (offset, 0U);
> --
> 2.27.0
>


Re: [PATCH] Check the type of mask while generating cond_op in gimple simplification.

2021-08-31 Thread Richard Biener via Gcc-patches
On Tue, Aug 31, 2021 at 12:18 PM Hongtao Liu  wrote:
>
> On Mon, Aug 30, 2021 at 8:25 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Fri, Aug 27, 2021 at 8:53 AM liuhongt  wrote:
> > >
> > >   When gimple simplification tries to combine an op and a vec_cond_expr
> > > into a cond_op, it doesn't check whether the mask type matches.  That
> > > causes an ICE when expanding the cond_op with a mismatched mode.
> > >   This patch adds a function named cond_vectorized_internal_fn_supported_p
> > > to additionally check the mask type, on top of what
> > > vectorized_internal_fn_supported_p checks.
> > >
> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > >   Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR middle-end/102080
> > > * internal-fn.c (cond_vectorized_internal_fn_supported_p): New 
> > > functions.
> > > * internal-fn.h (cond_vectorized_internal_fn_supported_p): New 
> > > declaration.
> > > * match.pd: Check the type of mask while generating cond_op in
> > > gimple simplication.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR middle-end/102080
> > > * gcc.target/i386/pr102080.c: New test.
> > > ---
> > >  gcc/internal-fn.c| 22 ++
> > >  gcc/internal-fn.h|  1 +
> > >  gcc/match.pd | 24 
> > >  gcc/testsuite/gcc.target/i386/pr102080.c | 16 
> > >  4 files changed, 55 insertions(+), 8 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102080.c
> > >
> > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > > index 1360a00f0b9..8b2b65db1a7 100644
> > > --- a/gcc/internal-fn.c
> > > +++ b/gcc/internal-fn.c
> > > @@ -4102,6 +4102,28 @@ expand_internal_call (gcall *stmt)
> > >expand_internal_call (gimple_call_internal_fn (stmt), stmt);
> > >  }
> > >
> > > +/* Check cond_op for vector modes since 
> > > vectorized_internal_fn_supported_p
> > > +   doesn't check if mask type matches.  */
> > > +bool
> > > +cond_vectorized_internal_fn_supported_p (internal_fn ifn, tree type,
> > > +tree mask_type)
> > > +{
> > > +  if (!vectorized_internal_fn_supported_p (ifn, type))
> > > +return false;
> > > +
> > > +  machine_mode mask_mode;
> > > +  machine_mode vmode = TYPE_MODE (type);
> > > +  int size1, size2;
> > > +  if (VECTOR_MODE_P (vmode)
> > > +  && targetm.vectorize.get_mask_mode (vmode).exists(&mask_mode)
> > > +  && GET_MODE_SIZE (mask_mode).is_constant (&size1)
> > > +  && GET_MODE_SIZE (TYPE_MODE (mask_type)).is_constant (&size2)
> > > +  && size1 != size2)
> >
> > Why do we check for equal size rather than just mode equality which
> I originally thought  TYPE_MODE of vector(8)  was
> not QImode, Changed the patch to check mode equality.
> Update patch.

Looking at all this it seems the match.pd patterns should have not
used vectorized_internal_fn_supported_p but direct_internal_fn_supported_p
which is equivalent here because we're always working with vector modes?

And then shouldn't we look at the actual optab whether the mask mode matches
the expectation rather than going around via the target hook which may not have
enough context to decide which mask mode to use?

In any case if the approach of the patch is correct shouldn't it do

  if (VECTOR_MODE_P (vmode)
  && (!targetm.vectorize.get_mask_mode (vmode).exists(&mask_mode)
 || mask_mode != TYPE_MODE (mask_type)))
return false;

that is, not return true if there's no mask mode for the data mode?

Given the first observation should we call the function
direct_cond_internal_fn_supported_p () instead and as to the second
observation, look at the optab operands mode?

Richard.

> > I think would work for non-constant sized modes as well?  And when
> > using sizes you'd instead use maybe_ne (GET_MODE_SIZE (mask_mode),
> > GET_MODE_SIZE (TYPE_MODE (mask_type)))
> >
> > Thanks,
> > Richard.
> >
> > > +return false;
> > > +
> > > +  return true;
> > > +}
> > > +
> > >  /* If TYPE is a vector type, return true if IFN is a direct internal
> > > function that is supported for that type.  If TYPE is a scalar type,
> > > return true if IFN is a direct internal function that is supported for
> > > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> > > index 19d0f849a5a..f0aea00103c 100644
> > > --- a/gcc/internal-fn.h
> > > +++ b/gcc/internal-fn.h
> > > @@ -236,5 +236,6 @@ extern void expand_PHI (internal_fn, gcall *);
> > >  extern void expand_SHUFFLEVECTOR (internal_fn, gcall *);
> > >
> > >  extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
> > > +extern bool cond_vectorized_internal_fn_supported_p (internal_fn, tree, 
> > > tree);
> > >
> > >  #endif
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index e5bbb123a6a..72b1bc674db 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -6987,14 +6987,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (R

Re: [PATCH] tree-optimization/102139 - fix SLP DR base alignment

2021-08-31 Thread Richard Biener via Gcc-patches
On Tue, Aug 31, 2021 at 11:26 AM Richard Biener via Gcc-patches
 wrote:
>
> When doing whole-function SLP we have to make sure the recorded
> base alignments we compute as the maximum alignment seen for a
> base anywhere in the function are actually valid at the point
> we want to make use of it.
>
> To make this work we now record the stmt the alignment was derived
> from, in addition to the DR's innermost behavior, and we use a
> dominance check to verify the recorded info is valid when doing
> BB vectorization.
>
> Note this leaves a small(?) hole for the case where we have sth
> like
>
> unaligned DR
> call (); // does not return
> aligned DR
>
> since we'll derive an aligned access for the earlier DR but the
> later DR is never actually reached since the call does not
> return.  To plug this hole one option (for the easy backporting)
> would be to simply not use the base-alignment recording at all.
> Alternatively we'd have to store the dataref grouping 'id' somewhere
> in the DR itself and use that to handle this particular case.

It turns out this isn't too difficult so the following is a patch adjusted
to cover that case together with a testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2021-08-31  Richard Biener  

PR tree-optimization/102139
* tree-vectorizer.h (vec_base_alignments): Adjust hash-map
type to record a std::pair of the stmt-info and the innermost
loop behavior.
(dr_vec_info::group): New member.
* tree-vect-data-refs.c (vect_record_base_alignment): Adjust.
(vect_compute_data_ref_alignment): Verify the recorded
base alignment can be used.
(data_ref_pair): Remove.
(dr_group_sort_cmp): Adjust.
(vect_analyze_data_ref_accesses): Store the group-ID in the
dr_vec_info and operate on a vector of dr_vec_infos.

* gcc.dg/torture/pr102139.c: New testcase.


p
Description: Binary data


Re: [PATCH v3] Fix incomplete computation in fill_always_executed_in_1

2021-08-31 Thread Richard Biener via Gcc-patches
On Tue, 31 Aug 2021, Xionghu Luo wrote:

> 
> 
> On 2021/8/30 17:19, Richard Biener wrote:
>   bitmap_set_bit (work_set, loop->header->index);
>  +  unsigned bb_index;
> -  for (i = 0; i < loop->num_nodes; i++)
>  -{
>  -  edge_iterator ei;
>  -  bb = bbs[i];
>  +  unsigned array_size = last_basic_block_for_fn (cfun) + 1;
>  +  int *bbd = XNEWVEC (int, array_size);
>  +  bbd = XDUPVEC (int, bbi, array_size);
> >>> I don't think you need to copy 'bbi' but you can re-use the
> >>> state from the outer loop processing.  Did you run into any
> >>> issues with that?
> >> Yes.  For example, adding a small if-else block to ssa-lim-19.c,
> >> Then block "x->j += tem * i;" of bb 6 is always executed for loop 2, when
> >> call
> >> fill_always_executed_in_1 for loop 1, bbi[6] is decreased from 2 to 1 to 0,
> >> then if fill_always_executed_in_1 is called again for loop 2, it's value is
> >> not
> >> reset so bbi[6] won't be set ALWAYS EXECUTE, this is wrong.
> >>
> >>
> >> struct X { int i; int j; int k;};
> >>
> >> void foo(struct X *x, int n, int l, int m)
> >> {
> >>   for (int j = 0; j < l; j++)  // loop 1
> >> {
> >>   for (int i = 0; i < n; ++i)  // loop 2
> >> {
> >>   if (m)
> >> x->j++;
> >>   else
> >> x->j = m+n+l;
> >>
> >>   int *p = &x->j;   // bb 6
> >>   int tem = *p;
> >>   x->j += tem * i;
> >> }
> >>   int *r = &x->k;
> >>   int tem2 = *r;
> >>   x->k += tem2 * j;
> >> }
> >> }
> > Hmm, but if the outer loop processing reaches bb 6 then
> > it should have set it ALWAYS_EXECUTED in loop 1 already?
> 
> But bb 6 is NOT ALWAYS_EXECUTED for loop 1, it is only ALWAYS_EXECUTED for
> loop 2 as it requires n>0.  Please refer to the attached file
> ssa-lim-19.c.138t.lim2.
> 
> ;;
> ;; Loop 1
> ;;  header 8, latch 12
> ;;  depth 1, outer 0
> ;;  nodes: 8 12 7 6 4 5 3 13 11
> ;;
> ;; Loop 2
> ;;  header 3, latch 13
> ;;  depth 2, outer 1
> ;;  nodes: 3 13 6 4 5
> ;; 2 succs { 10 9 }
> ;; 10 succs { 8 }
> ;; 11 succs { 3 }
> ;; 3 succs { 4 5 }
> ;; 4 succs { 6 }
> ;; 5 succs { 6 }
> ;; 6 succs { 13 7 }
> ;; 13 succs { 3 }
> ;; 7 succs { 12 9 }
> ;; 12 succs { 8 }
> ;; 8 succs { 11 7 }
> ;; 9 succs { 1 }
> 
> always executed: bb->index:8, loop->num: 1
> always executed: bb->index:7, loop->num: 1
> always executed: bb->index:3, loop->num: 2
> always executed: bb->index:6, loop->num: 2
> 
> 8<---
>  /  \  |
>  11   \ |
>  / \|
>  3<---  \   | 
> /\|  \  |
> 4 5   |   \ |
> \/|\|
>  6| \   |
>  |-->13  \  |
>  |--> 7 |
>  /\|
>  9 12---
> 
> (gdb) x /15x bbd
> 0x1354c9b0: 0x0000  0x0000  0x0001  0x0001
> 0x1354c9c0: 0x0001  0x0001  0x0002  0x0002
> 0x1354c9d0: 0x0001  0x0002  0x0001  0x0001
> 0x1354c9e0: 0x0001  0x0001  0x0000
> 
> our algorithm will walk through 8->11->3->4->5->6->7,
> for loop 1, exit at edge 7->9.
> 
> (gdb) x /15x bbd
> 0x1354c9b0: 0x0000  0x0000  0x0001  0x0000
> 0x1354c9c0: 0x0000  0x0000  0x0000  0x0000
> 0x1354c9d0: 0x0001  0x0002  0x0001  0x0000
> 0x1354c9e0: 0x0001  0x0000  0x0000
> 
> If we don't reset bbd to incoming_edge by memcpy, bbd[3], bbd[4], bbd[5]
> and bbd[6] are 0 now for loop 2, and fill_always_executed_in_1 couldn't set
> ALWAYS_EXECUTED correctly for loop 2 at bb 3 and bb 6.
> 
> > 
> >>
> >>>
>  +  while (!bitmap_empty_p (work_set))
>  +{
>  +  bb_index = bitmap_first_set_bit (work_set);
>  +  bitmap_clear_bit (work_set, bb_index);
>  +  bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>  -last = bb;
>  -
>  +SET_ALWAYS_EXECUTED_IN (bb, loop);
>    if (bitmap_bit_p (contains_call, bb->index))
>  break;
> >>> I think you want to continue; here (process remaining worklist
> >>> but not continue greedy walking this block)
> >> Same as above: if I use 'continue' instead of 'break', the algorithm
> >> also doesn't work.  If the inner loop contains a jump to the outermost
> >> loop, the blocks after the jump block will be marked ALWAYS_EXECUTED
> >> incorrectly.
> >>
>  -
>  +  edge_iterator ei;
>    FOR_EACH_EDGE (e, ei, bb->succs)
>   {
>  -  /* If there is an exit from this BB.  */
>    if (!flow_bb_inside_loop_p (loop, e->dest))
>   break;
> >>> in particular this should keep the outer 'bbi' valid to re-use.
> >>>
> >>> But again, you want 'continue;' the greedy walk to other edges.
> >>> If that's not valid (I'd need to think about this) then with
> >>> your patch whet

Re: [PATCH v2] md/define_c_enum: support value assignation

2021-08-31 Thread YunQiang Su
YunQiang Su wrote on Tuesday, Aug 31, 2021 at 7:09 PM:
>
> Currently, the enums from define_c_enum and define_enum can only
> have values assigned one by one, starting from 0.
>
> In fact we can support the same behaviour as in C, e.g.
>   (define_enum "mips_isa" [(mips1 1) mips2 (mips32 32) mips32r2]),
> then we can get
>   enum mips_isa {
> MIPS_ISA_MIPS1 = 1,
> MIPS_ISA_MIPS2 = 2,
> MIPS_ISA_MIPS32 = 32,
> MIPS_ISA_MIPS32R2 = 33
>   };
>
> gcc/ChangeLog:
> * read-md.c (md_reader::handle_enum): support value assignation.
> * doc/md.texi: record define_c_enum value assignation support.
> ---
>  gcc/doc/md.texi |  4 
>  gcc/read-md.c   | 28 
>  2 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index f8047aefc..2b41cb7fb 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -11074,6 +11074,8 @@ The syntax is as follows:
>  (define_c_enum "@var{name}" [
>@var{value0}
>@var{value1}
> +  (@var{value32} 32)
> +  @var{value33}
>@dots{}
>@var{valuen}
>  ])
> @@ -11086,6 +11088,8 @@ in @file{insn-constants.h}:
>  enum @var{name} @{
>@var{value0} = 0,
>@var{value1} = 1,
> +  @var{value32} = 32,
> +  @var{value33} = 33,
>@dots{}
>@var{valuen} = @var{n}
>  @};
> diff --git a/gcc/read-md.c b/gcc/read-md.c
> index bb419e0f6..2d01c69fc 100644
> --- a/gcc/read-md.c
> +++ b/gcc/read-md.c
> @@ -902,7 +902,9 @@ void
>  md_reader::handle_enum (file_location loc, bool md_p)
>  {
>char *enum_name, *value_name;
> -  struct md_name name;
> +  unsigned int cur_value;
> +  struct md_name name, value;
> +  bool value_given;

This flag is not needed at all. So please ignore V2.
See V3 please.

>struct enum_type *def;
>struct enum_value *ev;
>void **slot;
> @@ -928,6 +930,7 @@ md_reader::handle_enum (file_location loc, bool md_p)
>*slot = def;
>  }
>
> +  cur_value = def->num_values;
>require_char_ws ('[');
>
>while ((c = read_skip_spaces ()) != ']')
> @@ -937,28 +940,45 @@ md_reader::handle_enum (file_location loc, bool md_p)
>   error_at (loc, "unterminated construct");
>   exit (1);
> }
> -  unread_char (c);
> -  read_name (&name);
> +  value_given = false;
> +  if (c == '(')
> +   {
> + read_name (&name);
> + read_name (&value);
> + require_char_ws (')');
> + cur_value = atoi(value.string);
> + value_given = true;
> +   }
> +  else
> +   {
> + unread_char (c);
> + read_name (&name);
> +   }
>
>ev = XNEW (struct enum_value);
>ev->next = 0;
>if (md_p)
> {
>   value_name = concat (def->name, "_", name.string, NULL);
> + if (value_given)
> +   cur_value = atoi (value.string);
>   upcase_string (value_name);
>   ev->name = xstrdup (name.string);
> }
>else
> {
>   value_name = xstrdup (name.string);
> + if (value_given)
> +   cur_value = atoi (value.string);
>   ev->name = value_name;
> }
>ev->def = add_constant (get_md_constants (), value_name,
> - md_decimal_string (def->num_values), def);
> + md_decimal_string (cur_value), def);
>
>*def->tail_ptr = ev;
>def->tail_ptr = &ev->next;
>def->num_values++;
> +  cur_value++;
>  }
>  }
>
> --
> 2.30.2
>


[PATCH v3] md/define_c_enum: support value assignation

2021-08-31 Thread YunQiang Su
Currently, the enums from define_c_enum and define_enum can only
have values assigned one by one, starting from 0.

In fact we can support the same behaviour as in C, e.g.
  (define_enum "mips_isa" [(mips1 1) mips2 (mips32 32) mips32r2]),
then we can get
  enum mips_isa {
MIPS_ISA_MIPS1 = 1,
MIPS_ISA_MIPS2 = 2,
MIPS_ISA_MIPS32 = 32,
MIPS_ISA_MIPS32R2 = 33
  };

gcc/ChangeLog:
* read-md.c (md_reader::handle_enum): support value assignation.
* doc/md.texi: record define_c_enum value assignation support.
---
 gcc/doc/md.texi |  4 
 gcc/read-md.c   | 21 +
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f8047aefc..2b41cb7fb 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -11074,6 +11074,8 @@ The syntax is as follows:
 (define_c_enum "@var{name}" [
   @var{value0}
   @var{value1}
+  (@var{value32} 32)
+  @var{value33}
   @dots{}
   @var{valuen}
 ])
@@ -11086,6 +11088,8 @@ in @file{insn-constants.h}:
 enum @var{name} @{
   @var{value0} = 0,
   @var{value1} = 1,
+  @var{value32} = 32,
+  @var{value33} = 33,
   @dots{}
   @var{valuen} = @var{n}
 @};
diff --git a/gcc/read-md.c b/gcc/read-md.c
index bb419e0f6..0fbe924d1 100644
--- a/gcc/read-md.c
+++ b/gcc/read-md.c
@@ -902,7 +902,8 @@ void
 md_reader::handle_enum (file_location loc, bool md_p)
 {
   char *enum_name, *value_name;
-  struct md_name name;
+  unsigned int cur_value;
+  struct md_name name, value;
   struct enum_type *def;
   struct enum_value *ev;
   void **slot;
@@ -928,6 +929,7 @@ md_reader::handle_enum (file_location loc, bool md_p)
   *slot = def;
 }
 
+  cur_value = def->num_values;
   require_char_ws ('[');
 
   while ((c = read_skip_spaces ()) != ']')
@@ -937,8 +939,18 @@ md_reader::handle_enum (file_location loc, bool md_p)
  error_at (loc, "unterminated construct");
  exit (1);
}
-  unread_char (c);
-  read_name (&name);
+  if (c == '(')
+   {
+ read_name (&name);
+ read_name (&value);
+ require_char_ws (')');
+ cur_value = atoi(value.string);
+   }
+  else
+   {
+ unread_char (c);
+ read_name (&name);
+   }
 
   ev = XNEW (struct enum_value);
   ev->next = 0;
@@ -954,11 +966,12 @@ md_reader::handle_enum (file_location loc, bool md_p)
  ev->name = value_name;
}
   ev->def = add_constant (get_md_constants (), value_name,
- md_decimal_string (def->num_values), def);
+ md_decimal_string (cur_value), def);
 
   *def->tail_ptr = ev;
   def->tail_ptr = &ev->next;
   def->num_values++;
+  cur_value++;
 }
 }
 
-- 
2.30.2



[PATCH 2/2] Get rid of all float-int special cases in validate_subreg.

2021-08-31 Thread liuhongt via Gcc-patches
gcc/ChangeLog:

* emit-rtl.c (validate_subreg): Get rid of all float-int
special cases.
---
 gcc/emit-rtl.c | 40 
 1 file changed, 40 deletions(-)

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index ff3b4449b37..77ea8948ee8 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -922,46 +922,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
 
   poly_uint64 regsize = REGMODE_NATURAL_SIZE (imode);
 
-  /* ??? This should not be here.  Temporarily continue to allow word_mode
- subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
- Generally, backends are doing something sketchy but it'll take time to
- fix them all.  */
-  if (omode == word_mode)
-;
-  /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
- is the culprit here, and not the backends.  */
-  else if (known_ge (osize, regsize) && known_ge (isize, osize))
-;
-  /* Allow component subregs of complex and vector.  Though given the below
- extraction rules, it's not always clear what that means.  */
-  else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
-  && GET_MODE_INNER (imode) == omode)
-;
-  /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
- i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
- surely isn't the cleanest way to represent this.  It's questionable
- if this ought to be represented at all -- why can't this all be hidden
- in post-reload splitters that make arbitrarily mode changes to the
- registers themselves.  */
-  else if (VECTOR_MODE_P (omode)
-  && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
-;
-  /* Subregs involving floating point modes are not allowed to
- change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
- (subreg:SI (reg:DF) 0) isn't.  */
-  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
-{
-  if (! (known_eq (isize, osize)
-/* LRA can use subreg to store a floating point value in
-   an integer mode.  Although the floating point and the
-   integer modes need the same number of hard registers,
-   the size of floating point mode can be less than the
-   integer mode.  LRA also uses subregs for a register
-   should be used in different mode in on insn.  */
-|| lra_in_progress))
-   return false;
-}
-
   /* Paradoxical subregs must have offset zero.  */
   if (maybe_gt (osize, isize))
 return known_eq (offset, 0U);
-- 
2.27.0



[PATCH 0/2] Get rid of all float-int special cases in validate_subreg.

2021-08-31 Thread liuhongt via Gcc-patches
Hi:
  There are 2 patches: the first reverts my r12-3218, which caused an ICE
in PR102133; the second removes all float-int special cases in
validate_subreg, as suggested in [1].

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

PS: I am building SPEC2017 and EEMBC to see whether the binaries are the same
as with HEAD~2; I guess they're the same.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578189.html.

liuhongt (2):
  Revert "Make sure we're playing with integral modes before call
extract_integral_bit_field."
  Get rid of all float-int special cases in validate_subreg.

 gcc/emit-rtl.c |  40 ---
 gcc/expmed.c   | 103 -
 2 files changed, 25 insertions(+), 118 deletions(-)

-- 
2.27.0



[PATCH 1/2] Revert "Make sure we're playing with integral modes before call extract_integral_bit_field."

2021-08-31 Thread liuhongt via Gcc-patches
This reverts commit 7218c2ec365ce95f5a1012a6eb425b0a36aec6bf.

 PR middle-end/102133
---
 gcc/expmed.c | 103 +--
 1 file changed, 25 insertions(+), 78 deletions(-)

diff --git a/gcc/expmed.c b/gcc/expmed.c
index f083d6e86d0..3143f38e057 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -71,14 +71,7 @@ static void store_split_bit_field (rtx, opt_scalar_int_mode,
 static rtx extract_integral_bit_field (rtx, opt_scalar_int_mode,
   unsigned HOST_WIDE_INT,
   unsigned HOST_WIDE_INT, int, rtx,
-  machine_mode, machine_mode,
-  scalar_int_mode, bool, bool);
-static rtx extract_and_convert_fixed_bit_field (scalar_int_mode,
-   machine_mode, machine_mode,
-   rtx, opt_scalar_int_mode,
-   unsigned HOST_WIDE_INT,
-   unsigned HOST_WIDE_INT, rtx,
-   int, bool);
+  machine_mode, machine_mode, bool, bool);
 static rtx extract_fixed_bit_field (machine_mode, rtx, opt_scalar_int_mode,
unsigned HOST_WIDE_INT,
unsigned HOST_WIDE_INT, rtx, int, bool);
@@ -1639,7 +1632,6 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 {
   rtx op0 = str_rtx;
   machine_mode mode1;
-  scalar_int_mode int_tmode;
 
   if (tmode == VOIDmode)
 tmode = mode;
@@ -1861,46 +1853,10 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   /* It's possible we'll need to handle other cases here for
  polynomial bitnum and bitsize.  */
 
-  /* Make sure we are playing with integral modes.  Pun with subregs
- if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
- in extract_integral_bit_field.  */
-  opt_scalar_int_mode target_imode = int_mode_for_mode (tmode);
-  if (!target_imode.exists (&int_tmode) || int_tmode != tmode)
-{
-  if (target_imode.exists (&int_tmode))
-   {
- rtx ret = extract_integral_bit_field (op0, op0_mode,
-   bitsize.to_constant (),
-   bitnum.to_constant (),
-   unsignedp, NULL, int_tmode,
-   int_tmode, int_tmode,
-   reverse, fallback_p);
- gcc_assert (ret);
-
- if (!REG_P (ret))
-   ret = force_reg (int_tmode, ret);
- return gen_lowpart_SUBREG (tmode, ret);
-   }
-  else
-   {
- if (!fallback_p)
-   return NULL;
-
- int_tmode = int_mode_for_mode (mode).require ();
- return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
- op0, op0_mode,
- bitsize.to_constant (),
- bitnum.to_constant (),
- target, unsignedp,
- reverse);
-   }
-}
-
   /* From here on we need to be looking at a fixed-size insertion.  */
   return extract_integral_bit_field (op0, op0_mode, bitsize.to_constant (),
 bitnum.to_constant (), unsignedp,
-target, mode, tmode,
-int_tmode, reverse, fallback_p);
+target, mode, tmode, reverse, fallback_p);
 }
 
 /* Subroutine of extract_bit_field_1, with the same arguments, except
@@ -1913,7 +1869,6 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
unsigned HOST_WIDE_INT bitsize,
unsigned HOST_WIDE_INT bitnum, int unsignedp,
rtx target, machine_mode mode, machine_mode tmode,
-   scalar_int_mode int_tmode,
bool reverse, bool fallback_p)
 {
   /* Handle fields bigger than a word.  */
@@ -2080,10 +2035,29 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
   if (!fallback_p)
 return NULL;
 
-  return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
- op0, op0_mode, bitsize,
- bitnum, target, unsignedp,
- reverse);
+  /* Find a correspondingly-sized integer field, so we can apply
+ shifts and masks to it.  */
+  scalar_int_mode int_mode;
+  if (!int_mode_for_mode (tmode).exists (&in

Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-31 Thread Richard Biener via Gcc-patches
On Tue, Aug 31, 2021 at 8:48 AM Hongtao Liu  wrote:
>
> On Tue, Aug 31, 2021 at 2:30 PM Hongtao Liu  wrote:
> >
> > On Tue, Aug 31, 2021 at 2:11 PM Richard Biener
> >  wrote:
> > >
> > > On Fri, Aug 27, 2021 at 6:50 AM Hongtao Liu  wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 7:09 PM Richard Biener via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > On Thu, Aug 26, 2021 at 12:50 PM Richard Sandiford
> > > > >  wrote:
> > > > > >
> > > > > > Richard Biener via Gcc-patches  writes:
> > > > > > > On Thu, Aug 26, 2021 at 11:06 AM Richard Sandiford
> > > > > > >  wrote:
> > > > > > >>
> > > > > > >> Richard Biener via Gcc-patches  writes:
> > > > > > >> > One thought I had is whether we can "fix" validate_subreg to 
> > > > > > >> > have less
> > > > > > >> > "weird" allowed float-int
> > > > > > >> > special cases.  As said upthread I think that we either should 
> > > > > > >> > allow
> > > > > > >> > all of those, implying that
> > > > > > >> > subregs work semantically as if there's subregs to same-sized 
> > > > > > >> > integer
> > > > > > >> > modes inbetween or
> > > > > > >> > disallow them all and make sure we're actually doing that 
> > > > > > >> > explicitely.
> > > > > > >> >
> > > > > > >> > For example
> > > > > > >> >
> > > > > > >> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > > > > > >> > store_bit_field
> > > > > > >> >  is the culprit here, and not the backends.  */
> > > > > > >> >   else if (known_ge (osize, regsize) && known_ge (isize, 
> > > > > > >> > osize))
> > > > > > >> > ;
> > > > > > >> >
> > > > > > >> > I can't decipther rtl.text as to what the semantics of such a 
> > > > > > >> > subreg is
> > > > > > >> > given the docs hand-wave about WORDS_BIG_ENDIAN vs.
> > > > > > >> > FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
> > > > > > >> > when you mix those in a subreg.  So maybe the above should
> > > > > > >> > have explicitely have WORDS_BIG_ENDIAN == 
> > > > > > >> > FLOAT_WORDS_BIG_ENDIAN.
> > > > > > >> >
> > > > > > >> > But then the world would be much simpler if subregs of 
> > > > > > >> > non-same size
> > > > > > >> > modes have explicit documentation for the mode kinds we have.
> > > > > > >>
> > > > > > >> Yeah.  Although validate_subreg was a good idea, some of the 
> > > > > > >> mode checks
> > > > > > >> are IMO a failed experiment.  The hope was that eventually we'd 
> > > > > > >> remove
> > > > > > >> all those special exceptions once the culprit has been fixed.  
> > > > > > >> However,
> > > > > > >> the code is over 16 years old at this point and those changes 
> > > > > > >> never
> > > > > > >> happened.
> > > > > > >>
> > > > > > >> Nested subregs aren't a thing (thankfully) and one of the big 
> > > > > > >> disadvantages
> > > > > > >> of the current validate_subreg mode-changing rules is that they 
> > > > > > >> aren't
> > > > > > >> transitive.  This can artificially require temporary pseudos for 
> > > > > > >> things
> > > > > > >> that could be expressed directly as a single subreg.
> > > > > > >
> > > > > > > And that's what the proposed patch does (add same-mode size 
> > > > > > > integer mode
> > > > > > > punning intermediate subregs).
> > > > > > >
> > > > > > > So if that's not supposed to be necessary then why restrict 
> > > > > > > subregs at all?
> > > > > >
> > > > > > I was trying to say: I'm not sure we should.
> > > > > >
> > > > > > > I mean you seem to imply that the semantics would be clear and 
> > > > > > > well-defined
> > > > > > > (to you - not to me).  The only thing is that of course not all 
> > > > > > > subregs are
> > > > > > > "implemented" by a target (or can be, w/o spilling).
> > > > > >
> > > > > > Yeah.  That's for TARGET_CAN_CHANGE_MODE_CLASS to decide.
> > > > > > But it only comes in to play during RA or when trying to take
> > > > > > the subreg of a particular hard register.  Transitivity doesn't
> > > > > > matter so much for the hard register case since the result of
> > > > > > simplify_gen_subreg should then be another hard register.
> > > > > >
> > > > > > > Which means - we should adjust validate_subreg with another 
> > > > > > > special-case
> > > > > > > or rather generalize the existing ones to an overall set that 
> > > > > > > makes more
> > > > > > > sense?
> > > > > >
> > > > > > Maybe it's too radical, but I would whether we should just get rid 
> > > > > > of:
> > > > > >
> > > > > >   /* ??? This should not be here.  Temporarily continue to allow 
> > > > > > word_mode
> > > > > >  subregs of anything.  The most common offender is (subreg:SI 
> > > > > > (reg:DF)).
> > > > > >  Generally, backends are doing something sketchy but it'll take 
> > > > > > time to
> > > > > >  fix them all.  */
> > > > > >   if (omode == word_mode)
> > > > > > ;
> > > > > >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > > > > > store_bit_field
> > > > > >  is the culprit here, and not the backends.  */
> > > > > >   else if (kn

[PATCH v2] md/define_c_enum: support value assignation

2021-08-31 Thread YunQiang Su
Currently, the enums from define_c_enum and define_enum can only
have values assigned sequentially, starting from 0.

In fact we can support the same behaviour as C, e.g.
  (define_enum "mips_isa" [(mips1 1) mips2 (mips32 32) mips32r2]),
then we can get
  enum mips_isa {
MIPS_ISA_MIPS1 = 1,
MIPS_ISA_MIPS2 = 2,
MIPS_ISA_MIPS32 = 32,
MIPS_ISA_MIPS32R2 = 33
  };

gcc/ChangeLog:
* read-md.c (md_reader::handle_enum): support value assignation.
* doc/md.texi: record define_c_enum value assignation support.
---
 gcc/doc/md.texi |  4 
 gcc/read-md.c   | 28 
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f8047aefc..2b41cb7fb 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -11074,6 +11074,8 @@ The syntax is as follows:
 (define_c_enum "@var{name}" [
   @var{value0}
   @var{value1}
+  (@var{value32} 32)
+  @var{value33}
   @dots{}
   @var{valuen}
 ])
@@ -11086,6 +11088,8 @@ in @file{insn-constants.h}:
 enum @var{name} @{
   @var{value0} = 0,
   @var{value1} = 1,
+  @var{value32} = 32,
+  @var{value33} = 33,
   @dots{}
   @var{valuen} = @var{n}
 @};
diff --git a/gcc/read-md.c b/gcc/read-md.c
index bb419e0f6..2d01c69fc 100644
--- a/gcc/read-md.c
+++ b/gcc/read-md.c
@@ -902,7 +902,9 @@ void
 md_reader::handle_enum (file_location loc, bool md_p)
 {
   char *enum_name, *value_name;
-  struct md_name name;
+  unsigned int cur_value;
+  struct md_name name, value;
+  bool value_given;
   struct enum_type *def;
   struct enum_value *ev;
   void **slot;
@@ -928,6 +930,7 @@ md_reader::handle_enum (file_location loc, bool md_p)
   *slot = def;
 }
 
+  cur_value = def->num_values;
   require_char_ws ('[');
 
   while ((c = read_skip_spaces ()) != ']')
@@ -937,28 +940,45 @@ md_reader::handle_enum (file_location loc, bool md_p)
  error_at (loc, "unterminated construct");
  exit (1);
}
-  unread_char (c);
-  read_name (&name);
+  value_given = false;
+  if (c == '(')
+   {
+ read_name (&name);
+ read_name (&value);
+ require_char_ws (')');
+ cur_value = atoi(value.string);
+ value_given = true;
+   }
+  else
+   {
+ unread_char (c);
+ read_name (&name);
+   }
 
   ev = XNEW (struct enum_value);
   ev->next = 0;
   if (md_p)
{
  value_name = concat (def->name, "_", name.string, NULL);
+ if (value_given)
+   cur_value = atoi (value.string);
  upcase_string (value_name);
  ev->name = xstrdup (name.string);
}
   else
{
  value_name = xstrdup (name.string);
+ if (value_given)
+   cur_value = atoi (value.string);
  ev->name = value_name;
}
   ev->def = add_constant (get_md_constants (), value_name,
- md_decimal_string (def->num_values), def);
+ md_decimal_string (cur_value), def);
 
   *def->tail_ptr = ev;
   def->tail_ptr = &ev->next;
   def->num_values++;
+  cur_value++;
 }
 }
 
-- 
2.30.2



[PATCH] tree-optimization/102142 - fix typo in loop BB reduc cost adjustment

2021-08-31 Thread Richard Biener via Gcc-patches
This fixes a typo in the condition guarding the cleanup of the
visited flag of costed scalar stmts.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-08-31  Richard Biener  

PR tree-optimization/102142
* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Fix
condition under which to unset the visited flag.

* g++.dg/torture/pr102142.C: New testcase.
---
 gcc/testsuite/g++.dg/torture/pr102142.C | 9 +
 gcc/tree-vect-slp.c | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr102142.C

diff --git a/gcc/testsuite/g++.dg/torture/pr102142.C b/gcc/testsuite/g++.dg/torture/pr102142.C
new file mode 100644
index 000..8e3ea5d96b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr102142.C
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+
+extern short arr_597[];
+extern bool arr_601[];
+int test_var_13;
+void test(short arr_391[][9][2][2]) {
+  for (int i_60 = 0; i_60 < 11; i_60 += test_var_13)
+arr_597[22] = arr_601[i_60] = arr_391[0][0][1][4];
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 4ca24408249..fa3566f3d06 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5396,7 +5396,7 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
 
   /* Unset visited flag.  This is delayed when the subgraph is profitable
  and we process the loop for remaining unvectorized if-converted code.  */
-  if (orig_loop && !profitable)
+  if (!orig_loop || !profitable)
 FOR_EACH_VEC_ELT (scalar_costs, i, cost)
   gimple_set_visited  (cost->stmt_info->stmt, false);
 
-- 
2.31.1


Re: [PATCH] Check the type of mask while generating cond_op in gimple simplication.

2021-08-31 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 30, 2021 at 8:25 PM Richard Biener via Gcc-patches
 wrote:
>
> On Fri, Aug 27, 2021 at 8:53 AM liuhongt  wrote:
> >
> >   When gimple simplifcation try to combine op and vec_cond_expr to cond_op,
> > it doesn't check if mask type matches. It causes an ICE when expand cond_op
> > with mismatched mode.
> >   This patch add a function named cond_vectorized_internal_fn_supported_p
> >  to additionally check mask type than vectorized_internal_fn_supported_p.
> >
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >   Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/102080
> > * internal-fn.c (cond_vectorized_internal_fn_supported_p): New 
> > functions.
> > * internal-fn.h (cond_vectorized_internal_fn_supported_p): New 
> > declaration.
> > * match.pd: Check the type of mask while generating cond_op in
> > gimple simplication.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR middle-end/102080
> > * gcc.target/i386/pr102080.c: New test.
> > ---
> >  gcc/internal-fn.c| 22 ++
> >  gcc/internal-fn.h|  1 +
> >  gcc/match.pd | 24 
> >  gcc/testsuite/gcc.target/i386/pr102080.c | 16 
> >  4 files changed, 55 insertions(+), 8 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102080.c
> >
> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > index 1360a00f0b9..8b2b65db1a7 100644
> > --- a/gcc/internal-fn.c
> > +++ b/gcc/internal-fn.c
> > @@ -4102,6 +4102,28 @@ expand_internal_call (gcall *stmt)
> >expand_internal_call (gimple_call_internal_fn (stmt), stmt);
> >  }
> >
> > +/* Check cond_op for vector modes since vectorized_internal_fn_supported_p
> > +   doesn't check if mask type matches.  */
> > +bool
> > +cond_vectorized_internal_fn_supported_p (internal_fn ifn, tree type,
> > +tree mask_type)
> > +{
> > +  if (!vectorized_internal_fn_supported_p (ifn, type))
> > +return false;
> > +
> > +  machine_mode mask_mode;
> > +  machine_mode vmode = TYPE_MODE (type);
> > +  int size1, size2;
> > +  if (VECTOR_MODE_P (vmode)
> > +  && targetm.vectorize.get_mask_mode (vmode).exists(&mask_mode)
> > +  && GET_MODE_SIZE (mask_mode).is_constant (&size1)
> > +  && GET_MODE_SIZE (TYPE_MODE (mask_type)).is_constant (&size2)
> > +  && size1 != size2)
>
> Why do we check for equal size rather than just mode equality which
I originally thought the TYPE_MODE of the vector(8) mask type was
not QImode.  Changed the patch to check mode equality.
Updated patch.

> I think would work for non-constant sized modes as well?  And when
> using sizes you'd instead use maybe_ne (GET_MODE_SIZE (mask_mode),
> GET_MODE_SIZE (TYPE_MODE (mask_type)))
>
> Thanks,
> Richard.
>
> > +return false;
> > +
> > +  return true;
> > +}
> > +
> >  /* If TYPE is a vector type, return true if IFN is a direct internal
> > function that is supported for that type.  If TYPE is a scalar type,
> > return true if IFN is a direct internal function that is supported for
> > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> > index 19d0f849a5a..f0aea00103c 100644
> > --- a/gcc/internal-fn.h
> > +++ b/gcc/internal-fn.h
> > @@ -236,5 +236,6 @@ extern void expand_PHI (internal_fn, gcall *);
> >  extern void expand_SHUFFLEVECTOR (internal_fn, gcall *);
> >
> >  extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
> > +extern bool cond_vectorized_internal_fn_supported_p (internal_fn, tree, 
> > tree);
> >
> >  #endif
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index e5bbb123a6a..72b1bc674db 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -6987,14 +6987,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   cond_op (COND_BINARY)
> >   (simplify
> >(vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
> > -  (with { tree op_type = TREE_TYPE (@4); }
> > -   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), 
> > op_type)
> > +  (with { tree op_type = TREE_TYPE (@4);
> > + tree mask_type = TREE_TYPE (@0); }
> > +   (if (cond_vectorized_internal_fn_supported_p (as_internal_fn (cond_op),
> > +op_type, mask_type)
> > && element_precision (type) == element_precision (op_type))
> >  (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))
> >   (simplify
> >(vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
> > -  (with { tree op_type = TREE_TYPE (@4); }
> > -   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), 
> > op_type)
> > +  (with { tree op_type = TREE_TYPE (@4);
> > + tree mask_type = TREE_TYPE (@0); }
> > +   (if (cond_vectorized_internal_fn_supported_p (as_internal_fn (cond_op),
> > +op_type, mask_type)
> > && element_precision (type) == element_precision (op_type))
> >  
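
For reference, a minimal sketch of the mode-equality form of the check
(hypothetical helper name; the hook calls mirror the ones in the patch
quoted above, so treat this as an illustration rather than the final
version):

  static bool
  cond_op_mask_mode_matches_p (tree type, tree mask_type)
  {
    machine_mode vmode = TYPE_MODE (type);
    machine_mode mask_mode;
    if (!VECTOR_MODE_P (vmode)
        || !targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode))
      return false;
    /* Require the mask's mode to equal the mode the target would use
       for masking VMODE, instead of comparing sizes.  */
    return mask_mode == TYPE_MODE (mask_type);
  }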

[PATCH] C: PR c/79412: Poison decls with error_mark_node after type mismatch

2021-08-31 Thread Roger Sayle

This patch fixes an ICE during error-recovery regression in the C front-end.
The symptom is that the middle-end's sanity checking assertions fail during
gimplification when being asked to increment an array, which is non-sense.
The issue is that the C-front end has detected the type mismatch and
reported an error to the user, but hasn't provided any indication of this
to the middle-end, simply passing bogus trees that the optimizers recognize
as invalid.
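
For reference, the shape of the problem (this is essentially the new
gcc.dg/pr79412.c testcase added below):

  int a;
  void fn1 (void) { a++; }
  int a[] = {2};  /* error: conflicting types; the earlier 'a++' now
                     increments an array as far as the middle-end sees.  */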

This appears to be a frequently reported ICE with 94730, 94731, 101036
and 101365 all marked as duplicates.

I believe the correct (polite) fix is to mark the mismatched types as
problematic/dubious in the front-end, when the error is spotted, so that
the middle-end has a heads-up and can be a little more forgiving.  This
patch to c-decl.c's duplicate_decls sets (both) mismatched types to
error_mark_node if they are significantly different, and we've issued
an error message.  Alas, this is too punitive for FUNCTION_DECLs where
we store return types, parameter lists, parameter types and attributes
in the type, but fortunately the middle-end is already more cautious
about trusting possibly suspect function types.

This fix required one minor change to the testsuite, typedef-var-2.c
where after conflicting type definitions, we now no longer assume that
the (first or) second definition is the correct one.  This change only
affects the behaviour after seen_error(), so should be relatively safe.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Ok for mainline?


2021-08-31  Roger Sayle  

gcc/c/ChangeLog
PR c/79412
* c-decl.c (duplicate_decls): On significant mismatches, mark the
types of both (non-function) decls as error_mark_node, so that the
middle-end can see the code is malformed.
(free_attr_access_data): Don't process if the type has been set to
error_mark_node.

gcc/testsuite/ChangeLog
PR c/79412
* gcc.dg/pr79412.c: New test case.
* gcc.dg/typedef-var-2.c: Update expected errors.

Roger
--

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 221a67f..52fa2ca 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -2957,6 +2957,17 @@ duplicate_decls (tree newdecl, tree olddecl)
 {
   /* Avoid `unused variable' and other warnings for OLDDECL.  */
   suppress_warning (olddecl, OPT_Wunused);
+  /* If the types are completely different, poison them both with
+error_mark_node.  */
+  if (TREE_CODE (TREE_TYPE (newdecl)) != TREE_CODE (TREE_TYPE (olddecl))
+ && olddecl != error_mark_node
+ && seen_error())
+   {
+ if (TREE_CODE (olddecl) != FUNCTION_DECL)
+   TREE_TYPE (olddecl) = error_mark_node;
+ if (TREE_CODE (newdecl) != FUNCTION_DECL)
+   TREE_TYPE (newdecl) = error_mark_node;
+   }
   return false;
 }
 
@@ -12209,7 +12220,7 @@ free_attr_access_data ()
  attr_access::free_lang_data (attrs);
 
   tree fntype = TREE_TYPE (n->decl);
-  if (!fntype)
+  if (!fntype || fntype == error_mark_node)
continue;
   tree attrs = TYPE_ATTRIBUTES (fntype);
   if (!attrs)
diff --git a/gcc/testsuite/gcc.dg/typedef-var-2.c b/gcc/testsuite/gcc.dg/typedef-var-2.c
index 716d29c..bc119a0 100644
--- a/gcc/testsuite/gcc.dg/typedef-var-2.c
+++ b/gcc/testsuite/gcc.dg/typedef-var-2.c
@@ -4,12 +4,13 @@
 int f (void)
 {
   extern float v;   
-
+/* { dg-message "note: previous declaration" "previous declaration" { target *-*-* } .-1 } */
   return (v > 0.0f);
 }
 
 extern int t;
+/* { dg-message "note: previous declaration" "previous declaration" { target *-*-* } .-1 } */
 
 typedef float t; /* { dg-error "redeclared as different kind of symbol" } */
 
-t v = 4.5f;
+t v = 4.5f;  /* { dg-error "conflicting types" } */
/* { dg-do compile } */
/* { dg-options "-O2" } */
int a;
/* { dg-message "note: previous declaration" "previous declaration" { target *-*-* } .-1 } */
void fn1 ()
{
  a++;
}
int a[] = {2};  /* { dg-error "conflicting types" } */


Re: [RFA] Some libgcc headers are missing the runtime exception

2021-08-31 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, Aug 30, 2021 at 12:59 PM Thomas Schwinge
>  wrote:
>>
>> Hi!
>>
>> Ping.  For easy reference I've again attached Richard Sandiford's
>> "libgcc: Add missing runtime exception notices".
>>
>> On 2021-07-12T17:34:09+0100, Richard Sandiford via Gcc-patches 
>>  wrote:
>> > David Edelsohn  writes:
>> >> On Mon, Jul 12, 2021 at 11:58 AM Richard Sandiford
>> >>  wrote:
>> >>> David Edelsohn  writes:
>> >>> > On Fri, Jul 9, 2021 at 1:31 PM Richard Sandiford
>> >>> >  wrote:
>> >>> >> David Edelsohn  writes:
>> >>> >> > On Fri, Jul 9, 2021 at 12:53 PM Richard Sandiford via Gcc
>> >>> >> >  wrote:
>> >>> >> >> It was pointed out to me off-list that 
>> >>> >> >> config/aarch64/value-unwind.h
>> >>> >> >> is missing the runtime exception.  It looks like a few other files
>> >>> >> >> are too; a fuller list is:
>> >>> >> >>
>> >>> >> >> libgcc/config/aarch64/value-unwind.h
>> >>> >> >> libgcc/config/frv/frv-abi.h
>> >>> >> >> libgcc/config/i386/value-unwind.h
>> >>> >> >> libgcc/config/pa/pa64-hpux-lib.h
>> >>> >> >>
>> >>> >> >> Certainly for the aarch64 file this was simply a mistake;
>> >>> >> >> it seems to have been copied from the i386 version, both of which
>> >>> >> >> reference the runtime exception but don't actually include it.
>> >>> >> >>
>> >>> >> >> What's the procedure for fixing this?  Can we treat it as a textual
>> >>> >> >> error or do the files need to be formally relicensed?
>> >>> >> >
>> >>> >> > I'm unsure what you mean by "formally relicensed".
>> >>> >>
>> >>> >> It seemed like there were two possibilities: the licence of the files
>> >>> >> is actually GPL + exception despite what the text says (the textual
>> >>> >> error case), or the licence of the files is plain GPL because the text
>> >>> >> has said so since the introduction of the files.  In the latter case
>> >>> >> I'd have imagined that someone would need to relicense the code so
>> >>> >> that it is GPL + exception.
>> >>> >>
>> >>> >> > It generally is considered a textual omission.  The runtime library
>> >>> >> > components of GCC are intended to be licensed under the runtime
>> >>> >> > exception, which was granted and approved at the time of 
>> >>> >> > introduction.
>> >>> >>
>> >>> >> OK, thanks.  So would a patch to fix at least the i386 and aarch64 
>> >>> >> header
>> >>> >> files be acceptable?  (I'm happy to fix the other two as well if 
>> >>> >> that's
>> >>> >> definitely the right thing to do.  It's just that there's more history
>> >>> >> involved there…)
>> >>> >
>> >>> > Please correct the text in the files. The files in libgcc used in the
>> >>> > GCC runtime are intended to be licensed with the runtime exception and
>> >>> > GCC previously was granted approval for that licensing and purpose.
>> >>> >
>> >>> > As you are asking the question, I sincerely doubt that ARM and Cavium
>> >>> > intended to apply a license without the exception to those files.  And
>> >>> > similarly for Intel and FRV.
>> >>>
>> >>> FTR, I think only Linaro (rather than Arm) touched the aarch64 file.
>> >>>
>> >>> > The runtime exception explicitly was intended for this purpose and
>> >>> > usage at the time that GCC received approval to apply the exception.
>> >>>
>> >>> Ack.  Is the patch below OK for trunk and branches?
>> >>
>> >> I'm not certain whom you are asking for approval,
>> >
>> > I was assuming it would need a global reviewer.
>> >
>> >> but it looks good to me.
>> >
>> > Thanks.
>>
>> So in addition to David, would a Global Reviewer please review this?
>
> OK.

Thanks, now pushed to GCC 9+.

Richard


Re: [PATCH 1/3] md/define_c_enum: support value assignation

2021-08-31 Thread Richard Sandiford via Gcc-patches
YunQiang Su  writes:
> Currently, the enums from define_c_enum and define_enum can only
> have values assigned sequentially, starting from 0.
>
> In fact we can support the same behaviour as C, e.g.
>   (define_enum "mips_isa" [mips1=1 mips2 mips32=32 mips32r2]),
> then we can get
>   enum mips_isa {
> MIPS_ISA_MIPS1 = 1,
> MIPS_ISA_MIPS2 = 2,
> MIPS_ISA_MIPS32 = 32,
> MIPS_ISA_MIPS32R2 = 33
>   };
>
> gcc/ChangeLog:
>   * read-md.c (md_reader::handle_enum): support value assignation.
>   * doc/md.texi: record define_c_enum value assignation support.

This seems like a nice feature to have.  However, the current (historical,
more lisp-like) syntax for define_constants uses:

  (NAME VALUE)

instead of:

  NAME = VALUE

I think we should do the same here for consistency.

Thanks,
Richard

> ---
>  gcc/doc/md.texi |  4 
>  gcc/read-md.c   | 15 +--
>  2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index f8047aefc..1c1282c4c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -11074,6 +11074,8 @@ The syntax is as follows:
>  (define_c_enum "@var{name}" [
>@var{value0}
>@var{value1}
> +  @var{value32}=32
> +  @var{value33}
>@dots{}
>@var{valuen}
>  ])
> @@ -11086,6 +11088,8 @@ in @file{insn-constants.h}:
>  enum @var{name} @{
>@var{value0} = 0,
>@var{value1} = 1,
> +  @var{value32} = 32,
> +  @var{value33} = 33,
>@dots{}
>@var{valuen} = @var{n}
>  @};
> diff --git a/gcc/read-md.c b/gcc/read-md.c
> index bb419e0f6..43dfbe264 100644
> --- a/gcc/read-md.c
> +++ b/gcc/read-md.c
> @@ -901,7 +901,8 @@ md_decimal_string (int number)
>  void
>  md_reader::handle_enum (file_location loc, bool md_p)
>  {
> -  char *enum_name, *value_name;
> +  char *enum_name, *value_name, *token;
> +  unsigned int cur_value;
>struct md_name name;
>struct enum_type *def;
>struct enum_value *ev;
> @@ -928,6 +929,7 @@ md_reader::handle_enum (file_location loc, bool md_p)
>*slot = def;
>  }
>  
> +  cur_value = def->num_values;
>require_char_ws ('[');
>  
>while ((c = read_skip_spaces ()) != ']')
> @@ -945,20 +947,29 @@ md_reader::handle_enum (file_location loc, bool md_p)
>if (md_p)
>   {
> value_name = concat (def->name, "_", name.string, NULL);
> +   value_name = strtok (value_name, "=");
> +   token = strtok (NULL, "=");
> +   if (token)
> + cur_value = atoi (token);
> upcase_string (value_name);
> ev->name = xstrdup (name.string);
>   }
>else
>   {
> value_name = xstrdup (name.string);
> +   value_name = strtok (value_name, "=");
> +   token = strtok (NULL, "=");
> +   if (token)
> + cur_value = atoi (token);
> ev->name = value_name;
>   }
>ev->def = add_constant (get_md_constants (), value_name,
> -   md_decimal_string (def->num_values), def);
> +   md_decimal_string (cur_value), def);
>  
>*def->tail_ptr = ev;
>def->tail_ptr = &ev->next;
>def->num_values++;
> +  cur_value++;
>  }
>  }


[PATCH] middle-end/102129 - avoid TER of possibly trapping expressions

2021-08-31 Thread Richard Biener via Gcc-patches
The following avoids applying TER to possibly trapping expressions,
preventing a trapping FP multiplication to be moved across a call
that should not be executed.
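
A hand-written illustration of the hazard (not the testcase from the PR):

  /* With trapping FP math the multiply may raise an exception.  If TER
     substitutes its definition into the return, it is only evaluated
     after the call, which the trap should have prevented from running.  */
  extern void should_not_run (void);

  double
  f (double a, double b)
  {
    double t = a * b;   /* possibly trapping */
    should_not_run ();
    return t;
  }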

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-08-31  Richard Biener  

PR middle-end/102129
* tree-ssa-ter.c (find_replaceable_in_bb): Do not move
possibly trapping expressions across calls.
---
 gcc/tree-ssa-ter.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-ssa-ter.c b/gcc/tree-ssa-ter.c
index 9eb8a11a7c0..3a057c2d25f 100644
--- a/gcc/tree-ssa-ter.c
+++ b/gcc/tree-ssa-ter.c
@@ -658,11 +658,15 @@ find_replaceable_in_bb (temp_expr_table *tab, basic_block bb)
 substitution list, or the def and use span a call such that
 we'll expand lifetimes across a call.  We also don't want to
 replace across these expressions that may call libcalls that
-clobber the register involved.  See PR 70184.  */
+clobber the register involved.  See PR 70184.  Neither
+do we want to move possibly trapping expressions across
+a call.  See PRs 102129 and 33593.  */
  if (gimple_has_volatile_ops (stmt) || same_root_var
  || (tab->call_cnt[ver] != cur_call_cnt
- && SINGLE_SSA_USE_OPERAND (SSA_NAME_DEF_STMT (use), SSA_OP_USE)
-== NULL_USE_OPERAND_P)
+ && (SINGLE_SSA_USE_OPERAND (SSA_NAME_DEF_STMT (use),
+ SSA_OP_USE)
+   == NULL_USE_OPERAND_P
+ || gimple_could_trap_p (SSA_NAME_DEF_STMT (use
  || tab->reg_vars_cnt[ver] != cur_reg_vars_cnt)
finished_with_expr (tab, ver, true);
  else
-- 
2.31.1


[PATCH] tree-optimization/102139 - fix SLP DR base alignment

2021-08-31 Thread Richard Biener via Gcc-patches
When doing whole-function SLP we have to make sure the recorded
base alignments we compute as the maximum alignment seen for a
base anywhere in the function is actually valid at the point
we want to make use of it.

To make this work we now record the stmt the alignment was derived
from in addition to the DR's innermost behavior, and we use a
dominance check to verify the recorded info is valid when doing
BB vectorization.

Note this leaves a small(?) hole for the case where we have sth
like

unaligned DR
call (); // does not return
aligned DR

since we'll derive an aligned access for the earlier DR but the
later DR is never actually reached since the call does not
return.  To plug this hole one option (for the easy backporting)
would be to simply not use the base-alignment recording at all.
Alternatively we'd have to store the dataref grouping 'id' somewhere
in the DR itself and use that to handle this particular case.

For optimal handling we'd need the ability to record different
base alignments based on context, we could hash on the BB and
at query time walk immediate dominators to find the "best"
base alignment.  But I'd rather leave such improvement for trunk.
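
A rough sketch of that query, where recorded_alignment_for_bb stands for
a hypothetical per-BB record (an illustration only, not part of this
patch):

  static innermost_loop_behavior *
  best_recorded_base_alignment (basic_block bb)
  {
    /* Walk immediate dominators so only records valid at BB are used.  */
    for (; bb; bb = get_immediate_dominator (CDI_DOMINATORS, bb))
      if (innermost_loop_behavior *entry = recorded_alignment_for_bb (bb))
        return entry;
    return NULL;
  }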

Any opinions?  The issue looks quite serious and IMHO warrants a
timely fix, even if partial - I'm not sure how often the same-BB
case would trigger.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Thanks,
Richard.

2021-08-31  Richard Biener  

PR tree-optimization/102139
* tree-vectorizer.h (vec_base_alignments): Adjust hash-map
type to record a std::pair of the stmt and the innermost
loop behavior.
* tree-vect-data-refs.c (vect_record_base_alignment): Adjust.
(vect_compute_data_ref_alignment): Verify the recorded
base alignment can be used.
---
 gcc/tree-vect-data-refs.c | 19 ---
 gcc/tree-vectorizer.h |  7 ---
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 37f46d1aaa3..e2549811961 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -895,11 +895,11 @@ vect_record_base_alignment (vec_info *vinfo, stmt_vec_info stmt_info,
innermost_loop_behavior *drb)
 {
   bool existed;
-  innermost_loop_behavior *&entry
+  std::pair<gimple *, innermost_loop_behavior *> &entry
 = vinfo->base_alignments.get_or_insert (drb->base_address, &existed);
-  if (!existed || entry->base_alignment < drb->base_alignment)
+  if (!existed || entry.second->base_alignment < drb->base_alignment)
 {
-  entry = drb;
+  entry = std::make_pair (stmt_info->stmt, drb);
   if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "recording new base alignment for %T\n"
@@ -1060,11 +1060,16 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info)
 
   /* Calculate the maximum of the pooled base address alignment and the
  alignment that we can compute for DR itself.  */
-  innermost_loop_behavior **entry = base_alignments->get (drb->base_address);
-  if (entry && base_alignment < (*entry)->base_alignment)
+  std::pair<gimple *, innermost_loop_behavior *> *entry
+    = base_alignments->get (drb->base_address);
+  if (entry
+  && base_alignment < (*entry).second->base_alignment
+  && (loop_vinfo
+ || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt_info->stmt),
+                             gimple_bb (entry->first))))
 {
-  base_alignment = (*entry)->base_alignment;
-  base_misalignment = (*entry)->base_misalignment;
+  base_alignment = entry->second->base_alignment;
+  base_misalignment = entry->second->base_misalignment;
 }
 
   if (drb->offset_alignment < vect_align_c
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 72e018e8eac..8db642c7dc3 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -106,10 +106,11 @@ struct stmt_info_for_cost {
 
 typedef vec<stmt_info_for_cost> stmt_vector_for_cost;
 
-/* Maps base addresses to an innermost_loop_behavior that gives the maximum
-   known alignment for that base.  */
+/* Maps base addresses to an innermost_loop_behavior and the stmt it was
+   derived from that gives the maximum known alignment for that base.  */
 typedef hash_map<tree_operand_hash,
-		 innermost_loop_behavior *> vec_base_alignments;
+		 std::pair<gimple *, innermost_loop_behavior *> >
+  vec_base_alignments;
 
 /
   SLP
-- 
2.31.1


Re: [PATCH] c++, abi: Set DECL_FIELD_ABI_IGNORED on C++ zero width bitfields [PR102024]

2021-08-31 Thread Richard Biener via Gcc-patches
On Tue, 31 Aug 2021, Jakub Jelinek wrote:

> On Tue, Aug 31, 2021 at 09:57:44AM +0200, Richard Biener wrote:
> > Just to clarify - in the C++ FE these fields are meaningful for
> > layout purposes but they are only supposed to influence layout
> > but not ABI (but why does the C++ FE say that?) and thus the
> > 'DECL_FIELD_ABI_IGNORED' is a good term to use?  But we still want
> > to have the backends decide whether to actually follow this advice
> > and we do expect some to not do this?
> 
> The removal of zero-width bitfields was added (after structure layout)
> by
> https://gcc.gnu.org/legacy-ml/gcc-patches/1999-12/msg00589.html
> https://gcc.gnu.org/legacy-ml/gcc-patches/1999-12/msg00641.html
> The comment about it was:
> /* Delete all zero-width bit-fields from the list of fields.  Now
>that we have layed out the type they are no longer important.  */
> The only spot I see zero-width bit-fields mentioned in the Itanium ABI is:
> 
> empty class
>   A class with no non-static data members other than empty data members,
>   no unnamed bit-fields other than zero-width bit-fields, no virtual 
> functions,
>   no virtual base classes, and no non-empty non-virtual proper base classes. 
> 
> nearly empty class
>   A class that contains a virtual pointer, but no other data except 
> (possibly) virtual bases. In particular, it:
>- has no non-static data members and no non-zero-width unnamed bit-fields,
>- has no direct base classes that are not either empty, nearly empty, or 
> virtual,
>- has at most one non-virtual, nearly empty direct base class, and
>- has no proper base class that is empty, not morally virtual, and at an 
> offset other than zero. 
>   Such classes may be primary base classes even if virtual, sharing a virtual 
> pointer with the derived class. 
> 
> and the removal of remove_zero_width_bit_fields I believe didn't change
> anything on that, e.g. is_empty_class uses CLASSTYPE_EMPTY_P flag whose
> computation takes:
>   if (DECL_C_BIT_FIELD (field)
>   && integer_zerop (DECL_BIT_FIELD_REPRESENTATIVE (field)))
> /* We don't treat zero-width bitfields as making a class
>non-empty.  */
> ;
> into account (that is still before the bit-fields are finalized so
> width is stored differently, and it is necessary before the
> former remove_zero_width_bit_fields call).
> 
> The flag for these zero-width bitfields is a good name for the case
> where a target decides to keep the old GCC 11 ABI of not ignoring them
> for C and ignoring them for C++, in other cases it can be a little bit
> confusing, but I think we could define another macro with the same
> value for it if we find a good name for it (dunno what it would be though).
> But even if we have another name, if we reuse the flag we need to take
> it into account in the target code, and using a different flag would be a
> waste of the precious bits.
> Perhaps just clarify in tree.h above the DECL_FIELD_ABI_IGNORED the cases
> in which it is set?

Yeah, I think it conflates the C++ [Itanium] ABI and the psABI for
calling conventions.  The 'ABI' in DECL_FIELD_ABI_IGNORED refers
to the psABI as far as I understand the situation, but then it
might still be important for the psABI when dealing with
(non-)homogeneous aggregates ...

So _maybe_ DECL_FIELD_FOR_LAYOUT might capture the bits better - the
field is present for layout (and possibly ABI), but it doesn't carry
any data so it doesn't have to be passed across function boundary
for example.
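
A hand-written illustration of the kind of field in question:

  struct S { float x; int : 0; float y; };

Whether S is still treated as a homogeneous float aggregate for argument
passing can depend on whether the zero-width bit-field is ignored, which
is exactly where the GCC 11 C and C++ front ends differed.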

Anyway, I'm not stuck to whatever naming we choose but the situation
is complicated enough that we want some more elaborate docs in tree.h
I'll leave the final ACK to Jason (unless he's on vacation).

Thanks,
Richard.


Re: [PATCH] libstdc++-v3: Check for TLS support on mingw

2021-08-31 Thread Jonathan Wakely via Gcc-patches
It looks like my questions about this patch never got an answer, and
it never got applied.

Could somebody say whether TLS is enabled for native *-*-mingw*
builds? If it is, then we definitely need to add GCC_CHECK_TLS to the
cross-compiler config too.

For a linux-hosted x86_64-w64-mingw32 cross compiler I see TLS is not enabled:

/* Define to 1 if the target supports thread-local storage. */
/* #undef _GLIBCXX_HAVE_TLS */
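
For reference, GCC_CHECK_TLS essentially checks whether a small program
along these lines compiles and links (illustrative sketch, not the exact
conftest):

  __thread int tls_var;

  int
  main (void)
  {
    return tls_var;
  }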




On Mon, 19 Feb 2018 at 08:59, Hugo Beauzée-Luyssen  wrote:
>
> libstdc++-v3: Check for TLS support on mingw
>
> 2018-02-16  Hugo Beauzée-Luyssen  
>
> * crossconfig.m4: Check for TLS support on mingw
> * configure: regenerate
>
> Index: libstdc++-v3/crossconfig.m4
> ===
> --- libstdc++-v3/crossconfig.m4 (revision 257730)
> +++ libstdc++-v3/crossconfig.m4 (working copy)
> @@ -197,6 +197,7 @@ case "${host}" in
>  GLIBCXX_CHECK_LINKER_FEATURES
>  GLIBCXX_CHECK_MATH_SUPPORT
>  GLIBCXX_CHECK_STDLIB_SUPPORT
> +GCC_CHECK_TLS
>  ;;
>*-netbsd*)
>  SECTION_FLAGS='-ffunction-sections -fdata-sections'

