RE: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

2021-07-14 Thread Wang, Pengfei via Gcc-patches
It seems Clang doesn't support -fexcess-precision=xxx:
https://github.com/llvm/llvm-project/blob/main/clang/test/Driver/clang_f_opts.c#L403

Thanks
Pengfei

-----Original Message-----
From: Hongtao Liu  
Sent: Thursday, July 15, 2021 2:35 PM
To: Wang, Pengfei 
Cc: Craig Topper ; Jakub Jelinek ; 
Liu, Hongtao ; gcc-patches@gcc.gnu.org; Joseph Myers 

Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

On Thu, Jul 15, 2021 at 10:07 AM Wang, Pengfei  wrote:
>
> Clang for AArch64 promotes each individual operation and rounds immediately 
> afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two 
> fadd operations. It's implemented in the LLVM backend where we can't see what 
> was originally a single expression.
>
>
>
> Yes, but this is not consistent with the Clang documentation. I think we
> should ask the Clang FE to do the promotion and truncation.
>
>
>
> Thanks
>
> Pengfei
>
>
>
> From: llvm-dev  On Behalf Of Craig 
> Topper via llvm-dev
> Sent: Wednesday, July 14, 2021 11:32 PM
> To: Hongtao Liu 
> Cc: Jakub Jelinek ; llvm-dev 
> ; Liu, Hongtao ; 
> gcc-patches@gcc.gnu.org; Joseph Myers 
> Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
>
>
>
> On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev 
>  wrote:
>
> > >
> > Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to 
> > round after each operation could keep semantics right.
> > And I'll document the behavior difference between soft-fp and
> > AVX512FP16 instruction for exceptions.
> I got some feedback from my colleague who's working on supporting
> _Float16 for llvm.
> The LLVM side wants to set  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for 
> soft-fp so that codes can be more efficient.
> i.e.
> _Float16 a, b, c, d;
> d = a + b + c;
>
> would be transformed to
> float tmp, tmp1, a1, b1, c1;
> a1 = (float) a;
> b1 = (float) b;
> c1 = (float) c;
> tmp = a1 + b1;
> tmp1 = tmp + c1;
> d = (_Float16) tmp1;
>
> so there's only 1 truncation in the end.
>
> If users want to round back after every operation, the code should be
> explicitly written as
> _Float16 a, b, c, d, e;
> e = a + b;
> d = e + c;
>
> That's what Clang does, quote from [1]
>  _Float16 arithmetic will be performed using native half-precision 
> support when available on the target (e.g. on ARMv8.2a); otherwise it 
> will be performed at a higher precision (currently always float) and 
> then truncated down to _Float16. Note that C and C++ allow 
> intermediate floating-point operands of an expression to be computed 
> with greater precision than is expressible in their type, so Clang may 
> avoid intermediate truncations in certain cases; this may lead to 
> results that are inconsistent with native arithmetic.
>
>
>
> Clang for AArch64 promotes each individual operation and rounds immediately 
> afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two 
> fadd operations. It's implemented in the LLVM backend where we can't see what 
> was originally a single expression.
>
>
I was reading the documentation for the excess-precision option at
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

-fexcess-precision=style

This option allows further control over excess precision on machines where 
floating-point operations occur in a format with more precision or range than 
the IEEE standard and interchange floating-point types.
By default, -fexcess-precision=fast is in effect; this means that operations 
may be carried out in a wider precision than the types specified in the source 
if that would result in faster code, and it is unpredictable when rounding to 
the types specified in the source code takes place. When compiling C, if 
-fexcess-precision=standard is specified then excess precision follows the 
rules specified in ISO C99; in particular, both casts and assignments cause 
values to be rounded to their semantic types (whereas -ffloat-store only 
affects assignments). This option is enabled by default for C if a strict 
conformance option such as -std=c99 is used. -ffast-math enables 
-fexcess-precision=fast by default regardless of whether a strict conformance 
option is used.

For -fexcess-precision=fast, we should set flt_eval_method to
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for soft-fp, and
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for AVX512FP16.

For -fexcess-precision=standard, should we set
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_SSE2, so that soft-fp
rounds back after every operation?
>
>
> and so does arm gcc
> quote from arm.c
>
> /* We can calculate either in 16-bit range and precision or
>    32-bit range and precision.  Make that decision based on whether
>    we have native support for the ARMv8.2-A 16-bit floating-point
>    instructions or not.  */
> return (TARGET_VFP_FP16INST
>         ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
>         : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
>
>
> [1]https://clang.llvm.org/docs/LanguageExtensions.html
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> >
> >
> >
> > --
> > BR,
> >

Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

2021-07-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 15, 2021 at 10:07 AM Wang, Pengfei  wrote:
>
> Clang for AArch64 promotes each individual operation and rounds immediately 
> afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two 
> fadd operations. It's implemented in the LLVM backend where we can't see what 
> was originally a single expression.
>
>
>
> Yes, but this is not consistent with the Clang documentation. I think we
> should ask the Clang FE to do the promotion and truncation.
>
>
>
> Thanks
>
> Pengfei
>
>
>
> From: llvm-dev  On Behalf Of Craig Topper 
> via llvm-dev
> Sent: Wednesday, July 14, 2021 11:32 PM
> To: Hongtao Liu 
> Cc: Jakub Jelinek ; llvm-dev ; 
> Liu, Hongtao ; gcc-patches@gcc.gnu.org; Joseph Myers 
> 
> Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
>
>
>
> On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev 
>  wrote:
>
> > >
> > Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> > round after each operation could keep semantics right.
> > And I'll document the behavior difference between soft-fp and
> > AVX512FP16 instruction for exceptions.
> I got some feedback from my colleague who's working on supporting
> _Float16 for llvm.
> The LLVM side wants to set  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
> soft-fp so that codes can be more efficient.
> i.e.
> _Float16 a, b, c, d;
> d = a + b + c;
>
> would be transformed to
> float tmp, tmp1, a1, b1, c1;
> a1 = (float) a;
> b1 = (float) b;
> c1 = (float) c;
> tmp = a1 + b1;
> tmp1 = tmp + c1;
> d = (_Float16) tmp1;
>
> so there's only 1 truncation in the end.
>
> If users want to round back after every operation, the code should be
> explicitly written as
> _Float16 a, b, c, d, e;
> e = a + b;
> d = e + c;
>
> That's what Clang does, quote from [1]
>  _Float16 arithmetic will be performed using native half-precision
> support when available on the target (e.g. on ARMv8.2a); otherwise it
> will be performed at a higher precision (currently always float) and
> then truncated down to _Float16. Note that C and C++ allow
> intermediate floating-point operands of an expression to be computed
> with greater precision than is expressible in their type, so Clang may
> avoid intermediate truncations in certain cases; this may lead to
> results that are inconsistent with native arithmetic.
>
>
>
> Clang for AArch64 promotes each individual operation and rounds immediately 
> afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two 
> fadd operations. It's implemented in the LLVM backend where we can't see what 
> was originally a single expression.
>
>
I was reading the documentation for the excess-precision option at
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

-fexcess-precision=style

This option allows further control over excess precision on machines
where floating-point operations occur in a format with more precision
or range than the IEEE standard and interchange floating-point types.
By default, -fexcess-precision=fast is in effect; this means that
operations may be carried out in a wider precision than the types
specified in the source if that would result in faster code, and it is
unpredictable when rounding to the types specified in the source code
takes place. When compiling C, if -fexcess-precision=standard is
specified then excess precision follows the rules specified in ISO
C99; in particular, both casts and assignments cause values to be
rounded to their semantic types (whereas -ffloat-store only affects
assignments). This option is enabled by default for C if a strict
conformance option such as -std=c99 is used. -ffast-math enables
-fexcess-precision=fast by default regardless of whether a strict
conformance option is used.

For -fexcess-precision=fast, we should set flt_eval_method to
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for soft-fp, and
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for AVX512FP16.

For -fexcess-precision=standard, should we set
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_SSE2, so that for
soft-fp it will round back after every operation?
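
The proposal above can be sketched as a small selection function. This is
only an illustrative model, not the i386 back end: the enum names and the
choose_eval_method helper are hypothetical stand-ins, and the "standard"
branch encodes the open question asked above rather than a settled decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's FLT_EVAL_METHOD_PROMOTE_TO_* values.  */
enum eval_method
{
  PROMOTE_TO_FLOAT,    /* evaluate _Float16 arithmetic in float */
  PROMOTE_TO_FLOAT16   /* round to _Float16 after every operation */
};

/* Hypothetical stand-ins for the two -fexcess-precision= styles.  */
enum excess_style
{
  EXCESS_FAST,
  EXCESS_STANDARD
};

/* Model of the proposal.  HAS_AVX512FP16 says whether native
   half-precision instructions exist; otherwise soft-fp is assumed.  */
enum eval_method
choose_eval_method (enum excess_style style, bool has_avx512fp16)
{
  assert (style == EXCESS_FAST || style == EXCESS_STANDARD);

  /* Native instructions already round to _Float16 after each operation.  */
  if (has_avx512fp16)
    return PROMOTE_TO_FLOAT16;

  /* Soft-fp: "fast" keeps intermediates in float; "standard" would
     round back after every operation (still a question in this thread).  */
  return style == EXCESS_FAST ? PROMOTE_TO_FLOAT : PROMOTE_TO_FLOAT16;
}
```

With this model, only the soft-fp plus "fast" combination evaluates in
float; every other combination rounds after each operation.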
>
>
> and so does arm gcc
> quote from arm.c
>
> /* We can calculate either in 16-bit range and precision or
>    32-bit range and precision.  Make that decision based on whether
>    we have native support for the ARMv8.2-A 16-bit floating-point
>    instructions or not.  */
> return (TARGET_VFP_FP16INST
>         ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
>         : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
>
>
> [1]https://clang.llvm.org/docs/LanguageExtensions.html
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
> ___
> LLVM Developers mailing list
> llvm-...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



-- 
BR,
Hongtao
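
To make the difference between "truncate once at the end" and "round after
every operation" concrete, here is a minimal sketch. It uses float and
double as stand-ins for _Float16 and float, because _Float16 availability
depends on the compiler and target; the function names are made up for
illustration.

```c
#include <assert.h>

/* Round after every operation: each partial sum is stored back to the
   narrow type, as in "e = a + b; d = e + c;".  The volatile stores keep
   the compiler from carrying excess precision across statements.  */
float
sum_round_each_op (float a, float b, float c)
{
  volatile float e = a + b;
  volatile float d = e + c;
  return d;
}

/* Promote once, compute in the wider type, and truncate only at the
   end, as in the tmp/tmp1 expansion quoted above.  */
float
sum_round_once (float a, float b, float c)
{
  double tmp = (double) a + (double) b;
  double tmp1 = tmp + (double) c;
  return (float) tmp1;
}

/* With a = 2^24 and b = c = 1 the two strategies disagree: 2^24 + 1 is
   a tie in float and rounds to even (2^24) twice, while the wide sum
   2^24 + 2 is exactly representable in float.  */
void
check_rounding_difference (void)
{
  float a = 16777216.0f, b = 1.0f, c = 1.0f;
  assert (sum_round_each_op (a, b, c) == 16777216.0f);
  assert (sum_round_once (a, b, c) == 16777218.0f);
}
```

The same effect, at a smaller scale, is what makes soft-fp results differ
from native half-precision arithmetic when intermediates are kept in float.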


Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-14 Thread Richard Biener
On Tue, 13 Jul 2021, Jiufu Guo wrote:

> Major changes from v1:
> * Add target hook to query preferred doloop mode.
> * Recompute doloop iv base from niter under preferred mode.
> 
> Currently, the doloop.xx variable uses the same type as niter, which may be
> shorter than word size.  In some cases it would be better to use a word-size
> type.  For example, on a 64-bit system, a subreg may be used to access a
> 32-bit value.  Using a 64-bit type may then be better for niter if its value
> can be represented in both 32 and 64 bits.
> 
> This patch adds a target hook to query the preferred mode for the doloop iv,
> and updates the doloop iv mode accordingly.
> 
> Bootstrap and regtest pass on powerpc64le, is this ok for trunk?
> 
> BR.
> Jiufu
> 
> gcc/ChangeLog:
> 
> 2021-07-13  Jiufu Guo  
> 
>   PR target/61837
>   * config/rs6000/rs6000.c (TARGET_PREFERRED_DOLOOP_MODE): New hook.
>   (rs6000_preferred_doloop_mode): New hook.
>   * doc/tm.texi: Regenerated.
>   * doc/tm.texi.in: Add hook preferred_doloop_mode.
>   * target.def (preferred_doloop_mode): New hook.
>   * targhooks.c (default_preferred_doloop_mode): New hook.
>   * targhooks.h (default_preferred_doloop_mode): New hook.
>   * tree-ssa-loop-ivopts.c (compute_doloop_base_on_mode): New function.
>   (add_iv_candidate_for_doloop): Call targetm.preferred_doloop_mode
>   and compute_doloop_base_on_mode.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-07-13  Jiufu Guo  
> 
>   PR target/61837
>   * gcc.target/powerpc/pr61837.c: New test.
> ---
>  gcc/config/rs6000/rs6000.c |  9 +++
>  gcc/doc/tm.texi|  4 ++
>  gcc/doc/tm.texi.in |  2 +
>  gcc/target.def |  7 +++
>  gcc/targhooks.c|  8 +++
>  gcc/targhooks.h|  2 +
>  gcc/testsuite/gcc.target/powerpc/pr61837.c | 16 ++
>  gcc/tree-ssa-loop-ivopts.c | 66 +-
>  8 files changed, 112 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr61837.c
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 9a5db63d0ef..444f3c49288 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -1700,6 +1700,9 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>  #undef TARGET_DOLOOP_COST_FOR_ADDRESS
>  #define TARGET_DOLOOP_COST_FOR_ADDRESS 10
>  
> +#undef TARGET_PREFERRED_DOLOOP_MODE
> +#define TARGET_PREFERRED_DOLOOP_MODE rs6000_preferred_doloop_mode
> +
>  #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
>  #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
>  
> @@ -27867,6 +27870,12 @@ rs6000_predict_doloop_p (struct loop *loop)
>return true;
>  }
>  
> +static machine_mode
> +rs6000_preferred_doloop_mode (machine_mode)
> +{
> +  return word_mode;
> +}
> +
>  /* Implement TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P.  */
>  
>  static bool
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 2a41ae5fba1..3f5881220f8 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -11984,6 +11984,10 @@ By default, the RTL loop optimizer does not use a 
> present doloop pattern for
>  loops containing function calls or branch on table instructions.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE 
> (machine_mode @var{mode})
> +This hook returns a more preferred mode or the @var{mode} itself.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} bool TARGET_LEGITIMATE_COMBINED_INSN (rtx_insn 
> *@var{insn})
>  Take an instruction in @var{insn} and return @code{false} if the instruction
>  is not appropriate as a combination of two or more instructions.  The
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index f881cdabe9e..38215149a92 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -7917,6 +7917,8 @@ to by @var{ce_info}.
>  
>  @hook TARGET_INVALID_WITHIN_DOLOOP
>  
> +@hook TARGET_PREFERRED_DOLOOP_MODE
> +
>  @hook TARGET_LEGITIMATE_COMBINED_INSN
>  
>  @hook TARGET_CAN_FOLLOW_JUMP
> diff --git a/gcc/target.def b/gcc/target.def
> index c009671c583..91a96150e50 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -4454,6 +4454,13 @@ loops containing function calls or branch on table 
> instructions.",
>   const char *, (const rtx_insn *insn),
>   default_invalid_within_doloop)
>  
> +DEFHOOK
> +(preferred_doloop_mode,
> + "This hook returns a more preferred mode or the @var{mode} itself.",
> + machine_mode,
> + (machine_mode mode),
> + default_preferred_doloop_mode)
> +
>  /* Returns true for a legitimate combined insn.  */
>  DEFHOOK
>  (legitimate_combined_insn,
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 44a1facedcf..eb5190910dc 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -660,6 +660,14 @@ default_predict_doloop_p (class loop *loop 
> ATTRIBUTE_UNUSED)
>return false;
>  }
>  
> +/* By default, just use the input 

Re: GCC 11.1.1 Status Report (2021-07-06)

2021-07-14 Thread Richard Biener
On Wed, 14 Jul 2021, H.J. Lu wrote:

> On Tue, Jul 6, 2021 at 12:00 AM Richard Biener  wrote:
> >
> >
> > Status
> > ==
> >
> > The GCC 11 branch is open for regression and documentation fixes.
> > It's time for a GCC 11.2 release and we are aiming for a release
> > candidate in about two weeks which would result in the GCC 11.2
> > release about three months after GCC 11.1.
> >
> > Two weeks give you ample time to care for important regressions
> > and backporting of fixes.  Please also look out for issues on
> > non-primary/secondary targets.
> >
> >
> > Quality Data
> > ============
> >
> > Priority          #   Change from last report
> > --------        ---   -----------------------
> > P1
> > P2              272   +  20
> > P3               94   +  56
> > P4              210   +   2
> > P5               24   -   1
> > --------        ---   -----------------------
> > Total P1-P3     366   +  76
> > Total           600   +  79
> >
> >
> > Previous Report
> > ===============
> >
> > https://gcc.gnu.org/pipermail/gcc/2021-April/235923.html
> 
> Hi,
> 
> I'd like to backport this regression fix:
> 
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=cc11b924bfe7752edbba052ca71653f46a60887a
> 
> to GCC 11 for
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101395

OK.

Richard.


[PATCH] add myself to DCO section

2021-07-14 Thread Trevor Saunders
fyi, in case I forget to sign off, all my commits this year forwards are under
the DCO (as I do not have an assignment).

Trev

Signed-off-by: Trevor Saunders 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 48cfa3fda1d..7cb63e5f62b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -705,3 +705,4 @@ information.
  Jeff Law  
  Jeff Law  
  Gaius Mulley  
+ Trevor Saunders   
-- 
2.20.1



Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-14 Thread guojiufu via Gcc-patches

On 2021-07-15 02:04, Segher Boessenkool wrote:

Hi!

On Wed, Jul 14, 2021 at 06:26:28PM +0800, guojiufu wrote:

PR target/61837


Wrong PR number?


There is already a patch that optimizes "add -1; zero_ext; add +1" to
"zero_ext".  This patch helps to avoid the remaining 'zero_ext', so I
reused this PR number.




+@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE
(machine_mode @var{mode})
+This hook takes a @var{mode} which is the original mode of doloop IV.
+And if the target prefers other mode for doloop IV, this hook returns
the
+preferred mode.
+For example, on 64bit target, DImode may be preferred than SImode.
+This hook could return the original mode itself if the target prefer 
to

+keep the original mode.
+The origianl mode and return mode should be MODE_INT.
+@end deftypefn
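
The contract described in this hook documentation can be modelled in a few
lines. This is a sketch only: mode_model, the two hook functions and
needs_new_iv_mode are hypothetical names, not GCC's machine_mode or targetm
interfaces.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's machine_mode; two integer modes only.  */
enum mode_model { SImode_model, DImode_model };

/* The hook takes the original doloop IV mode and returns the mode the
   target would rather use.  */
typedef enum mode_model (*preferred_doloop_mode_fn) (enum mode_model);

/* Default behaviour: keep the original mode.  */
enum mode_model
default_preferred_mode (enum mode_model mode)
{
  return mode;
}

/* A 64-bit target that always prefers its word mode (DImode here).  */
enum mode_model
word_preferred_mode (enum mode_model mode)
{
  (void) mode;
  return DImode_model;
}

/* Caller side: ask the hook and report whether the IV has to be
   recomputed in a different mode.  */
int
needs_new_iv_mode (preferred_doloop_mode_fn hook, enum mode_model orig)
{
  assert (hook != 0);
  return hook (orig) != orig;
}
```

In this model a 32-bit IV on the word-preferring target is the only case
where the caller has to rebuild the IV base in a new mode.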


(Typo, "original").  That has all the right contents, but needs someone
who is better at English than me to look at it / improve it.

+/* { dg-final {scan-rtl-dump-not "zero_extend.*doloop" 
"loop2_doloop"}

} */
+/* { dg-final {scan-rtl-dump-not "reg:SI.*doloop" "loop2_doloop" {
target lp64 } } } */


(Don't use format=flowed in your mails, or certainly not in those
containing patches -- it was rewrapped).



Oh, thanks for pointing this out!

If you use .* in scan REs, you should be aware that "." matches newlines
by default, so you can match "reg:SI" on one line and "doloop" on a
later one, in that second one.

You can write

/* { dg-final {scan-rtl-dump-not {(?p)reg:SI.*doloop} "loop2_doloop" {
target lp64 } } } */

(note: {} are much more convenient around most REs, you need a lot of
escaping without it) to get "partial newline-sensitive matching", which
is usually what you want (see "man re_syntax" for the details).
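
The newline pitfall can be reproduced outside the testsuite. The sketch
below uses POSIX regcomp/regexec rather than Tcl, so it is only an analogy:
REG_NEWLINE plays roughly the role of (?p) for ".", but unlike (?p) it also
changes how ^ and $ behave. The dump text is made up for illustration.

```c
#include <assert.h>
#include <regex.h>

/* Return 1 if PATTERN matches somewhere in TEXT, 0 otherwise.
   EXTRA_FLAGS is passed to regcomp in addition to REG_EXTENDED.  */
int
re_matches (const char *pattern, const char *text, int extra_flags)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB | extra_flags);
  assert (rc == 0);   /* the patterns used here are known to be valid */
  rc = regexec (&re, text, 0, 0, 0);
  regfree (&re);
  return rc == 0;
}

/* Two-line "dump": reg:SI is on the first line, doloop on the second.  */
static const char two_line_dump[] =
  "(reg:SI 3)\n(reg:DI 117 doloop)";

/* Returns 1: without REG_NEWLINE, "." also matches the newline.  */
int
matches_across_lines (void)
{
  return re_matches ("reg:SI.*doloop", two_line_dump, 0);
}

/* Returns 0: with REG_NEWLINE, ".*" cannot cross the line break.  */
int
matches_within_one_line (void)
{
  return re_matches ("reg:SI.*doloop", two_line_dump, REG_NEWLINE);
}
```

A scan-rtl-dump-not test written without newline-sensitive matching would
fail on such a dump even though no single line mentions both strings.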


Thanks so much!  This helps me a lot with writing test cases, especially
with how to write the scan-xxx REs in a test case!




The generic changes look fine to me (but what do I know about Gimple!)
The rs6000 changes are fine if the rest is approved (and see the
testcase comments).  Thanks!


Thanks again!

BR,
Jiufu




Segher


[pushed] c++: fix tree_contains_struct for C++ types [PR101095]

2021-07-14 Thread Jason Merrill via Gcc-patches
Many of the types from cp-tree.def were only marked as having tree_common,
when actually most of them have type_non_common.  This broke
g++.dg/modules/xtreme-header-2, as the modules code relies on
tree_contains_struct to know what bits it needs to stream.
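
As a rough illustration of why the marking matters, here is a much-reduced
model of such a table. It is not GCC's tree_contains_struct: the enums, the
mark_* helpers and layers_to_stream are hypothetical, and only three
structure layers are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of structure layers and tree codes.  */
enum ts_model { TS_COMMON_M, TS_TYPE_COMMON_M, TS_TYPE_NON_COMMON_M,
		TS_COUNT_M };
enum code_model { OVERLOAD_M, TYPE_ARGUMENT_PACK_M, TYPENAME_TYPE_M,
		  CODE_COUNT_M };

static bool contains_struct[CODE_COUNT_M][TS_COUNT_M];

/* Each marker also marks the layers it builds on.  */
void
mark_common (enum code_model c)
{
  contains_struct[c][TS_COMMON_M] = true;
}

void
mark_type_common (enum code_model c)
{
  mark_common (c);
  contains_struct[c][TS_TYPE_COMMON_M] = true;
}

void
mark_type_non_common (enum code_model c)
{
  mark_type_common (c);
  contains_struct[c][TS_TYPE_NON_COMMON_M] = true;
}

/* Mirror of the marking after the patch, for the three modelled codes.  */
void
init_ts_model (void)
{
  mark_common (OVERLOAD_M);
  mark_type_common (TYPE_ARGUMENT_PACK_M);
  mark_type_non_common (TYPENAME_TYPE_M);
}

/* A streamer in this model handles a layer only if the table says the
   code has it; return how many layers it would stream for C.  */
int
layers_to_stream (enum code_model c)
{
  int n = 0;
  assert (c < CODE_COUNT_M);
  for (int ts = 0; ts < TS_COUNT_M; ts++)
    if (contains_struct[c][ts])
      n++;
  return n;
}
```

In this model, a type code marked only with mark_common would report a
single layer, so a streamer relying on the table would skip its type fields.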

We don't seem to use type_non_common for TYPE_ARGUMENT_PACK, so I bumped it
down to TS_TYPE_COMMON.  I tried doing the same in cp_tree_size, but that
breaks without more extensive changes to tree_node_structure.

Why do we need the init_ts function anyway?  It seems redundant with
tree_node_structure.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/101095

gcc/cp/ChangeLog:

* cp-objcp-common.c (cp_common_init_ts): Mark types as types.
(cp_tree_size): Remove redundant entries.
---
 gcc/cp/cp-objcp-common.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index 46b2248574c..ee255732d5a 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -72,10 +72,13 @@ cp_tree_size (enum tree_code code)
 case DEFERRED_NOEXCEPT:return sizeof (tree_deferred_noexcept);
 case OVERLOAD: return sizeof (tree_overload);
 case STATIC_ASSERT: return sizeof (tree_static_assert);
-case TYPE_ARGUMENT_PACK:
-case TYPE_PACK_EXPANSION:  return sizeof (tree_type_non_common);
-case NONTYPE_ARGUMENT_PACK:
-case EXPR_PACK_EXPANSION:  return sizeof (tree_exp);
+#if 0
+  /* This would match cp_common_init_ts, but breaks GC because
+tree_node_structure_for_code returns TS_TYPE_NON_COMMON for all
+types.  */
+case UNBOUND_CLASS_TEMPLATE:
+case TYPE_ARGUMENT_PACK:   return sizeof (tree_type_common);
+#endif
 case ARGUMENT_PACK_SELECT: return sizeof (tree_argument_pack_select);
 case TRAIT_EXPR:   return sizeof (tree_trait_expr);
 case LAMBDA_EXPR:   return sizeof (tree_lambda_expr);
@@ -456,13 +459,8 @@ cp_common_init_ts (void)
 
   /* Random new trees.  */
   MARK_TS_COMMON (BASELINK);
-  MARK_TS_COMMON (DECLTYPE_TYPE);
   MARK_TS_COMMON (OVERLOAD);
   MARK_TS_COMMON (TEMPLATE_PARM_INDEX);
-  MARK_TS_COMMON (TYPENAME_TYPE);
-  MARK_TS_COMMON (TYPEOF_TYPE);
-  MARK_TS_COMMON (UNBOUND_CLASS_TEMPLATE);
-  MARK_TS_COMMON (UNDERLYING_TYPE);
 
   /* New decls.  */
   MARK_TS_DECL_COMMON (TEMPLATE_DECL);
@@ -472,10 +470,16 @@ cp_common_init_ts (void)
   MARK_TS_DECL_NON_COMMON (USING_DECL);
 
   /* New Types.  */
+  MARK_TS_TYPE_COMMON (UNBOUND_CLASS_TEMPLATE);
+  MARK_TS_TYPE_COMMON (TYPE_ARGUMENT_PACK);
+
+  MARK_TS_TYPE_NON_COMMON (DECLTYPE_TYPE);
+  MARK_TS_TYPE_NON_COMMON (TYPENAME_TYPE);
+  MARK_TS_TYPE_NON_COMMON (TYPEOF_TYPE);
+  MARK_TS_TYPE_NON_COMMON (UNDERLYING_TYPE);
   MARK_TS_TYPE_NON_COMMON (BOUND_TEMPLATE_TEMPLATE_PARM);
   MARK_TS_TYPE_NON_COMMON (TEMPLATE_TEMPLATE_PARM);
   MARK_TS_TYPE_NON_COMMON (TEMPLATE_TYPE_PARM);
-  MARK_TS_TYPE_NON_COMMON (TYPE_ARGUMENT_PACK);
   MARK_TS_TYPE_NON_COMMON (TYPE_PACK_EXPANSION);
 
   /* Statements.  */

base-commit: c4fee1c646d52a9001a53fa0d4072db86b9be791
-- 
2.27.0



Re: [PATCH 1/4] force decls to be allocated through build_decl to initialize them

2021-07-14 Thread Trevor Saunders
On Wed, Jul 14, 2021 at 01:27:54PM +0200, Richard Biener wrote:
> On Wed, Jul 14, 2021 at 10:20 AM Trevor Saunders  
> wrote:
> >
> > Prior to this commit all calls to build_decl used input_location, even if
> > only temporarily, until build_decl reset the location to something else that
> > it was told was the proper location.  To avoid using the global we need the
> > caller to pass in the location it wants; however, that's not possible with
> > make_node since it makes other types of nodes.  So we force all callers who
> > wish to make a decl to go through build_decl, which already takes a location
> > argument.  To avoid changing behavior this just explicitly passes
> > input_location to build_decl for callers of make_node that create a decl;
> > however, it would seem that in many of these cases the location of the decl
> > being copied might be a better location.
> >
> > bootstrapped and regtested on x86_64-linux-gnu, ok?
> 
> I think all eventually DECL_ARTIFICIAL decls should better use
> UNKNOWN_LOCATION instead of input_location.

You'd know if that might break something better than me, but that seems
sensible in principle.  That said, I would like to incrementally do one
thing at a time, rather than change make_node to use unknown_location,
and set the location to something else all at once, but I suppose I
could first change some callers to be build_decl (unknown_location, ...)
and then come back to changing make_node when there's fewer callers to
reason about if that's preferable.

> I'm not sure if I like the (transitional) extra arg to make_node, I suppose
> we could hide make_node by declaring it in tree-raw.h or so or by
> guarding the decl with NEED_MAKE_NODE.  There's nothing inherently
> wrong with calling make_node.  So what I mean with transitional is that
> with this change we should simply set the location to UNKNOWN_LOCATION
> (aka zero, which it already is), not input_location, in make_node.

I sort of think it makes sense to move all the tree class specific bits
out of make_node to functions for that specific type of tree, but it is
mostly unrelated.  One advantage of that is that it saves pointless
initialization in the module / lto streamer that gets overwritten with
the streamed values.  However having used the argument to find all the
places that create decls, and having updated them, while the argument
and asserts do  prevent leaving the location uninitialized by mistake,
I'd be fine with dropping that part and just updating all the make_node
callers to use build_decl.
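
A minimal sketch of the pattern under discussion: the allocation function
takes the location as an explicit argument instead of reading a global. All
names here (decl_model, location_model, build_decl_model) are hypothetical
and only illustrate the idea; this is not GCC's build_decl.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for location_t and UNKNOWN_LOCATION.  */
typedef unsigned int location_model;
#define UNKNOWN_LOCATION_MODEL 0u

struct decl_model
{
  location_model loc;
  const char *name;
};

/* The only way to allocate a decl in this model: the caller must say
   which location it wants, so no global is consulted.  Returns NULL
   if allocation fails.  */
struct decl_model *
build_decl_model (location_model loc, const char *name)
{
  assert (name != NULL);
  struct decl_model *d = malloc (sizeof *d);
  if (d == NULL)
    return NULL;
  d->loc = loc;
  d->name = name;
  return d;
}
```

An artificial decl would then be created with
build_decl_model (UNKNOWN_LOCATION_MODEL, "tmp"), which makes the choice of
location visible at every call site.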

thanks

Trev

> 
> Richard.
> 
> > Trev
> >
> > gcc/ChangeLog:
> >
> > * cfgexpand.c (avoid_deep_ter_for_debug): Call build_decl not
> > make_node.
> > (expand_gimple_basic_block): Likewise.
> > * ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
> > * Likewise.
> > (ipa_param_body_adjustments::reset_debug_stmts): Likewise.
> > * omp-simd-clone.c (ipa_simd_modify_stmt_ops): Likewise.
> > * stor-layout.c (start_bitfield_representative): Likewise.
> > * tree-inline.c (remap_ssa_name): Likewise.
> > (tree_function_versioning): Likewise.
> > * tree-into-ssa.c (rewrite_debug_stmt_uses): Likewise.
> > * tree-nested.c (lookup_field_for_decl): Likewise.
> > (get_chain_field): Likewise.
> > (create_field_for_decl): Likewise.
> > (get_nl_goto_field): Likewise.
> > (finalize_nesting_tree_1): Likewise.
> > * tree-ssa-ccp.c (optimize_atomic_bit_test_and): Likewise.
> > * tree-ssa-loop-ivopts.c (remove_unused_ivs): Likewise.
> > * tree-ssa-phiopt.c (spaceship_replacement): Likewise.
> > * tree-ssa-reassoc.c (make_new_ssa_for_def): Likewise.
> > * tree-ssa.c (insert_debug_temp_for_var_def): Likewise.
> > * tree-streamer-in.c (streamer_alloc_tree): Adjust.
> > * tree.c (make_node): Add argument to specify the caller.
> > (build_decl): Move initialization from make_node.
> > * tree.h (enum make_node_caller): new enum.
> > (make_node): Adjust prototype.
> > * varasm.c (make_debug_expr_from_rtl): call build_decl.
> >
> > gcc/cp/ChangeLog:
> >
> > * constraint.cc (build_type_constraint): Call build_decl not 
> > make_node.
> > * cp-gimplify.c (cp_genericize_r): Likewise.
> > * parser.c (cp_parser_introduction_list): Likewise.
> > * module.cc (trees_in::start): Adjust.
> >
> > gcc/fortran/ChangeLog:
> >
> > * trans-decl.c (generate_namelist_decl): Call build_decl not 
> > make_node.
> > * trans-types.c (gfc_get_array_descr_info): Likewise.
> >
> > gcc/objc/ChangeLog:
> >
> > * objc-act.c (objc_add_property_declaration): Call build_decl not
> > make_node.
> > (maybe_make_artificial_property_decl): Likewise.
> > (objc_build_keyword_decl): Likewise.
> > (build_method_decl): Likewise.
> > ---
> >

RE: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

2021-07-14 Thread Wang, Pengfei via Gcc-patches
  *   Clang for AArch64 promotes each individual operation and rounds 
immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between 
the two fadd operations. It's implemented in the LLVM backend where we can't 
see what was originally a single expression.

Yes, but this is not consistent with the Clang documentation. I think we
should ask the Clang FE to do the promotion and truncation.

Thanks
Pengfei

From: llvm-dev  On Behalf Of Craig Topper via 
llvm-dev
Sent: Wednesday, July 14, 2021 11:32 PM
To: Hongtao Liu 
Cc: Jakub Jelinek ; llvm-dev ; Liu, 
Hongtao ; gcc-patches@gcc.gnu.org; Joseph Myers 

Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev wrote:
> >
> Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> round after each operation could keep semantics right.
> And I'll document the behavior difference between soft-fp and
> AVX512FP16 instruction for exceptions.
I got some feedback from my colleague who's working on supporting
_Float16 for llvm.
The LLVM side wants to set  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
soft-fp so that codes can be more efficient.
i.e.
_Float16 a, b, c, d;
d = a + b + c;

would be transformed to
float tmp, tmp1, a1, b1, c1;
a1 = (float) a;
b1 = (float) b;
c1 = (float) c;
tmp = a1 + b1;
tmp1 = tmp + c1;
d = (_Float16) tmp1;

so there's only 1 truncation in the end.

If users want to round back after every operation, the code should be
explicitly written as
_Float16 a, b, c, d, e;
e = a + b;
d = e + c;

That's what Clang does, quote from [1]
 _Float16 arithmetic will be performed using native half-precision
support when available on the target (e.g. on ARMv8.2a); otherwise it
will be performed at a higher precision (currently always float) and
then truncated down to _Float16. Note that C and C++ allow
intermediate floating-point operands of an expression to be computed
with greater precision than is expressible in their type, so Clang may
avoid intermediate truncations in certain cases; this may lead to
results that are inconsistent with native arithmetic.

Clang for AArch64 promotes each individual operation and rounds immediately 
afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd 
operations. It's implemented in the LLVM backend where we can't see what was 
originally a single expression.


and so does arm gcc
quote from arm.c

/* We can calculate either in 16-bit range and precision or
   32-bit range and precision.  Make that decision based on whether
   we have native support for the ARMv8.2-A 16-bit floating-point
   instructions or not.  */
return (TARGET_VFP_FP16INST
        ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
        : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);


[1]https://clang.llvm.org/docs/LanguageExtensions.html
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao


PING^2 [PATCH v2] combine: Tweak the condition of last_set invalidation

2021-07-14 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572555.html

BR,
Kewen

on 2021/6/28 3:00 PM, Kewen.Lin via Gcc-patches wrote:
> Hi!
> 
> I'd like to gentle ping this:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572555.html
> 
> 
> BR,
> Kewen
> 
> on 2021/6/11 9:16 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi Segher,
>>
>> Thanks for the review!
>>
>> on 2021/6/10 4:17 AM, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote:
>>>> Currently we have the check:
>>>>
>>>>   if (!insn
>>>>       || (value && rsp->last_set_table_tick >= label_tick_ebb_start))
>>>>     rsp->last_set_invalid = 1;
>>>>
>>>> which means if we want to record some value for some reg and
>>>> this reg got referred before in a valid scope,
>>>
>>> If we already know it is *set* in this same extended basic block.
>>> Possibly by the same instruction btw.
>>>
>>>> we invalidate the
>>>> set of reg (last_set_invalid to 1).  It avoids to find the wrong
>>>> set for one reg reference, such as the case like:
>>>>
>>>>    ... op regX  // this regX could find wrong last_set below
>>>>    regX = ...   // if we think this set is valid
>>>>    ... op regX
>>>
>>> Yup, exactly.
>>>
>>>> But because of retry's existence, the last_set_table_tick could
>>>> be set by some later reference insns, but we see it's set due
>>>> to retry on the set (for that reg) insn again, such as:
>>>>
>>>>    insn 1
>>>>    insn 2
>>>>
>>>>    regX = ...     --> (a)
>>>>    ... op regX    --> (b)
>>>>
>>>>    insn 3
>>>>
>>>>    // assume all in the same BB.
>>>>
>>>> Assuming we combine 1, 2 -> 3 successfully and replace them as two
>>>> (3 insns -> 2 insns),
>>>
>>> This will delete insn 1 and write the combined result to insns 2 and 3.
>>>
>>>> retrying from insn1 or insn2 again:
>>>
>>> Always 2, but your point remains valid.
>>>
>>>> it will scan insn (a) again, the below condition holds for regX:
>>>>
>>>>   (value && rsp->last_set_table_tick >= label_tick_ebb_start)
>>>>
>>>> it will mark this set as invalid set.  But actually the
>>>> last_set_table_tick here is set by insn (b) before retrying, so it
>>>> should be safe to be taken as valid set.
>>>
>>> Yup.
>>>
>>>> This proposal is to check whether the last_set_table safely happens
>>>> after the current set, make the set still valid if so.
>>>
>>>> Full SPEC2017 building shows this patch gets more successful combines
>>>> from 1902208 to 1902243 (trivial though).
>>>
>>> Do you have some example, or maybe even a testcase?  :-)
>>>
>>
>> Sorry for the late reply, it took some time to get one reduced case.
>>
>> typedef struct SA *pa_t;
>>
>> struct SC {
>>   int h;
>>   pa_t elem[];
>> };
>>
>> struct SD {
>>   struct SC *e;
>> };
>>
>> struct SA {
>>   struct {
>> struct SD f[1];
>>   } g;
>> };
>>
>> void foo(pa_t *k, char **m) {
>>   int l, i;
>>   pa_t a;
>>   l = (int)a->g.f[5].e;
>>   i = 0;
>>   for (; i < l; i++) {
>> k[i] = a->g.f[5].e->elem[i];
>> m[i] = "";
>>   }
>> }
>>
>> Baseline is r12-0 and the option is "-O3 -mcpu=power9 -fno-strict-aliasing",
>> with this patch, the generated assembly can save two rlwinm s.
>>
>>>> +  /* Record the luid of the insn whose expression involving register n.  */
>>>> +
>>>> +  int last_set_table_luid;
>>>
>>> "Record the luid of the insn for which last_set_table_tick was set",
>>> right?
>>>
>>
>> But it can be updated later to one smaller luid, how about the wording like:
>>
>>
>> +  /* Record the luid of the insn which uses register n, the insn should
>> + be the first one using register n in that block of the insn which
>> + last_set_table_tick was set for.  */
>>
>>
>>>> -static void update_table_tick (rtx);
>>>> +static void update_table_tick (rtx, int);
>>>
>>> Please remove this declaration instead, the function is not used until
>>> after its actual definition :-)
>>>
>>
>> Done.
>>
>>>> @@ -13243,7 +13247,21 @@ update_table_tick (rtx x)
>>>>        for (r = regno; r < endregno; r++)
>>>>          {
>>>>            reg_stat_type *rsp = &reg_stat[r];
>>>> -          rsp->last_set_table_tick = label_tick;
>>>> +          if (rsp->last_set_table_tick >= label_tick_ebb_start)
>>>> +            {
>>>> +              /* Later references should not have lower ticks.  */
>>>> +              gcc_assert (label_tick >= rsp->last_set_table_tick);
>>>
>>> This should be obvious, but checking it won't hurt, okay.
>>>
>>>> +              /* Should pick up the lowest luid if the references
>>>> +                 are in the same block.  */
>>>> +              if (label_tick == rsp->last_set_table_tick
>>>> +                  && rsp->last_set_table_luid > insn_luid)
>>>> +                rsp->last_set_table_luid = insn_luid;
>>>
>>> Why?  Is it conservative for the check you will do later?  Please spell
>>> this out, it is crucial!
>>>
>>
>> Since later the combinations involving this insn probably make the
>> register be used in one insn sitting ahead (which has smaller luid than
>> the one which was reco

[r12-2300 Regression] FAIL: gcc.dg/vect/vect-reduc-dot-9.c -flto -ffat-lto-objects execution test on Linux/x86_64

2021-07-14 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

1e0ab1c4ba6159ad7ce71c6cddd5e04d2a636742 is the first bad commit
commit 1e0ab1c4ba6159ad7ce71c6cddd5e04d2a636742
Author: Tamar Christina 
Date:   Wed Jul 14 15:21:40 2021 +0100

middle-end: Add tests middle end generic tests for sign differing 
dotproduct.

caused

FAIL: gcc.dg/vect/vect-reduc-dot-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-11.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-11.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-12.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-12.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-13.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-13.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-14.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-14.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-15.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-15.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-16.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-17.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-17.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-18.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-18.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects  scan-tree-dump-not vect "vect_recog_dot_prod_pattern: detected"
FAIL: gcc.dg/vect/vect-reduc-dot-22.c scan-tree-dump-not vect "vect_recog_dot_prod_pattern: detected"
FAIL: gcc.dg/vect/vect-reduc-dot-9.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-9.c -flto -ffat-lto-objects execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2300/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-10.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-10.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-10.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-10.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-11.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-11.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-11.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-11.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-12.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-12.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-12.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-12.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-13.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-13.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-13.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-13.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-14.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-14.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-14.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-14.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-15.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-reduc-dot-15.c 
--target_boa

PING^3 [PATCH v2] rs6000: Add load density heuristic

2021-07-14 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571258.html

BR,
Kewen

on 2021/6/28 3:01 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> Gentle ping this:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571258.html
> 
> BR,
> Kewen
> 
> on 2021/6/9 10:26 AM, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> Gentle ping this:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571258.html
>>
>> BR,
>> Kewen
>>
>> on 2021/5/26 10:59 AM, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> This is the updated version of patch to deal with the bwaves_r
>>> degradation due to vector construction fed by strided loads.
>>>
>>> Following Richi's comments [1], this takes a similar approach and
>>> overprices the vector construction fed by VMAT_ELEMENTWISE or
>>> VMAT_STRIDED_SLP.  Instead of adding the extra cost when the vector
>>> construction is costed, it first records the number of loads and
>>> vectorized statements in the given loop.  Later, in
>>> rs6000_density_test (called by finish_cost), it computes the
>>> load density ratio against all vectorized stmts, checks it
>>> against the corresponding thresholds DENSITY_LOAD_NUM_THRESHOLD
>>> and DENSITY_LOAD_PCT_THRESHOLD, and does the actual extra pricing
>>> only if both thresholds are exceeded.
>>>
>>> Note that this new load density heuristic check is based on
>>> some fields in target cost which are updated as needed when
>>> scanning each add_stmt_cost entry, it's independent of the
>>> current function rs6000_density_test which requires to scan
>>> non_vect stmts.  Since it's checking the load stmts count
>>> vs. all vectorized stmts, it's kind of density, so I put
>>> it in function rs6000_density_test.  With the same reason to
>>> keep it independent, I didn't put it as an else arm of the
>>> current existing density threshold check hunk or before this
>>> hunk.
>>>
>>> In the investigation of -1.04% degradation from 526.blender_r
>>> on Power8, I noticed that the extra penalized cost 320 on one
>>> single vector construction with type V16QI is much exaggerated,
>>> which makes the final body cost unreliable, so this patch adds
>>> one maximum bound for the extra penalized cost for each vector
>>> construction statement.
>>>
>>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>>>
>>> Full SPEC2017 performance evaluation on Power8/Power9 with
>>> option combinations:
>>>   * -O2 -ftree-vectorize {,-fvect-cost-model=very-cheap} {,-ffast-math}
>>>   * {-O3, -Ofast} {,-funroll-loops}
>>>
>>> bwaves_r degradations on P8/P9 have been fixed, nothing else
>>> remarkable was observed.
>>>
>>> Is it ok for trunk?
>>>
>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570076.html
>>>
>>> BR,
>>> Kewen
>>> -
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.c (struct rs6000_cost_data): New members
>>> nstmts, nloads and extra_ctor_cost.
>>> (rs6000_density_test): Add load density related heuristics and the
>>> checks, do extra costing on vector construction statements if need.
>>> (rs6000_init_cost): Init new members.
>>> (rs6000_update_target_cost_per_stmt): New function.
>>> (rs6000_add_stmt_cost): Factor vect_nonmem hunk out to function
>>> rs6000_update_target_cost_per_stmt and call it.
>>>
>>
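
The load density check described in the quoted message could be sketched roughly as below.  This is only an illustration under stated assumptions: the field names nstmts, nloads and extra_ctor_cost come from the ChangeLog, but the struct layout, the helper names, the body_cost field and all three numeric limits are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limits, chosen only for this sketch; the patch's real
   values are not quoted in the message.  */
enum
{
  DENSITY_LOAD_NUM_THRESHOLD = 20, /* minimum number of loads */
  DENSITY_LOAD_PCT_THRESHOLD = 45, /* minimum loads per 100 vectorized stmts */
  MAX_CTOR_PENALTY = 32            /* cap per vector construction stmt */
};

/* Simplified stand-in for the target cost data.  */
struct cost_data
{
  unsigned nstmts;          /* vectorized statements seen so far */
  unsigned nloads;          /* how many of them are loads */
  unsigned extra_ctor_cost; /* penalty recorded for vector constructions */
  unsigned body_cost;       /* loop body cost before any penalty */
};

/* Record a penalty for one vector construction, bounded by the cap.  */
static void
record_ctor_penalty (struct cost_data *d, unsigned penalty)
{
  if (penalty > MAX_CTOR_PENALTY)
    penalty = MAX_CTOR_PENALTY;
  d->extra_ctor_cost += penalty;
}

/* True only if both the load count and the load percentage are exceeded.  */
static bool
load_density_exceeded (const struct cost_data *d)
{
  assert (d->nloads <= d->nstmts);
  if (d->nstmts == 0)
    return false;
  unsigned pct = d->nloads * 100 / d->nstmts;
  return d->nloads > DENSITY_LOAD_NUM_THRESHOLD
	 && pct > DENSITY_LOAD_PCT_THRESHOLD;
}

/* Apply the recorded penalty at the end, and only for dense loops.  */
static unsigned
finish_body_cost (const struct cost_data *d)
{
  return d->body_cost + (load_density_exceeded (d) ? d->extra_ctor_cost : 0);
}
```

Deferring the penalty to the final step means a loop with only a few strided loads keeps its original body cost, which matches the intent stated in the message of pricing only when both thresholds are exceeded.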


[PATCH] Remove legacy external declarations in toplev.h [PR101447]

2021-07-14 Thread ashimida via Gcc-patches



Some external declarations in ./gcc/toplev.h are no longer used in the
newest version of GCC and should be cleaned up to avoid misunderstandings.

gcc/ChangeLog:

* toplev.h (set_random_seed):

---
diff --git a/gcc/toplev.h b/gcc/toplev.h
index 175944c..f543554 100644
--- a/gcc/toplev.h
+++ b/gcc/toplev.h
@@ -94,11 +94,6 @@ extern bool set_src_pwd (const char *);

 extern HOST_WIDE_INT get_random_seed (bool);
 extern void set_random_seed (const char *);

-extern unsigned int min_align_loops_log;
-extern unsigned int min_align_jumps_log;
-extern unsigned int min_align_labels_log;
-extern unsigned int min_align_functions_log;
-
 extern void parse_alignment_opts (void);

 extern void initialize_rtl (void);

---
The history FYI:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e6de53356769e13178975c18b4ce019a800ea946;hp=118f2d8bc3e6804996ca2953b86454ec950054bf



[PATCH] correct range of stpcpy result (PR 101397)

2021-07-14 Thread Martin Sebor via Gcc-patches

Access warnings look through calls to the subset of built-ins
that return one of their pointer arguments to find the object
the pointer points to and its offset.  The computation is
wrong for functions like stpcpy, stpncpy and mempcpy that
return a pointer plus some offset, and leads to a false positive
-Warray-bounds in Glibc with the recent refactoring of the warning
to take advantage of this logic.

The attached patch corrects this mistake by accounting for this
property of these functions while at the same time constraining
the offset to the size of the source argument for better
accuracy.

Tested on x86_64-linux and by also building Glibc there.

Martin
PR middle-end/101397 - spurious warning writing to the result of stpcpy minus 1
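
As a minimal illustration of why the returned offset matters (a sketch, not code from the patch or from Glibc): stpcpy returns a pointer to the terminating NUL it wrote, that is the destination plus the length of the source, so writing to the result minus 1 is in bounds whenever the source is non-empty.  The helper names below are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for stpcpy */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset of the pointer returned by stpcpy from the start of DST.
   It equals strlen (SRC), not 0 as for strcpy.  */
static ptrdiff_t
stpcpy_result_offset (char *dst, const char *src)
{
  char *end = stpcpy (dst, src);
  assert (*end == '\0');
  return end - dst;
}

/* Copy SRC and overwrite the last copied character with C.  The store
   through END[-1] stays inside DST as long as SRC is not empty.  */
static void
copy_and_replace_last (char *dst, const char *src, char c)
{
  assert (src[0] != '\0');
  char *end = stpcpy (dst, src);
  end[-1] = c;
}
```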


gcc/ChangeLog:

	PR middle-end/101397
	* builtins.c (gimple_call_return_array): Add argument.  Correct
	offsets for memchr, mempcpy, stpcpy, and stpncpy.
	(compute_objsize_r): Adjust offset computation for argument returning
	built-ins.

gcc/testsuite/ChangeLog:

	PR middle-end/101397
	* gcc.dg/Warray-bounds-80.c: New test.
	* gcc.dg/Warray-bounds-81.c: New test.
	* gcc.dg/Warray-bounds-82.c: New test.
	* gcc.dg/Warray-bounds-83.c: New test.
	* gcc.dg/Warray-bounds-84.c: New test.
	* gcc.dg/Wstringop-overflow-46.c: Adjust expected output.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 39ab139b7e1..170d776c410 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5200,12 +5200,19 @@ get_offset_range (tree x, gimple *stmt, offset_int r[2], range_query *rvals)
 /* Return the argument that the call STMT to a built-in function returns
or null if it doesn't.  On success, set OFFRNG[] to the range of offsets
from the argument reflected in the value returned by the built-in if it
-   can be determined, otherwise to 0 and HWI_M1U respectively.  */
+   can be determined, otherwise to 0 and HWI_M1U respectively.  Set
+   *PAST_END for functions like mempcpy that might return a past the end
+   pointer (most functions return a dereferenceable pointer to an existing
+   element of an array).  */
 
 static tree
-gimple_call_return_array (gimple *stmt, offset_int offrng[2],
+gimple_call_return_array (gimple *stmt, offset_int offrng[2], bool *past_end,
 			  range_query *rvals)
 {
+  /* Clear and set below for the rare function(s) that might return
+ a past-the-end pointer.  */
+  *past_end = false;
+
   {
 /* Check for attribute fn spec to see if the function returns one
of its arguments.  */
@@ -5213,6 +5220,7 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2],
 unsigned int argno;
 if (fnspec.returns_arg (&argno))
   {
+	/* Functions return the first argument (not a range).  */
 	offrng[0] = offrng[1] = 0;
 	return gimple_call_arg (stmt, argno);
   }
@@ -5242,6 +5250,7 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2],
   if (gimple_call_num_args (stmt) != 2)
 	return NULL_TREE;
 
+  /* Allocation functions return a pointer to the beginning.  */
   offrng[0] = offrng[1] = 0;
   return gimple_call_arg (stmt, 1);
 }
@@ -5253,10 +5262,6 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2],
 case BUILT_IN_MEMMOVE:
 case BUILT_IN_MEMMOVE_CHK:
 case BUILT_IN_MEMSET:
-case BUILT_IN_STPCPY:
-case BUILT_IN_STPCPY_CHK:
-case BUILT_IN_STPNCPY:
-case BUILT_IN_STPNCPY_CHK:
 case BUILT_IN_STRCAT:
 case BUILT_IN_STRCAT_CHK:
 case BUILT_IN_STRCPY:
@@ -5265,18 +5270,34 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2],
 case BUILT_IN_STRNCAT_CHK:
 case BUILT_IN_STRNCPY:
 case BUILT_IN_STRNCPY_CHK:
+  /* Functions return the first argument (not a range).  */
   offrng[0] = offrng[1] = 0;
   return gimple_call_arg (stmt, 0);
 
 case BUILT_IN_MEMPCPY:
 case BUILT_IN_MEMPCPY_CHK:
   {
+	/* The returned pointer is in a range constrained by the smaller
+	   of the upper bound of the size argument and the source object
+	   size.  */
+	offrng[0] = 0;
+	offrng[1] = HOST_WIDE_INT_M1U;
 	tree off = gimple_call_arg (stmt, 2);
-	if (!get_offset_range (off, stmt, offrng, rvals))
+	bool off_valid = get_offset_range (off, stmt, offrng, rvals);
+	if (!off_valid || offrng[0] != offrng[1])
 	  {
-	offrng[0] = 0;
-	offrng[1] = HOST_WIDE_INT_M1U;
+	/* If the offset is either indeterminate or in some range,
+	   try to constrain its upper bound to at most the size
+	   of the source object.  */
+	access_ref aref;
+	tree src = gimple_call_arg (stmt, 1);
+	if (compute_objsize (src, 1, &aref, rvals)
+		&& aref.sizrng[1] < offrng[1])
+	  offrng[1] = aref.sizrng[1];
 	  }
+
+	/* Mempcpy may return a past-the-end pointer.  */
+	*past_end = true;
 	return gimple_call_arg (stmt, 0);
   }
 
@@ -5284,23 +5305,63 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2],
   {
 	tree off = gimple_call_arg (stmt, 2);
 	if (get_offset_range (off, stmt, offrng, rvals))
-	  offrng[0] = 0;
+	  off

Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-14 Thread Kewen.Lin via Gcc-patches
on 2021/7/15 3:32 AM, Segher Boessenkool wrote:
> On Wed, Jul 14, 2021 at 12:32:24PM +0100, Richard Sandiford wrote:
>> TBH, 79 vs. 80 isn't normally something I'd worry about when reviewing
>> new code.  But I know in the past people have asked for 79 to be used
>> for the “end+1” reason, so I don't think we should “fix” existing code
>> that honours the 79 limit so that it no longer does, especially when the
>> lines surrounding the code aren't changing.
> 
> The normal rule is you cannot go over 80.  It is perfectly fine to have
> shorter lines, certainly if that is nice for some other reason, so
> automatically (by some tool) changing this is Just Wrong.
> 

OK, could this be applied to changelog entry too?  I guess yes?

BR,
Kewen


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-14 Thread Kewen.Lin via Gcc-patches
on 2021/7/14 7:32 PM, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>
>> on 2021/7/14 4:38 PM, Richard Sandiford wrote:
>>> "Kewen.Lin"  writes:
>>>> gcc/ChangeLog:
>>>>
>>>> 	* internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>>>> 	* internal-fn.def (IFN_MULH): New internal function.
>>>> 	* tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>>>> 	recog normal multiply highpart as IFN_MULH.
>>>
>>> LGTM FWIW, although:
>>>
>>
>> Thanks for the review!
>>
>>>> @@ -2030,8 +2048,7 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
>>>>    /* Check for target support.  */
>>>>    tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
>>>>    if (!new_vectype
>>>> -      || !direct_internal_fn_supported_p
>>>> -	    (ifn, new_vectype, OPTIMIZE_FOR_SPEED))
>>>> +      || !direct_internal_fn_supported_p (ifn, new_vectype, OPTIMIZE_FOR_SPEED))
>>>>      return NULL;
>>>>
>>>>    /* The IR requires a valid vector type for the cast result, even though
>>>> @@ -2043,8 +2060,8 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
>>>>    /* Generate the IFN_MULHRS call.  */
>>>>    tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
>>>>    tree new_ops[2];
>>>> -  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type,
>>>> -		       unprom_mult, new_vectype);
>>>> +  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type, unprom_mult,
>>>> +		       new_vectype);
>>>>    gcall *mulhrs_stmt
>>>>      = gimple_build_call_internal (ifn, 2, new_ops[0], new_ops[1]);
>>>>    gimple_call_set_lhs (mulhrs_stmt, new_var);
>>>
>>> …these changes look like formatting only.  (I guess it's down to whether
>>> or not the 80th column should be kept free for an “end of line+1” cursor.)
>>>
>>
>> Yeah, just for formatting, the formatting tool (clang-format) reformatted
>> them.  Thanks for the information on "end of line+1" cursor, I didn't know
>> that before.  I guess you prefer me to keep the original format?  If so I
>> will remove them when committing it.  I was thinking whether I should change
>> field ColumnLimit of my .clang-format to 79 to avoid this kind of case to
>> be caught by formatting tool again.  Hope reviewers won't nit-pick the exact
>> 80 column cases then. :)
> 
> TBH, 79 vs. 80 isn't normally something I'd worry about when reviewing
> new code.  But I know in the past people have asked for 79 to be used
> for the “end+1” reason, so I don't think we should “fix” existing code
> that honours the 79 limit so that it no longer does, especially when the
> lines surrounding the code aren't changing.
> 

Thanks for the explanation!  Agree.

> There's also a risk of yo-yo-ing if someone else is using clang-format
> and does have the limit set to 79 columns.
> 
> So yeah, I think it'd better to commit without the two hunks above.
> 

Will fix them.  Thanks for catching and explanations!

BR,
Kewen


Re: [POWER10] __morestack calls from pcrel code

2021-07-14 Thread David Edelsohn via Gcc-patches
On Wed, Jul 14, 2021 at 8:01 PM Alan Modra  wrote:
>
> On Wed, Jun 30, 2021 at 05:06:30PM -0300, Tulio Magno Quites Machado Filho 
> wrote:
> > Alan Modra via Gcc-patches  writes:
> >
> > > Compiling gcc/testsuite/gcc.dg/split-*.c and others with -mcpu=power10
> > > and linking with a non-pcrel libgcc results in crashes due to the
> > > power10 pcrel code not having r2 set for the generic-morestack.c
> > > functions called from __morestack.  There is also a problem when
> > > non-pcrel code calls a pcrel libgcc.  See the patch comments.
> > >
> > > A similar situation theoretically occurs with ELFv1 multi-toc
> > > executables, when __morestack might be located in a different toc
> > > group to its caller.  This patch makes no attempt to fix that, since
> > > the gold linker does not support multi-toc (gold is needed for proper
> > > support of -fsplit-stack code) nor does gcc emit __morestack calls
> > > that support multi-toc.
> > >
> > > Bootstrapped and regression tested powerpc64le-linux with both
> > > -mcpu=power10 and -mcpu=power9.  OK for mainline and backporting to
> > > gcc-11 and gcc-10?
> > >
> > > * config/rs6000/morestack.S (R2_SAVE): Define.
> > > (__morestack): Save and restore r2.  Set up r2 for called
> > > functions.
> >
> > Thanks! This patch solved the issue I was seeing.
> >
> > If it gets merged, can this patch be backported to GCC 10 and 11, please?
> >
> > --
> > Tulio Magno
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573978.html
>
> This patch has now been unreviewed for over two weeks.  I expected a
> rubber stamp style approval;  This assembly file is all mine, I know
> the ABI and how .eh_frame driven exception handling works on powerpc.
> So I'm going to claim the patch is obvious enough to someone with a
> good understanding of what is going on in morestack.S and commit under
> the "obvious" rule after allowing a few more days for comment.

This patch is okay.

Thanks, David


Re: [PATCH v2] Analyze niter for until-wrap condition [PR101145]

2021-07-14 Thread guojiufu via Gcc-patches

Hi,

I would like to have an early ping on this with more mail addresses.

BR,
Jiufu.

On 2021-07-07 20:47, Jiufu Guo wrote:

Changes since v1:
* Update assumptions for niter, add more test cases check
* Use widest_int/wide_int instead of mpz to do +-/
* Move some early check for quick return

For code like:
unsigned foo(unsigned val, unsigned start)
{
  unsigned cnt = 0;
  for (unsigned i = start; i > val; ++i)
cnt++;
  return cnt;
}

The number of iterations should be about UINT_MAX - start.
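
To make the count concrete, here is a small sketch (not part of the patch; the function names are hypothetical) that runs the loop literally and compares it with a closed form.  For unsigned arithmetic the loop body runs for i = start, ..., UINT_MAX and stops once i wraps to 0, so the exact count is UINT_MAX - start + 1 when start > val, and 0 otherwise.

```c
#include <assert.h>
#include <limits.h>

/* The loop from the message, executed literally.  Only call this with
   START close to UINT_MAX, otherwise it runs for a very long time.  */
static unsigned
count_by_loop (unsigned val, unsigned start)
{
  unsigned cnt = 0;
  for (unsigned i = start; i > val; ++i)
    cnt++;
  return cnt;
}

/* Closed form for the same count.  START > VAL implies START >= 1,
   so UINT_MAX - START + 1 cannot overflow.  */
static unsigned
count_closed_form (unsigned val, unsigned start)
{
  if (start <= val)
    return 0;
  return UINT_MAX - start + 1u;
}

/* Check that both computations agree for one pair of inputs.  */
static void
check_agree (unsigned val, unsigned start)
{
  assert (count_by_loop (val, start) == count_closed_form (val, start));
}
```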

There is function adjust_cond_for_loop_until_wrap which
handles similar work for const bases.
Like adjust_cond_for_loop_until_wrap, this patch enhances
function number_of_iterations_cond/number_of_iterations_lt
to analyze number of iterations for this kind of loop.

Bootstrap and regtest pass on powerpc64le, x86_64 and aarch64.
Is this ok for trunk?

gcc/ChangeLog:

2021-07-07  Jiufu Guo  

PR tree-optimization/101145
* tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
New function.
(number_of_iterations_lt): Invoke above function.
(adjust_cond_for_loop_until_wrap):
Merge to number_of_iterations_until_wrap.
(number_of_iterations_cond): Update invokes for
adjust_cond_for_loop_until_wrap and number_of_iterations_lt.

gcc/testsuite/ChangeLog:

2021-07-07  Jiufu Guo  

PR tree-optimization/101145
* gcc.dg/vect/pr101145.c: New test.
* gcc.dg/vect/pr101145.inc: New test.
* gcc.dg/vect/pr101145_1.c: New test.
* gcc.dg/vect/pr101145_2.c: New test.
* gcc.dg/vect/pr101145_3.c: New test.
* gcc.dg/vect/pr101145inf.c: New test.
* gcc.dg/vect/pr101145inf.inc: New test.
* gcc.dg/vect/pr101145inf_1.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr101145.c  | 187 ++
 gcc/testsuite/gcc.dg/vect/pr101145.inc|  63 
 gcc/testsuite/gcc.dg/vect/pr101145_1.c|  15 ++
 gcc/testsuite/gcc.dg/vect/pr101145_2.c|  15 ++
 gcc/testsuite/gcc.dg/vect/pr101145_3.c|  15 ++
 gcc/testsuite/gcc.dg/vect/pr101145inf.c   |  25 +++
 gcc/testsuite/gcc.dg/vect/pr101145inf.inc |  28 
 gcc/testsuite/gcc.dg/vect/pr101145inf_1.c |  23 +++
 gcc/tree-ssa-loop-niter.c | 157 ++
 9 files changed, 463 insertions(+), 65 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.inc
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145inf.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145inf.inc
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145inf_1.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr101145.c b/gcc/testsuite/gcc.dg/vect/pr101145.c
new file mode 100644
index 000..74031b031cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr101145.c
@@ -0,0 +1,187 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-O3 -fdump-tree-vect-details" } */
+#include <limits.h>
+
+unsigned __attribute__ ((noinline))
+foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (n < ++l)
+*a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned)
+{
+  while (UINT_MAX - 64 < ++l)
+*a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_2 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  l = UINT_MAX - 32;
+  while (n < ++l)
+*a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_3 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (n <= ++l)
+*a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_4 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{  // infinite
+  while (0 <= ++l)
+*a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_5 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  //no loop
+  l = UINT_MAX;
+  while (n < ++l)
+*a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+bar (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (--l < n)
+*a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+bar_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned)
+{
+  while (--l < 64)
+*a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+bar_2 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  l = 32;
+  while (--l < n)
+*a++ = *b++ + 1;
+  return l;
+}
+
+
+int a[3200], b[3200];
+int fail;
+
+int
+main ()
+{
+  unsigned l, n;
+  unsigned res;
+  /* l > n*/
+  n = UINT_MAX - 64;
+  l = n + 32;
+  res = foo (a, b, l, n);
+  if (res != 0)
+fail++;
+
+  l = n;
+

Re: [POWER10] __morestack calls from pcrel code

2021-07-14 Thread Alan Modra via Gcc-patches
On Wed, Jun 30, 2021 at 05:06:30PM -0300, Tulio Magno Quites Machado Filho 
wrote:
> Alan Modra via Gcc-patches  writes:
> 
> > Compiling gcc/testsuite/gcc.dg/split-*.c and others with -mcpu=power10
> > and linking with a non-pcrel libgcc results in crashes due to the
> > power10 pcrel code not having r2 set for the generic-morestack.c
> > functions called from __morestack.  There is also a problem when
> > non-pcrel code calls a pcrel libgcc.  See the patch comments.
> >
> > A similar situation theoretically occurs with ELFv1 multi-toc
> > executables, when __morestack might be located in a different toc
> > group to its caller.  This patch makes no attempt to fix that, since
> > the gold linker does not support multi-toc (gold is needed for proper
> > support of -fsplit-stack code) nor does gcc emit __morestack calls
> > that support multi-toc.
> >
> > Bootstrapped and regression tested powerpc64le-linux with both
> > -mcpu=power10 and -mcpu=power9.  OK for mainline and backporting to
> > gcc-11 and gcc-10?
> >
> > * config/rs6000/morestack.S (R2_SAVE): Define.
> > (__morestack): Save and restore r2.  Set up r2 for called
> > functions.
> 
> Thanks! This patch solved the issue I was seeing.
> 
> If it gets merged, can this patch be backported to GCC 10 and 11, please?
> 
> -- 
> Tulio Magno

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573978.html

This patch has now been unreviewed for over two weeks.  I expected a
rubber stamp style approval;  This assembly file is all mine, I know
the ABI and how .eh_frame driven exception handling works on powerpc.
So I'm going to claim the patch is obvious enough to someone with a
good understanding of what is going on in morestack.S and commit under
the "obvious" rule after allowing a few more days for comment.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 05/55] rs6000: Add helper functions for parsing

2021-07-14 Thread Segher Boessenkool
Hi!

On Thu, Jun 17, 2021 at 10:18:49AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-gen-builtins.c (consume_whitespace): New
>   function.
>   (advance_line): Likewise.
>   (safe_inc_pos): Likewise.
>   (match_identifier): Likewise.
>   (match_integer): Likewise.
>   (match_to_right_bracket): Likewise.

> +/* Pass over unprintable characters and whitespace (other than a newline,
> +   which terminates the scan).  */

See Will's review :-)

> +  buf[lastpos - startpos + 1] = '\0';

Just "= 0"?  It means exactly the same.

You can write just
  diag ("bla bla bla")
instead of
  (*diag) ("bla bla bla");
btw.

The patch is okay for trunk with whatever you want to do with those
comments (but do fix the consume_whitespace comment please).  Thanks!


Segher


Re: [PATCH 05/55] rs6000: Add helper functions for parsing

2021-07-14 Thread Segher Boessenkool
Hi!

On Fri, Jul 09, 2021 at 02:32:59PM -0500, will schmidt wrote:
> On Thu, 2021-06-17 at 10:18 -0500, Bill Schmidt via Gcc-patches wrote:
> > 2021-06-07  Bill Schmidt  
> > +/* Pass over unprintable characters and whitespace (other than a newline,
> > +   which terminates the scan).  */

> > +static void
> > +consume_whitespace (void)
> > +{
> > +  while (pos < LINELEN && isspace(linebuf[pos]) && linebuf[pos] != '\n')
> > +pos++;
> > +  return;
> > +}

> AFAIK isspace() and thusly this helper only skips whitespace, so
> nothing unprintable is actually handled or skipped here.

Right, and that behaviour would not match with the function name either.
isspace returns true for 0x09..0x0d and 0x20, all of which are
whitespace.
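
This can be checked with a short sketch.  It assumes the default "C" locale, and skip_blanks is a hypothetical stand-in for consume_whitespace that takes the buffer as parameters instead of using globals.

```c
#include <assert.h>
#include <ctype.h>

/* Count the 7-bit characters that isspace accepts.  In the "C" locale
   these are 0x09..0x0d and 0x20.  */
static int
count_space_chars (void)
{
  int n = 0;
  for (int c = 0; c < 128; c++)
    if (isspace (c))
      n++;
  return n;
}

/* Skip whitespace starting at POS, stopping at a newline, at the first
   non-whitespace character, or at LEN.  Unprintable characters such as
   0x01 are not whitespace, so they stop the scan too.  */
static int
skip_blanks (const char *line, int pos, int len)
{
  assert (pos >= 0 && pos <= len);
  while (pos < len && line[pos] != '\n'
	 && isspace ((unsigned char) line[pos]))
    pos++;
  return pos;
}
```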


Segher


Re: [committed] input.c: move file caching globals to a new file_cache class

2021-07-14 Thread David Malcolm via Gcc-patches
On Sun, 2021-07-11 at 12:58 -0400, Lewis Hyatt wrote:
> Hi David-
> 
> I thought this might be a good opportunity to ask about the patch
> that
> supports -finput-charset in diagnostic.c please?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564527.html
> 
> The patch will require some work to adapt to the new changes below. 

Sorry about that, I forgot about your patch.

> I
> am happy to do that, but thought I should check first whether you
> have
> any interest in this approach? Thanks!

The basic approach seems good to me, so please do update the patch and
resend it.

Dave



Re: Repost: [PATCH] Fix long double tests when default long double is not IBM.

2021-07-14 Thread Segher Boessenkool
Hi!

On Wed, Jul 07, 2021 at 03:58:37PM -0400, Michael Meissner wrote:
> +/* We force the long double type to be IBM 128-bit because the CONVERT_TO_PINF

There is no "forcing" here.  "We use ..." or "We require ..." is fine.

"Force" suggests something tries to prevent you.

"Override" is worse.  What is overridden, whose decision is overridden?

> +# Check if we can explicitly override the long double format to use the IBM
> +# 128-bit extended double format, and GLIBC supports doing this override by
> +# switching the sprintf to handle IBM 128-bit long double.
> +
> +proc add_options_for_ppc_long_double_override_ibm128 { flags } {

So this name does not say what it does at all (it does not say anything
about glibc).

> +if { [istarget powerpc*-*-*] } {
> + return "$flags -mlong-double-128 -Wno-psabi -mabi=ibmlongdouble"
> +}
> +return "$flags"
> +}

(And neither does this code.  The comment is for the *next* function!)

> +proc check_effective_target_ppc_long_double_override_ibm128 { } {

So this returns false if your libc does not handle printf for your
selected long double type.  That is problematic, the name does not
suggest anything like that.

> +return [check_runtime_nocache ppc_long_double_override_ibm128 {
> + #include 
> + #include 
> + volatile __ibm128 a = (__ibm128) 3.0;
> + volatile long double one = 1.0L;
> + volatile long double two = 2.0L;
> + volatile long double b;
> + char buffer[20];
> + int main()
> + {
> +   #if !defined(_ARCH_PPC) || !defined(__LONG_DOUBLE_IBM128__)
> + return 1;

This is only ever called for Power.  Remove the first part please.  When
can the second part trigger?

> +   #else
> + b = one + two;
> + if (memcmp ((void *)&a, (void *)&b, sizeof (long double)) != 0)
> +   return 1;
> + sprintf (buffer, "%lg", b);
> + return strcmp (buffer, "3") != 0;
> +   #endif
> + }
> +} [add_options_for_ppc_long_double_override_ibm128 ""]]
> +}

Those casts are useless btw, don't do that.  And that sizeof *has* to be
16, and so does that of __ibm128, so just write 16 please.

Many more of the same...  Use something like the stuff starting at
foreach { armfunc armflag armdefs } {
(for arm) in target-supports.exp?


Segher


Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-14 Thread Qing Zhao via Gcc-patches


> On Jul 14, 2021, at 4:23 PM, Kees Cook  wrote:
> 
> On Wed, Jul 14, 2021 at 07:30:45PM +, Qing Zhao wrote:
>> Hi, Kees,
>> 
>> 
>>> On Jul 14, 2021, at 2:11 PM, Kees Cook  wrote:
>>> 
>>> On Wed, Jul 14, 2021 at 02:09:50PM +, Qing Zhao wrote:
>>>> Hi, Richard,
>>>>
>>>>> On Jul 14, 2021, at 2:14 AM, Richard Biener
>>>>> wrote:
>>>>>
>>>>> On Wed, Jul 14, 2021 at 1:17 AM Qing Zhao wrote:
>>>>>>
>>>>>> Hi, Kees,
>>>>>>
>>>>>> I took a look at the kernel testing case you attached in the previous
>>>>>> email, and found the testing failed with the following case:
>>>>>>
>>>>>> #define INIT_STRUCT_static_all  = { .one = arg->one,\
>>>>>>  .two = arg->two,\
>>>>>>  .three = arg->three,\
>>>>>>  .four = arg->four,  \
>>>>>>  }
>>>>>>
>>>>>> i.e, when the structure type auto variable has been explicitly
>>>>>> initialized in the source code.  -ftrivial-auto-var-init in the 4th
>>>>>> version
>>>>>> does not initialize the paddings for such variables.
>>>>>>
>>>>>> But in the previous version of the patches ( 2 or 3),
>>>>>> -ftrivial-auto-var-init initializes the paddings for such variables.
>>>>>>
>>>>>> I intended to remove this part of the code from the 4th version of the
>>>>>> patch since the implementation for initializing such paddings is
>>>>>> completely different from
>>>>>> the initializing of the whole structure as a whole with memset in this
>>>>>> version of the implementation.
>>>>>>
>>>>>> If we really need this functionality, I will add another separate patch
>>>>>> for this additional functionality, but not with this patch.
>>>>>>
>>>>>> Richard, what’s your comment and suggestions on this?
>>>>>
>>>>> I think this can be addressed in the gimplifier by adjusting
>>>>> gimplify_init_constructor to clear
>>>>> the object before the initialization (if it's not done via aggregate
>>>>> copying).
>>>>
>>>> I did this in the previous versions of the patch like the following:
>>>>
>>>> @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
>>>> *pre_p, gimple_seq *post_p,
>>>>  /* If a single access to the target must be ensured and all elements
>>>> are zero, then it's optimal to clear whatever their number.  */
>>>>  cleared = true;
>>>> +  else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
>>>> +   && !TREE_STATIC (object)
>>>> +   && type_has_padding (type))
>>>> +/* If the user requests to initialize automatic variables with
>>>> +   paddings inside the type, we should initialize the paddings too.
>>>> +   C guarantees that brace-init with fewer initializers than members
>>>> +   aggregate will initialize the rest of the aggregate as-if it were
>>>> +   static initialization.  In turn static initialization guarantees
>>>> +   that pad is initialized to zero bits.
>>>> +   So, it's better to clear the whole record under such situation.  */
>>>> +cleared = true;
>>>> else
>>>>  cleared = false;
>>>>
>>>> Then the paddings are also initialized to zeroes with this option. (Even
>>>> for -ftrivial-auto-var-init=pattern).
>>> 
>>> Thanks! I've tested with the attached patch to v4 and it passes all my
>>> tests again.
>>> 
>>>> Is the above change Okay? (With this change, when
>>>> -ftrivial-auto-var-init=pattern, the paddings for the
>>>> structure variables that have explicit initializer will be ZEROed, not
>>>> 0xFE)
>>> 
>>> Padding zeroing in the face of pattern-init is correct (and matches what
>>> Clang does).
>> 
>> During the discussion before the 4th version of the patch, we have agreed 
>> that pattern-init will use 0xFE byte-repeatable patterns 
>> to initialize all the types (this includes the paddings when the structure 
>> type variables are not explicitly initialized). And will not match
>> Clang’s current behavior. 
> 
> Right, that's fine.
> 
>> If we initialize the paddings when the structure type variables are 
>> explicitly initialized to Zeroes, then there will be inconsistency 
>> between values that are used to initialize structure paddings under 
>> different situations, This looks not good to me.
>> 
>> If we have agreed on using 0xFE byte-repeatable patterns for pattern-init, 
>> then all the paddings should be initialized with the same 
>> pattern. 
> 
> Ah! By "situation", you mean how the compiler chooses to initialize the
> structure members?

There are three situations in which we should initialize the padding of a
structure-type auto variable:

1. When there is no explicit initializer;
2. When there is an explicit initializer that only partially initializes
the structure variable;
3. When there is an explicit initializer that fully initializes the
structure variable.

The code examples for the abo

Re: [PATCH] Fix regular expression error in PR 100166 patch

2021-07-14 Thread Michael Meissner via Gcc-patches
On Wed, Jul 14, 2021 at 04:51:52PM -0500, Bill Schmidt wrote:
> Hi Mike,
> 
> On 7/14/21 4:42 PM, Michael Meissner wrote:
> >Fix regular expression error in PR 100166 patch
> >
> >In my patch for PR testsuite/100166 (which fixes various tests so that the
> >plxv and pstxv instructions can be counted as legitimate instructions), I
> >had a typo in the pr86731-fwrapv-longlong.c test (using plvx instead of
> >plxv).  This patch fixes this error.
> >
> >Can I apply it to the mainline?  I have tested this on a big endian power8
> >system (using --with-cpu=power8), a little endian power9 system (using
> >--with-cpu=power9), and on a power10 prototype (using --with-cpu=power10).  
> >The
> >pr86731-fwrapv-longlong.c test passes in all cases.
> >
> >2021-07-14  Michael Meissner  
> >
> >gcc/testsuite/
> > PR testsuite/100166
> > * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Fix typo.
> >---
> >  gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c 
> >b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> >index bd1502bb30a..97bc60f7cd6 100644
> >--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> >+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> >@@ -30,5 +30,5 @@ vector signed long long splats4(void)
> >  /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
> >  /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
> >-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
> >+/* { dg-final { scan-assembler-times {\mp?lxv?\M|\mlxvd2x\M} 2 } } */
> 
> That doesn't look like what you meant to do... That would accept
> "lx" among other strings...

Yes, you are right.  I was trying to capture lxvx as per a suggestion from
Segher.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[GCC 11] [COMMITTED] Fix build_gt and build_lt for signed 1 bit values.

2021-07-14 Thread Andrew MacLeod via Gcc-patches

Cherry picked from 84f7bab89279ca1234fef88929c74caeda8cb55e

Bootstrapped on x86_64-pc-linux-gnu with no regressions. pushed.

Andrew

From b977e6b29c67be81df882d1f5cc7eb6a5d8c98a0 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 30 Jun 2021 14:15:53 -0400
Subject: [PATCH 8/8] Fix build_gt and build_lt for signed 1 bit values.

Signed 1 bit values have a range of [-1, 0] but neither (0 - 1) nor (-1 + 1)
can be represented.  For signed values, add or subtract -1 as appropriate.

	PR tree-optimization/101223
	gcc/
	* range-op.cc (build_lt): Add -1 for signed values.
	(build_gt): Subtract -1 for signed values.

	gcc/testsuite/
	* gcc.dg/pr101223.c: New.

(cherry picked from commit 84f7bab89279ca1234fef88929c74caeda8cb55e)
---
 gcc/range-op.cc                 | 18 ++++++++++++++++--
 gcc/testsuite/gcc.dg/pr101223.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101223.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 3a35a2fb25b..c33f4edc2c1 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -562,7 +562,14 @@ static void
 build_lt (irange &r, tree type, const wide_int &val)
 {
   wi::overflow_type ov;
-  wide_int lim = wi::sub (val, 1, TYPE_SIGN (type), &ov);
+  wide_int lim;
+  signop sgn = TYPE_SIGN (type);
+
+  // Signed 1 bit cannot represent 1 for subtraction.
+  if (sgn == SIGNED)
+    lim = wi::add (val, -1, sgn, &ov);
+  else
+    lim = wi::sub (val, 1, sgn, &ov);
 
   // If val - 1 underflows, check if X < MIN, which is an empty range.
   if (ov)
@@ -585,7 +592,14 @@ static void
 build_gt (irange &r, tree type, const wide_int &val)
 {
   wi::overflow_type ov;
-  wide_int lim = wi::add (val, 1, TYPE_SIGN (type), &ov);
+  wide_int lim;
+  signop sgn = TYPE_SIGN (type);
+
+  // Signed 1 bit cannot represent 1 for addition.
+  if (sgn == SIGNED)
+    lim = wi::sub (val, -1, sgn, &ov);
+  else
+    lim = wi::add (val, 1, sgn, &ov);
   // If val + 1 overflows, check is for X > MAX, which is an empty range.
   if (ov)
     r.set_undefined ();
diff --git a/gcc/testsuite/gcc.dg/pr101223.c b/gcc/testsuite/gcc.dg/pr101223.c
new file mode 100644
index 00000000000..6d5a247fa6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101223.c
@@ -0,0 +1,44 @@
+/* PR tree-optimization/101223 */
+/* { dg-do run } */
+/* { dg-options "-O2 " } */
+
+struct {
+  int a : 1;
+} b;
+int c = 1, d;
+int foo1() {
+  for (; d < 2; d++) {
+    int e = ~c, f = 0, g;
+    if (e) {
+      f = c;
+      g = b.a;
+      b.a = f;
+      if (b.a >= g)
+        __builtin_abort();
+    }
+    c = f;
+    b.a = g;
+  }
+  return 0;
+}
+
+int foo2() {
+  for (; d < 2; d++) {
+    int e = ~c, f = 0, g;
+    if (e) {
+      f = c;
+      g = b.a;
+      b.a = f;
+      if (g <= b.a)
+        __builtin_abort();
+    }
+    c = f;
+    b.a = g;
+  }
+  return 0;
+}
+int main ()
+{
+  return foo1() + foo2();
+}
+  
-- 
2.17.2
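
The commit message above rests on a small piece of arithmetic: a signed
1 bit type holds only -1 and 0, so the constant 1 is not representable,
while -1 always is.  The following is a minimal sketch of that
reasoning in plain C, not the wide_int code from the patch; fits_signed
and pred_fits are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* True if V is representable as a signed two's complement value
   of BITS bits (1 <= BITS < 63).  */
bool
fits_signed (long long v, int bits)
{
  assert (bits >= 1 && bits < 63);
  long long min = -(1LL << (bits - 1));
  long long max = (1LL << (bits - 1)) - 1;
  return v >= min && v <= max;
}

/* Model of the limit computed for "X < VAL": VAL - 1 is formed as
   VAL + (-1), because -1 fits at every signed width while 1 does not
   fit when BITS is 1.  Returns false when the result underflows.  */
bool
pred_fits (long long val, int bits)
{
  assert (fits_signed (val, bits));
  assert (fits_signed (-1, bits));
  return fits_signed (val + -1, bits);
}
```

For a 1 bit type, pred_fits (0, 1) holds (the limit is -1), and
pred_fits (-1, 1) does not, which corresponds to the empty range
handled by the overflow check in the patch.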



[GCC 11] [COMMITTED] Do not continue propagating values which cannot be set properly.

2021-07-14 Thread Andrew MacLeod via Gcc-patches
If the on-entry cache cannot properly represent a range, do not continue
trying to propagate it.  This is an adapted version of a GCC 12 patch which
works in conjunction with the sparse on-entry cache update.


Bootstrapped on x86_64-pc-linux-gnu with no regressions. pushed.

Andrew


From 85c22c517e9571d1f0f487fd708fbb01f36f172a Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 22 Jun 2021 17:46:05 -0400
Subject: [PATCH 7/8] Do not continue propagating values which cannot be set
 properly.

If the on-entry cache cannot properly represent a range, do not continue
trying to propagate it.

	PR tree-optimization/101148
	PR tree-optimization/101014
	* gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust.
	(ranger_cache::~ranger_cache): Adjust.
	(ranger_cache::block_range): Check if propagation disallowed.
	(ranger_cache::propagate_cache): Disallow propagation if new value
	can't be stored properly.
	* gimple-range-cache.h (ranger_cache::m_propfail): New member.
---
 gcc/gimple-range-cache.cc | 11 ++++++++++-
 gcc/gimple-range-cache.h  |  1 +
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 610d4c50531..ff0084545ab 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -742,10 +742,12 @@ ranger_cache::ranger_cache (gimple_ranger &q) : query (q)
   m_poor_value_list.safe_grow_cleared (20);
   m_poor_value_list.truncate (0);
   m_temporal = new temporal_cache;
+  m_propfail = BITMAP_ALLOC (NULL);
 }
 
 ranger_cache::~ranger_cache ()
 {
+  BITMAP_FREE (m_propfail);
   delete m_temporal;
   m_poor_value_list.release ();
   m_workback.release ();
@@ -958,7 +960,9 @@ ranger_cache::block_range (irange &r, basic_block bb, tree name, bool calc)
 void
 ranger_cache::add_to_update (basic_block bb)
 {
-  if (!m_update_list.contains (bb))
+  // If propagation has failed for BB, or its already in the list, don't
+  // add it again.
+  if (!bitmap_bit_p (m_propfail, bb->index) &&  !m_update_list.contains (bb))
     m_update_list.quick_push (bb);
 }
 
@@ -975,6 +979,7 @@ ranger_cache::propagate_cache (tree name)
   int_range_max current_range;
   int_range_max e_range;
 
+  gcc_checking_assert (bitmap_empty_p (m_propfail));
   // Process each block by seeing if its calculated range on entry is
   // the same as its cached value. If there is a difference, update
   // the cache to reflect the new value, and check to see if any
@@ -1031,6 +1036,9 @@ ranger_cache::propagate_cache (tree name)
   if (new_range != current_range)
 	{
 	  bool ok_p = m_on_entry.set_bb_range (name, bb, new_range);
+	  // If the cache couldn't set the value, mark it as failed.
+	  if (!ok_p)
+	    bitmap_set_bit (m_propfail, bb->index);
 	  if (DEBUG_RANGE_CACHE) 
 	{
 	  if (!ok_p)
@@ -1060,6 +1068,7 @@ ranger_cache::propagate_cache (tree name)
   print_generic_expr (dump_file, name, TDF_SLIM);
   fprintf (dump_file, "\n");
 }
+  bitmap_clear (m_propfail);
 }
 
 // Check to see if an update to the value for NAME in BB has any effect
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index f82816f10c1..d536f09940f 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -115,6 +115,7 @@ private:
 
   void propagate_updated_value (tree name, basic_block bb);
 
+  bitmap m_propfail;
   vec m_workback;
   vec m_update_list;
 
-- 
2.17.2
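
The bookkeeping in the patch above can be illustrated outside of GCC.
This is a minimal sketch under stated assumptions, not the ranger_cache
implementation: blocks are plain integers below 64, a uint64_t stands in
for the m_propfail bitmap, and struct update_list, mark_failed and
demo_update_list are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 64

struct update_list
{
  int items[MAX_BLOCKS];   /* Blocks waiting to be processed.  */
  size_t count;
  uint64_t propfail;       /* Bit I set: propagation failed for block I.  */
};

static bool
list_contains (const struct update_list *l, int bb)
{
  for (size_t i = 0; i < l->count; i++)
    if (l->items[i] == bb)
      return true;
  return false;
}

/* Record that the cache could not store a value for BB.  */
void
mark_failed (struct update_list *l, int bb)
{
  assert (bb >= 0 && bb < MAX_BLOCKS);
  l->propfail |= UINT64_C (1) << bb;
}

/* Queue BB unless propagation already failed for it or it is
   already queued.  Returns true if BB was added.  */
bool
add_to_update (struct update_list *l, int bb)
{
  assert (bb >= 0 && bb < MAX_BLOCKS);
  if ((l->propfail >> bb) & 1)
    return false;
  if (list_contains (l, bb))
    return false;
  l->items[l->count++] = bb;
  return true;
}

/* Small scenario: only blocks 3 and 7 end up queued.  */
size_t
demo_update_list (void)
{
  struct update_list l = { .count = 0, .propfail = 0 };
  add_to_update (&l, 3);
  add_to_update (&l, 3);   /* Duplicate, ignored.  */
  mark_failed (&l, 5);
  add_to_update (&l, 5);   /* Failed block, ignored.  */
  add_to_update (&l, 7);
  return l.count;
}
```

Because a failed block is never queued again, the propagation loop
cannot keep revisiting a block whose range the cache is unable to
store.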



[GCC 11] On entry cache cleanups

2021-07-14 Thread Andrew MacLeod via Gcc-patches
The following 3 patches were applied to releases/gcc-11.  They form the
bulk of the on-entry fixes in trunk.


d3344fbe7bc25f414671ad7a37d2e9601942597d
Clean up and virtualize the on-entry cache interface.  cherry picked 
from 14b0f37a644d7b59e1737fb275ec4fff044972a8


4ed9f2e65a69e270ea47221318d64ace3d50848
Implement multi-bit aligned accessors for sparse bitmap. cherry picked 
from commit 5ad089a3c946aec655436fa3b0b50d6574b78197


52f0aa4dee8401ef3958dbf789780b0ee877beab
Implement a sparse bitmap representation for Ranger's on-entry cache.
cherry picked from commit 9858cd1a6827ee7a928318acb5e86389f79b4012


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  pushed.

Andrew




[GCC 11] [COMMITTED] Disable poor value processing in ranger cache.

2021-07-14 Thread Andrew MacLeod via Gcc-patches
We have disabled the poor value cache in trunk as not providing enough
benefit for the cost.  Do the same in GCC 11.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  pushed.

Andrew


From 86534c07a390e240ffea51653945de85df7a3632 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 23 Jun 2021 12:55:14 -0400
Subject: [PATCH 5/8] Disable poor value processing in ranger cache.

	* gimple-range-cache.cc (ranger_cache::push_poor_value): Disable
	poor value processing.
---
 gcc/gimple-range-cache.cc | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index bc4b557b493..068905d4774 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -846,20 +846,9 @@ ranger_cache::register_dependency (tree name, tree dep)
 bool
 ranger_cache::push_poor_value (basic_block bb, tree name)
 {
-  if (m_poor_value_list.length ())
-{
-  // Don't push anything else to the same block.  If there are multiple 
-  // things required, another request will come during a later evaluation
-  // and this prevents oscillation building uneccessary depth.
-  if ((m_poor_value_list.last ()).bb == bb)
-	return false;
-}
-
-  struct update_record rec;
-  rec.bb = bb;
-  rec.calc = name;
-  m_poor_value_list.safe_push (rec);
-  return true;
+  // Disable poor value processing for GCC 11.  It has been disabled in GCC 12
+  // as adding too much churn/compile time for too little benefit.
+  return false;
 }
 
 //  Provide lookup for the gori-computes class to access the best known range
-- 
2.17.2



[GCC 11] [COMMITTED] Don't process lookups for debug statements in Ranger.

2021-07-14 Thread Andrew MacLeod via Gcc-patches
Shortcut version to disable processing of debug statements by Ranger in
GCC 11.


Although PR 100781 is not an issue in GCC 11, it's possible that a similar
situation may arise.  The identical fix cannot be easily introduced.
With EVRP always running in hybrid mode, there is no need for ranger to
spawn a lookup for a debug statement in this release.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  pushed.

Andrew


From 263a7e20c88a35bdfaebfac3c9abb313c5867590 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 8 Jun 2021 09:43:17 -0400
Subject: [PATCH 4/8] Don't process lookups for debug statements in Ranger.

Although PR 100781 is not an issue in GCC11, its possible that a similar
situation may arise.  The identical fix cannot be easily introduced.
With EVRP always running in hybrid mode, there is no need for ranger to
spawn a lookup for a debug statement in this release.

	* gimple-range.cc (gimple_ranger::range_of_expr): Treat debug statments
	as contextless queries to avoid additional lookups.
---
 gcc/gimple-range.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 6158a754dd6..fd7fa5e3dbb 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -945,7 +945,7 @@ gimple_ranger::range_of_expr (irange &r, tree expr, gimple *stmt)
 return get_tree_range (r, expr);
 
   // If there is no statement, just get the global value.
-  if (!stmt)
+  if (!stmt || is_gimple_debug (stmt))
 {
   if (!m_cache.get_global_range (r, expr))
 r = gimple_range_global (expr);
-- 
2.17.2



Re: [PATCH] Fix regular expression error in PR 100166 patch

2021-07-14 Thread Bill Schmidt via Gcc-patches

Hi Mike,

On 7/14/21 4:42 PM, Michael Meissner wrote:

Fix regular expression error in PR 100166 patch

In my patch for PR testsuite/100166 (which fixes various tests so that the
plxv and pstxv instructions can be counted as legitimate instructions), I
had a typo in the pr86731-fwrapv-longlong.c test (using plvx instead of
plxv).  This patch fixes this error.

Can I apply it to the mainline?  I have tested this on a big endian power8
system (using --with-cpu=power8), a little endian power9 system (using
--with-cpu=power9), and on a power10 prototype (using --with-cpu=power10).  The
pr86731-fwrapv-longlong.c test passes in all cases.

2021-07-14  Michael Meissner  

gcc/testsuite/
PR testsuite/100166
* gcc.target/powerpc/pr86731-fwrapv-longlong.c: Fix typo.
---
  gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..97bc60f7cd6 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -30,5 +30,5 @@ vector signed long long splats4(void)
  
  /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */

  /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv?\M|\mlxvd2x\M} 2 } } */
  


That doesn't look like what you meant to do... That would accept "lx" 
among other strings...


Thanks,
Bill
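
The objection above can be reproduced with any regex engine that has
the `?` quantifier.  The following is a rough sketch using POSIX
extended regular expressions, with `^` and `$` standing in for the Tcl
word boundaries `\m` and `\M` used by the testsuite; ere_matches is a
hypothetical helper, and the pattern `p?lxvx?` is only an illustration
of one way to admit lxvx, not the pattern that was eventually
committed.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the POSIX extended regular expression PATTERN matches
   somewhere in TEXT.  */
bool
ere_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);
  (void) rc;
  bool matched = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

With this helper, ere_matches ("^p?lxv?$", "lx") is true, so the
trailing `v?` makes the v optional instead of adding an x, and
ere_matches ("^p?lxv?$", "lxvx") is false.  The illustrative
"^p?lxvx?$" accepts "lxv", "plxv" and "lxvx" and rejects "lx".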



[PATCH] Fix regular expression error in PR 100166 patch

2021-07-14 Thread Michael Meissner via Gcc-patches
Fix regular expression error in PR 100166 patch

In my patch for PR testsuite/100166 (which fixes various tests so that the
plxv and pstxv instructions can be counted as legitimate instructions), I
had a typo in the pr86731-fwrapv-longlong.c test (using plvx instead of
plxv).  This patch fixes this error.

Can I apply it to the mainline?  I have tested this on a big endian power8
system (using --with-cpu=power8), a little endian power9 system (using
--with-cpu=power9), and on a power10 prototype (using --with-cpu=power10).  The
pr86731-fwrapv-longlong.c test passes in all cases.

2021-07-14  Michael Meissner  

gcc/testsuite/
PR testsuite/100166
* gcc.target/powerpc/pr86731-fwrapv-longlong.c: Fix typo.
---
 gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..97bc60f7cd6 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -30,5 +30,5 @@ vector signed long long splats4(void)
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv?\M|\mlxvd2x\M} 2 } } */
 
-- 
2.31.1


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-14 Thread Kees Cook via Gcc-patches
On Wed, Jul 14, 2021 at 07:30:45PM +, Qing Zhao wrote:
> Hi, Kees,
> 
> 
> > On Jul 14, 2021, at 2:11 PM, Kees Cook  wrote:
> > 
> > On Wed, Jul 14, 2021 at 02:09:50PM +, Qing Zhao wrote:
> >> Hi, Richard,
> >> 
> >>> On Jul 14, 2021, at 2:14 AM, Richard Biener  
> >>> wrote:
> >>> 
> >>> On Wed, Jul 14, 2021 at 1:17 AM Qing Zhao  wrote:
> >>>>
> >>>> Hi, Kees,
> >>>>
> >>>> I took a look at the kernel testing case you attached in the previous
> >>>> email, and found the testing failed with the following case:
> >>>>
> >>>> #define INIT_STRUCT_static_all  = { .one = arg->one,\
> >>>>   .two = arg->two,\
> >>>>   .three = arg->three,\
> >>>>   .four = arg->four,  \
> >>>>   }
> >>>>
> >>>> i.e, when the structure type auto variable has been explicitly
> >>>> initialized in the source code.  -ftrivial-auto-var-init in the 4th
> >>>> version
> >>>> does not initialize the paddings for such variables.
> >>>>
> >>>> But in the previous version of the patches ( 2 or 3),
> >>>> -ftrivial-auto-var-init initializes the paddings for such variables.
> >>>>
> >>>> I intended to remove this part of the code from the 4th version of the
> >>>> patch since the implementation for initializing such paddings is
> >>>> completely different from
> >>>> the initializing of the whole structure as a whole with memset in this
> >>>> version of the implementation.
> >>>>
> >>>> If we really need this functionality, I will add another separate patch
> >>>> for this additional functionality, but not with this patch.
> >>>>
> >>>> Richard, what’s your comment and suggestions on this?
> >>> 
> >>> I think this can be addressed in the gimplifier by adjusting
> >>> gimplify_init_constructor to clear
> >>> the object before the initialization (if it's not done via aggregate
> >>> copying).  
> >> 
> >> I did this in the previous versions of the patch like the following:
> >> 
> >> @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
> >> *pre_p, gimple_seq *post_p,
> >>  /* If a single access to the target must be ensured and all elements
> >> are zero, then it's optimal to clear whatever their number.  */
> >>  cleared = true;
> >> +  else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
> >> +   && !TREE_STATIC (object)
> >> +   && type_has_padding (type))
> >> +/* If the user requests to initialize automatic variables with
> >> +   paddings inside the type, we should initialize the paddings too.
> >> +   C guarantees that brace-init with fewer initializers than members
> >> +   aggregate will initialize the rest of the aggregate as-if it were
> >> +   static initialization.  In turn static initialization guarantees
> >> +   that pad is initialized to zero bits.
> >> +   So, it's better to clear the whole record under such situation.  */
> >> +cleared = true;
> >>else
> >>  cleared = false;
> >> 
> >> Then the paddings are also initialized to zeroes with this option. (Even 
> >> for -ftrivial-auto-var-init=pattern).
> > 
> > Thanks! I've tested with the attached patch to v4 and it passes all my
> > tests again.
> > 
> >> Is the above change Okay? (With this change, when 
> >> -ftrivial-auto-var-init=pattern, the paddings for the
> >> structure variables that have explicit initializer will be ZEROed, not 
> >> 0xFE)
> > 
> > Padding zeroing in the face of pattern-init is correct (and matches what
> > Clang does).
> 
> During the discussion before the 4th version of the patch, we have agreed 
> that pattern-init will use 0xFE byte-repeatable patterns 
> to initialize all the types (this includes the paddings when the structure 
> type variables are not explicitly initialized). And will not match
> Clang’s current behavior. 

Right, that's fine.

> If we initialize the paddings when the structure type variables are 
> explicitly initialized to Zeroes, then there will be inconsistency 
> between values that are used to initialize structure paddings under different 
> situations, This looks not good to me.
> 
> If we have agreed on using 0xFE byte-repeatable patterns for pattern-init, 
> then all the paddings should be initialized with the same 
> pattern. 

Ah! By "situation", you mean how the compiler chooses to initialize the
structure members?

It sounds like for =zero mode, padding will be 0, but for =pattern,
padding may be either 0x00 or 0xFE, depending on which kind of
initialization is internally chosen. Is that right? I'm fine with this
since the =zero case is what I'm primarily focused on being as safe
as possible.

-Kees

-- 
Kees Cook
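
The difference discussed in this thread comes down to what the padding
bytes hold after the members have been written.  The following is a
minimal sketch of that, assuming a byte-level model only: it says
nothing about how GCC implements the option, and struct sample,
padding_keeps_fill and padding_size are hypothetical names.  The object
image is built in an unsigned char array so that the padding bytes are
only ever touched by the initial fill.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* On most ABIs there is padding between ONE and TWO.  */
struct sample
{
  char one;
  int two;
};

/* Number of bytes in struct sample that belong to no member.  */
size_t
padding_size (void)
{
  return sizeof (struct sample) - sizeof (char) - sizeof (int);
}

/* Fill an image of struct sample with FILL (0x00 models clearing the
   object first, 0xFE models the pattern), then store the members.
   Returns true if every padding byte still holds FILL.  */
bool
padding_keeps_fill (unsigned char fill)
{
  unsigned char image[sizeof (struct sample)];
  bool is_member[sizeof (struct sample)] = { false };
  const char one = 'x';
  const int two = 42;

  memset (image, fill, sizeof image);
  memcpy (image + offsetof (struct sample, one), &one, sizeof one);
  memcpy (image + offsetof (struct sample, two), &two, sizeof two);

  for (size_t i = 0; i < sizeof one; i++)
    is_member[offsetof (struct sample, one) + i] = true;
  for (size_t i = 0; i < sizeof two; i++)
    is_member[offsetof (struct sample, two) + i] = true;

  for (size_t i = 0; i < sizeof image; i++)
    if (!is_member[i] && image[i] != fill)
      return false;
  return true;
}
```

Whichever fill is applied before the member stores is what the padding
keeps, which is why clearing in gimplify_init_constructor would leave
0x00 in the padding even under =pattern.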


Re: Repost #2: [PATCH] PR 100170: Fix eq/ne tests on power10.

2021-07-14 Thread Segher Boessenkool
On Wed, Jul 14, 2021 at 01:52:05PM -0400, Michael Meissner wrote:
> This patch updates eq/ne tests in the testsuite to adjust the test if
> power10 code generation is used.

eq0/ne0.

> --- a/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c

> -/* { dg-final { scan-assembler "cntlzw|isel" } } */
> +/* { dg-final { scan-assembler {\mcntlzw|isel|setbc\M} } } */

This does not do what you perhaps think it does.  It looks for one of the
three atoms
"\mcntlzw", "isel", or "setbc\M".  You should write
  \m(cntlzw|isel|setbc)\M
or, if you need it to not capture (like in a scan-assembler-times)
  \m(?:cntlzw|isel|setbc)\M

> --- a/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c

> -/* { dg-final { scan-assembler-times "addic" 4 } } */
> -/* { dg-final { scan-assembler-times "subfe" 1 } } */
> -/* { dg-final { scan-assembler-times "addze" 3 } } */
> +/* { dg-final { scan-assembler-times {\maddic\M}  4 { target { ! 
> has_arch_pwr10 } } } } */
> +/* { dg-final { scan-assembler-times {\msubfe\M}  1 { target { ! 
> has_arch_pwr10 } } } } */
> +/* { dg-final { scan-assembler-times {\maddic\M}  3 { target {   
> has_arch_pwr10 } } } } */
> +/* { dg-final { scan-assembler-not   {\msubfe\M}{ target {   
> has_arch_pwr10 } } } } */
> +/* { dg-final { scan-assembler-times {\msetbcr\M} 1 { target {   
> has_arch_pwr10 } } } } */
> +/* { dg-final { scan-assembler-times {\maddze\M}  3 } } */

It may be easier to split the patch into two, where one part can get the
setbcr (the first, simplest function), and the rest stays the same.

Okay for trunk like that.  Thanks!


Segher
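
The precedence issue described above is not specific to Tcl: in most
regex dialects alternation binds more loosely than concatenation, so an
anchor or boundary written at one end of a pattern attaches only to the
nearest alternative.  The following is a rough sketch using POSIX
extended regular expressions, with `^` and `$` standing in for `\m` and
`\M`; ere_matches is a hypothetical helper and the input strings are
made up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the POSIX extended regular expression PATTERN matches
   somewhere in TEXT.  */
bool
ere_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);
  (void) rc;
  bool matched = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

Here ere_matches ("^cntlzw|isel|setbc$", "xiselx") is true, because the
middle alternative is the bare atom "isel", whereas the grouped form
"^(cntlzw|isel|setbc)$" rejects "xiselx" and accepts "isel".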


Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-14 Thread Peter Bergner via Gcc-patches
On 7/13/21 5:59 PM, Segher Boessenkool wrote:
>> -  && adjacent_mem_locations (str_mem, str_mem2))
>> +  && adjacent_mem_locations (str_mem, str_mem2) != NULL_RTX)
> 
> ... so don't change this?  Or write != 0 != 0 != 0, if one time is good,
> three times must be better!  :-)

OK, consider those two changes reverted; I will leave the code as is.




> So we are sure we have a hard register here, and it is a VSX register.
> Okay.  Factoring this code would not hurt ;-)

Yes, we have asserts above that ensure we have regs and that they
are of the correct type (ie, FP or VSX register) depending on the
mode.

I'll make the change above and rebuild just to be safe and then commit.
Thanks.

Peter





Re: Repost #2: [PATCH] PR 100170: Fix eq/ne tests on power10.

2021-07-14 Thread Segher Boessenkool
On Wed, Jul 14, 2021 at 03:25:32PM -0500, Segher Boessenkool wrote:
> Please do not send the same patches in a new thread.  It is much more
> work to keep track of.  Just ping patches by replying to them (either
> copy the list or not, either works).  Thanks!

Oh, and do not edit the Subject:.  You managed to have the first 30
characters of it completely useless.  You should never use more than 50
characters total, you use 57 already, although this should be an
unusually *short* subject!  (And subjects are not sentences, do not end
in a full stop.)


Segher


Re: Repost #2: [PATCH] PR 100170: Fix eq/ne tests on power10.

2021-07-14 Thread Segher Boessenkool
Please do not send the same patches in a new thread.  It is much more
work to keep track of.  Just ping patches by replying to them (either
copy the list or not, either works).  Thanks!


Segher


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-14 Thread Segher Boessenkool
On Wed, Jul 14, 2021 at 12:32:24PM +0100, Richard Sandiford wrote:
> TBH, 79 vs. 80 isn't normally something I'd worry about when reviewing
> new code.  But I know in the past people have asked for 79 to be used
> for the “end+1” reason, so I don't think we should “fix” existing code
> that honours the 79 limit so that it no longer does, especially when the
> lines surrounding the code aren't changing.

The normal rule is you cannot go over 80.  It is perfectly fine to have
shorter lines, certainly if that is nice for some other reason, so
automatically (by some tool) changing this is Just Wrong.


Segher


Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-14 Thread Qing Zhao via Gcc-patches
Hi, Kees,


> On Jul 14, 2021, at 2:11 PM, Kees Cook  wrote:
> 
> On Wed, Jul 14, 2021 at 02:09:50PM +, Qing Zhao wrote:
>> Hi, Richard,
>> 
>>> On Jul 14, 2021, at 2:14 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Wed, Jul 14, 2021 at 1:17 AM Qing Zhao  wrote:
 
>>>> Hi, Kees,
>>>> 
>>>> I took a look at the kernel testing case you attached in the previous 
>>>> email, and found the testing failed with the following case:
>>>> 
>>>> #define INIT_STRUCT_static_all  = { .one = arg->one,\
>>>>    .two = arg->two,\
>>>>    .three = arg->three,\
>>>>    .four = arg->four,  \
>>>>    }
>>>> 
>>>> i.e, when the structure type auto variable has been explicitly initialized 
>>>> in the source code.  -ftrivial-auto-var-init in the 4th version
>>>> does not initialize the paddings for such variables.
>>>> 
>>>> But in the previous version of the patches ( 2 or 3), 
>>>> -ftrivial-auto-var-init initializes the paddings for such variables.
>>>> 
>>>> I intended to remove this part of the code from the 4th version of the 
>>>> patch since the implementation for initializing such paddings is 
>>>> completely different from
>>>> the initializing of the whole structure as a whole with memset in this 
>>>> version of the implementation.
>>>> 
>>>> If we really need this functionality, I will add another separate patch 
>>>> for this additional functionality, but not with this patch.
>>>> 
>>>> Richard, what’s your comment and suggestions on this?
>>> 
>>> I think this can be addressed in the gimplifier by adjusting
>>> gimplify_init_constructor to clear
>>> the object before the initialization (if it's not done via aggregate
>>> copying).  
>> 
>> I did this in the previous versions of the patch like the following:
>> 
>> @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
>> *pre_p, gimple_seq *post_p,
>>/* If a single access to the target must be ensured and all elements
>>   are zero, then it's optimal to clear whatever their number.  */
>>cleared = true;
>> +else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
>> + && !TREE_STATIC (object)
>> + && type_has_padding (type))
>> +  /* If the user requests to initialize automatic variables with
>> + paddings inside the type, we should initialize the paddings too.
>> + C guarantees that brace-init with fewer initializers than members
>> + aggregate will initialize the rest of the aggregate as-if it were
>> + static initialization.  In turn static initialization guarantees
>> + that pad is initialized to zero bits.
>> + So, it's better to clear the whole record under such situation.  */
>> +  cleared = true;
>>  else
>>cleared = false;
>> 
>> Then the paddings are also initialized to zeroes with this option. (Even for 
>> -ftrivial-auto-var-init=pattern).
> 
> Thanks! I've tested with the attached patch to v4 and it passes all my
> tests again.
> 
>> Is the above change Okay? (With this change, when 
>> -ftrivial-auto-var-init=pattern, the paddings for the
>> structure variables that have explicit initializer will be ZEROed, not 0xFE)
> 
> Padding zeroing in the face of pattern-init is correct (and matches what
> Clang does).

During the discussion before the 4th version of the patch, we agreed that 
pattern-init will use 0xFE byte-repeatable patterns to initialize all types 
(including the paddings when structure variables are not explicitly 
initialized), and that it will not match Clang’s current behavior. 

If we instead zero the paddings of structure variables that have an explicit 
initializer, then the values used to initialize structure paddings will be 
inconsistent between the two situations.  That does not look good to me.

If we have agreed on using 0xFE byte-repeatable patterns for pattern-init, then 
all the paddings should be initialized with the same 
pattern. 

This is the major reason I deleted the change in “gimplify_init_constructor” in 
the 4th version.  And considered a different implementation
for padding initializations with explicitly initialized structure variables. 
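
To make the inconsistency concrete, here is a minimal sketch that simulates
the two cases on a raw byte buffer.  The struct and the helper names are
hypothetical, and the memset/memcpy calls only mimic what compiler-inserted
initialization would leave in memory; this is not the GCC implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical struct: on common ABIs there are padding bytes between
// 'one' and 'two' because 'two' is aligned to alignof(int).
struct padded {
  char one;
  int two;
};

// Number of bytes in the object that belong to no member.
constexpr std::size_t padding_bytes() {
  return sizeof(padded) - sizeof(char) - sizeof(int);
}

// Case 1: no explicit initializer.  Pattern-init fills the whole object,
// padding included, with the repeated byte 0xFE.
inline void simulate_pattern_init(unsigned char *buf) {
  std::memset(buf, 0xFE, sizeof(padded));
}

// Case 2: explicit initializer combined with the gimplify_init_constructor
// change quoted above: the object is cleared first (padding becomes 0x00),
// then the named members are stored.
inline void simulate_cleared_init(unsigned char *buf, char one, int two) {
  std::memset(buf, 0, sizeof(padded));
  std::memcpy(buf + offsetof(padded, one), &one, sizeof one);
  std::memcpy(buf + offsetof(padded, two), &two, sizeof two);
}

// First padding byte, i.e. the byte right after 'one'.  Only meaningful
// when the ABI actually inserts padding there.
inline unsigned char first_padding_byte(const unsigned char *buf) {
  assert(padding_bytes() > 0);
  return buf[offsetof(padded, one) + sizeof(char)];
}
```

With the same option, the first case leaves 0xFE in the padding and the
second leaves 0x00, which is the inconsistency described above.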

Qing

> 
> -Kees
> 
> -- 
> Kees Cook
> 



Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-14 Thread Kees Cook via Gcc-patches
On Wed, Jul 14, 2021 at 02:09:50PM +, Qing Zhao wrote:
> Hi, Richard,
> 
> > On Jul 14, 2021, at 2:14 AM, Richard Biener  
> > wrote:
> > 
> > On Wed, Jul 14, 2021 at 1:17 AM Qing Zhao  wrote:
> >> 
> >> Hi, Kees,
> >> 
> >> I took a look at the kernel testing case you attached in the previous 
> >> email, and found the testing failed with the following case:
> >> 
> >> #define INIT_STRUCT_static_all  = { .one = arg->one,\
> >>.two = arg->two,\
> >>.three = arg->three,\
> >>.four = arg->four,  \
> >>}
> >> 
> >> i.e, when the structure type auto variable has been explicitly initialized 
> >> in the source code.  -ftrivial-auto-var-init in the 4th version
> >> does not initialize the paddings for such variables.
> >> 
> >> But in the previous version of the patches ( 2 or 3), 
> >> -ftrivial-auto-var-init initializes the paddings for such variables.
> >> 
> >> I intended to remove this part of the code from the 4th version of the 
> >> patch since the implementation for initializing such paddings is 
> >> completely different from
> >> the initializing of the whole structure as a whole with memset in this 
> >> version of the implementation.
> >> 
> >> If we really need this functionality, I will add another separate patch 
> >> for this additional functionality, but not with this patch.
> >> 
> >> Richard, what’s your comment and suggestions on this?
> > 
> > I think this can be addressed in the gimplifier by adjusting
> > gimplify_init_constructor to clear
> > the object before the initialization (if it's not done via aggregate
> > copying).  
> 
> I did this in the previous versions of the patch like the following:
> 
> @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
> *pre_p, gimple_seq *post_p,
> /* If a single access to the target must be ensured and all elements
>are zero, then it's optimal to clear whatever their number.  */
> cleared = true;
> + else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
> +  && !TREE_STATIC (object)
> +  && type_has_padding (type))
> +   /* If the user requests to initialize automatic variables with
> +  paddings inside the type, we should initialize the paddings too.
> +  C guarantees that brace-init with fewer initializers than members
> +  aggregate will initialize the rest of the aggregate as-if it were
> +  static initialization.  In turn static initialization guarantees
> +  that pad is initialized to zero bits.
> +  So, it's better to clear the whole record under such situation.  */
> +   cleared = true;
>   else
> cleared = false;
> 
> Then the paddings are also initialized to zeroes with this option. (Even for 
> -ftrivial-auto-var-init=pattern).

Thanks! I've tested with the attached patch to v4 and it passes all my
tests again.

> Is the above change Okay? (With this change, when 
> -ftrivial-auto-var-init=pattern, the paddings for the
> structure variables that have explicit initializer will be ZEROed, not 0xFE)

Padding zeroing in the face of pattern-init is correct (and matches what
Clang does).

-Kees

-- 
Kees Cook
commit 8c52b69540b064e930e4d9e2e3dc011ca002306d
Author: Kees Cook 
AuthorDate: Wed Jul 14 11:17:27 2021 -0700
Commit: Kees Cook 
CommitDate: Wed Jul 14 12:08:56 2021 -0700

Fix padding

Based on v2 and bddde2d2-8594-4bf6-8142-cd09b0ebb...@oracle.com

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 4db53cda77f8..dd3da86d6663 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5071,6 +5071,18 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	  /* If a single access to the target must be ensured and all elements
 	 are zero, then it's optimal to clear whatever their number.  */
 	  cleared = true;
+	else if (opt_for_fn (current_function_decl, flag_auto_var_init)
+		   > AUTO_INIT_UNINITIALIZED
+		 && !TREE_STATIC (object)
+		 && type_has_padding (type))
+	  /* If the user requests to initialize automatic variables with
+	 paddings inside the type, we should initialize the paddings too.
+	 C guarantees that brace-init with fewer initializers than members
+	 aggregate will initialize the rest of the aggregate as-if it were
+	 static initialization.  In turn static initialization guarantees
+	 that pad is initialized to zero bits.
+	 So, it's better to clear the whole record under such situation.  */
+	  cleared = true;
 	else
 	  cleared = false;
 
diff --git a/gcc/tree.c b/gcc/tree.c
index 1aa6e557a049..7889c10d639f 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -10818,6 +10818,72 @@ lower_bound_in_type (tree outer, tree inner)
 }
 }
 
+/* Returns true when the given TYPE has padding 

[pushed] vec: use auto_vec in a few more places

2021-07-14 Thread Jason Merrill via Gcc-patches
The uses of vec in get_all_loop_exits and process_conditional were memory
leaks, as .release() was never called for them.  The other changes are some
cases that did have proper release handling, but it's simpler to leave
releasing to the auto_vec destructor.
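
As an illustration of the difference, here is a minimal sketch using two
hypothetical stand-in types, raw_vec and scoped_vec.  They are not GCC's vec
and auto_vec, only simplified models of "release by convention" versus
"release in the destructor", and the live_buffers counter exists purely for
the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Number of heap buffers currently allocated, for illustration only.
static int live_buffers = 0;

// Hypothetical stand-in for a plain vec: it owns storage only by
// convention, so the user must remember to call release().
struct raw_vec {
  int *data = nullptr;

  void reserve(std::size_t n) {
    if (data == nullptr) {
      data = static_cast<int *>(std::malloc(n * sizeof(int)));
      assert(data != nullptr);
      ++live_buffers;
    }
  }

  void release() {
    if (data != nullptr) {
      std::free(data);
      data = nullptr;
      --live_buffers;
    }
  }
};

// Hypothetical stand-in for auto_vec: same interface, but the destructor
// releases the storage, so no explicit call is needed.
struct scoped_vec : raw_vec {
  scoped_vec() = default;
  scoped_vec(const scoped_vec &) = delete;
  scoped_vec &operator=(const scoped_vec &) = delete;
  ~scoped_vec() { release(); }
};

// Returns the live buffer count after a raw_vec local went out of scope
// without release().  The handle is copied to KEEP so that this demo can
// free the buffer afterwards instead of really leaking it.
inline int buffers_after_raw_scope(raw_vec &keep) {
  {
    raw_vec v;
    v.reserve(4);
    keep = v;
  }  // v is gone and nobody called release(): the buffer is still live
  return live_buffers;
}

// Returns the live buffer count after a scoped_vec local went out of scope.
inline int buffers_after_scoped_scope() {
  {
    scoped_vec v;
    v.reserve(4);
  }  // the destructor released the buffer
  return live_buffers;
}
```

The deleted copy operations in scoped_vec are a design choice of the sketch:
they keep two owners from releasing the same buffer.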

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/ChangeLog:

* sel-sched-ir.h (get_all_loop_exits): Use auto_vec.

gcc/cp/ChangeLog:

* class.c (struct find_final_overrider_data): Use auto_vec.
(find_final_overrider): Remove explicit release.
* coroutines.cc (process_conditional): Use auto_vec.
* cp-gimplify.c (struct cp_genericize_data): Use auto_vec.
(cp_genericize_tree): Remove explicit release.
* parser.c (cp_parser_objc_at_property_declaration): Use
auto_delete_vec.
* semantics.c (omp_reduction_lookup): Use auto_vec.
---
 gcc/sel-sched-ir.h   | 2 +-
 gcc/cp/class.c   | 4 +---
 gcc/cp/coroutines.cc | 2 +-
 gcc/cp/cp-gimplify.c | 3 +--
 gcc/cp/parser.c  | 6 +-
 gcc/cp/semantics.c   | 3 +--
 6 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/gcc/sel-sched-ir.h b/gcc/sel-sched-ir.h
index 78b2566ad3e..8ee0529d5a8 100644
--- a/gcc/sel-sched-ir.h
+++ b/gcc/sel-sched-ir.h
@@ -1166,7 +1166,7 @@ get_all_loop_exits (basic_block bb)
 || (inner_loop_header_p (e->dest)))
&& loop_depth (e->dest->loop_father) >= this_depth)
  {
-   vec next_exits = get_all_loop_exits (e->dest);
+   auto_vec next_exits = get_all_loop_exits (e->dest);
 
if (next_exits.exists ())
  {
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 33093e1e1ef..14db06692dc 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -2391,7 +2391,7 @@ struct find_final_overrider_data {
   /* The candidate overriders.  */
   tree candidates;
   /* Path to most derived.  */
-  vec path;
+  auto_vec path;
 };
 
 /* Add the overrider along the current path to FFOD->CANDIDATES.
@@ -2504,8 +2504,6 @@ find_final_overrider (tree derived, tree binfo, tree fn)
   dfs_walk_all (derived, dfs_find_final_overrider_pre,
dfs_find_final_overrider_post, &ffod);
 
-  ffod.path.release ();
-
   /* If there was no winner, issue an error message.  */
   if (!ffod.candidates || TREE_CHAIN (ffod.candidates))
 return error_mark_node;
diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 54ffdc8d062..712a5c0ab37 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3081,7 +3081,7 @@ process_conditional (var_nest_node *n, tree& vlist)
 {
   tree init = n->init;
   hash_map var_flags;
-  vec var_list = vNULL;
+  auto_vec var_list;
   tree new_then = push_stmt_list ();
   handle_nested_conditionals (n->then_cl, var_list, var_flags);
   new_then = pop_stmt_list (new_then);
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 00b7772fe0d..de37f2cdfdc 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -807,7 +807,7 @@ omp_cxx_notice_variable (struct cp_genericize_omp_taskreg *omp_ctx, tree decl)
 struct cp_genericize_data
 {
   hash_set *p_set;
-  vec bind_expr_stack;
+  auto_vec bind_expr_stack;
   struct cp_genericize_omp_taskreg *omp_ctx;
   tree try_block;
   bool no_sanitize_p;
@@ -1582,7 +1582,6 @@ cp_genericize_tree (tree* t_p, bool handle_invisiref_parm_p)
   wtd.handle_invisiref_parm_p = handle_invisiref_parm_p;
   cp_walk_tree (t_p, cp_genericize_r, &wtd, NULL);
   delete wtd.p_set;
-  wtd.bind_expr_stack.release ();
   if (sanitize_flags_p (SANITIZE_VPTR))
 cp_ubsan_instrument_member_accesses (t_p);
 }
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 93698aa14c9..821ce1771a4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -35247,7 +35247,7 @@ cp_parser_objc_at_property_declaration (cp_parser *parser)
   /* Parse the optional attribute list.
 
  A list of parsed, but not verified, attributes.  */
-  vec prop_attr_list = vNULL;
+  auto_delete_vec prop_attr_list;
   location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   cp_lexer_consume_token (parser->lexer);  /* Eat '@property'.  */
@@ -35423,10 +35423,6 @@ cp_parser_objc_at_property_declaration (cp_parser *parser)
 }
 
   cp_parser_consume_semicolon_at_end_of_statement (parser);
-
-  while (!prop_attr_list.is_empty())
-delete prop_attr_list.pop ();
-  prop_attr_list.release ();
 }
 
 /* Parse an Objective-C++ @synthesize declaration.  The syntax is:
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index b080259083e..b97dc1f6624 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5774,7 +5774,7 @@ omp_reduction_lookup (location_t loc, tree id, tree type, tree *baselinkp,
 
   if (!id && CLASS_TYPE_P (type) && TYPE_BINFO (type))
 {
-  vec ambiguous = vNULL;
+  auto_vec ambiguous;
   tree binfo = TYPE_BINFO (type), base_binfo, ret = NULL_TREE;
   unsigned int ix;
   if (ambiguousp == NULL)
@@ -5811,7 +5811,6 @@ omp_reduction_lookup (location_t loc, tree id, tree type, 

[pushed] c++: enable -fdelete-dead-exceptions by default

2021-07-14 Thread Jason Merrill via Gcc-patches
As I was discussing with richi, I don't think it makes sense to protect
calls to pure/const functions from DCE just because they aren't explicitly
declared noexcept.  PR100382 indicates that there are different
considerations for Go, which has non-call exceptions.  But still turn the
flag off for that specific testcase.
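
A minimal sketch of the kind of call this is about.  foo has the same shape
as the function in the pr100382.C hunk of this patch; the two wrapper
functions are hypothetical and only there for illustration.  The
__attribute__ syntax is a GCC extension, and whether the dead call is
actually removed depends on the optimization level and passes in use.

```cpp
int x, y;

// Declared pure, yet it may throw when x is nonzero.
int __attribute__((pure, noinline)) foo() {
  if (x) throw 1;
  return y;
}

// The result of foo() is discarded, so the call is dead.  With
// -fdelete-dead-exceptions the compiler may remove it, in which case the
// handler can never run; with -fno-delete-dead-exceptions the call is
// expected to be kept because it may throw.
bool dead_call_threw() {
  try {
    (void) foo();
  } catch (int) {
    return true;
  }
  return false;
}

// Here the result is used, so the call is not dead and the option does
// not apply to it.
int live_call() {
  return foo();
}
```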

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/c-family/ChangeLog:

* c-opts.c (c_common_post_options): Set -fdelete-dead-exceptions.
---
 gcc/doc/invoke.texi | 6 --
 gcc/c-family/c-opts.c   | 4 
 gcc/testsuite/g++.dg/torture/pr100382.C | 1 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e67d47af676..ea8812425e9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16335,8 +16335,10 @@ arbitrary signal handlers such as @code{SIGALRM}.
 @opindex fdelete-dead-exceptions
 Consider that instructions that may throw exceptions but don't otherwise
 contribute to the execution of the program can be optimized away.
-This option is enabled by default for the Ada compiler, as permitted by
-the Ada language specification.
+This does not affect calls to functions except those with the
+@code{pure} or @code{const} attributes.
+This option is enabled by default for the Ada and C++ compilers, as permitted by
+the language specifications.
 Optimization passes that cause dead exceptions to be removed are enabled independently at different optimization levels.
 
 @item -funwind-tables
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 60b5802722c..1212edd1b28 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -1015,6 +1015,10 @@ c_common_post_options (const char **pfilename)
   SET_OPTION_IF_UNSET (&global_options, &global_options_set, flag_finite_loops,
   optimize >= 2 && cxx_dialect >= cxx11);
 
+  /* It's OK to discard calls to pure/const functions that throw.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  flag_delete_dead_exceptions, true);
+
   if (cxx_dialect >= cxx11)
 {
   /* If we're allowing C++0x constructs, don't warn about C++98
diff --git a/gcc/testsuite/g++.dg/torture/pr100382.C b/gcc/testsuite/g++.dg/torture/pr100382.C
index ffc4182cfce..eac5743b956 100644
--- a/gcc/testsuite/g++.dg/torture/pr100382.C
+++ b/gcc/testsuite/g++.dg/torture/pr100382.C
@@ -1,4 +1,5 @@
 // { dg-do run }
+// { dg-additional-options -fno-delete-dead-exceptions }
 
 int x, y;
 int __attribute__((pure,noinline)) foo () { if (x) throw 1; return y; }

base-commit: 6d1cdb27828d2ef1ae1ab0209836646a269b9610
-- 
2.27.0



Re: [PATCH] handle vector and aggregate stores in -Wstringop-overflow [PR 97027]

2021-07-14 Thread Martin Sebor via Gcc-patches

On 7/14/21 1:01 AM, Richard Biener wrote:

On Tue, Jul 13, 2021 at 9:27 PM Martin Sebor via Gcc-patches
 wrote:


An existing, previously xfailed test that I recently removed
the xfail from made me realize that -Wstringop-overflow doesn't
properly detect buffer overflow resulting from vectorized stores.
Because of a difference in the IL the test passes on x86_64 but
fails on targets like aarch64.  Other examples can be constructed
that -Wstringop-overflow fails to diagnose even on x86_64.  For
instance, the overflow in the following function isn't diagnosed
when the loop is vectorized:

void* f (void)
{
  char *p = __builtin_malloc (8);
  for (int i = 0; i != 16; ++i)
p[i] = 1 << i;
  return p;
}

The attached change enhances the warning to detect those as well.
It found a few bugs in vectorizer tests that the patch corrects.
Tested on x86_64-linux and with an aarch64 cross.


-  dest = gimple_call_arg (stmt, 0);
+  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)
+ && gimple_call_num_args (stmt))
+   dest = gimple_call_arg (stmt, 0);
+  else
+   dest = gimple_call_lhs (stmt);
+
+  if (!dest)
+   return;

so this uses arg0 for memcpy (dst, src, 4) and also for bcopy (src, dst, 4)?


No.  The code is only called for assignments like *p = f () and for
a handful of built-ins (memcpy, strcpy, and memset).

bcopy() returns void and so its result cannot be assigned.  I believe
bcopy() and the other legacy bxxx() functions are also lowered into
memcpy/memmove etc. so we should see no calls to it in the middle end.
In any case, I have adjusted the function as described below to avoid
even this hypothetical issue.


It looks quite fragile to me.  I think you want to use the LHS only if it is
aggregate (and not a pointer or some random other value).  Likewise
you should only use arg0 for a whitelist of builtins, not for any random one.


I've added an argument to the function to make the distinction
between a call result and argument explicit but I haven't been able
to create a test case to exercise it.  For all the built-ins I've
tried in an assignment like:

  extern char a[4];
  *(double*)a = nan ("foo");

the call result ends up assigned to a temporary:

  _1 = __builtin_nan (s_2(D));
  MEM[(double *)&a] = _1;

I can only get a call and assignment in one for user-defined functions
that return an aggregate.



It's bad enough that compute_objsize decides for itself whether it is
passed a pointer or an object rather than the API being explicit about this.

if (VAR_P (exp) || TREE_CODE (exp) == CONST_DECL)
  {
-  exp = ctor_for_folding (exp);
-  if (!exp)
-   return false;
+  /* If EXP can be folded into a constant use the result.  Otherwise
+proceed to use EXP to determine a range of the result.  */
+  if (tree fold_exp = ctor_for_folding (exp))

 ^

+   if (fold_exp != error_mark_node)
+ exp = fold_exp;

fold_exp can be NULL, meaning a zero-initializer but below you'll run into


fold_exp is assigned to exp if it's neither null (as I underlined
above) nor error_mark_node so I think it's correct as is.



   const char *prep = NULL;
   if (TREE_CODE (exp) == STRING_CST)
 {

and crash.  Either you handle a NULL fold_exp explicitly or conservatively
continue to return false.

+  /* The LHS and RHS of the store.  The RHS is null if STMT is a function
+ call.  RHSTYPE is the type of the store.  */
+  tree lhs, rhs, rhstype;
+  if (is_gimple_assign (stmt))
+{
+  lhs = gimple_assign_lhs (stmt);
+  rhs = gimple_assign_rhs1 (stmt);
+  rhstype = TREE_TYPE (rhs);
+}
+  else if (is_gimple_call (stmt))
+{
+  lhs = gimple_call_lhs (stmt);
+  rhs = NULL_TREE;
+  rhstype = TREE_TYPE (gimple_call_fntype (stmt));
+}

The type of the store in a call is better determined from the LHS.
For internal function calls the above will crash.

Otherwise looks like reasonable changes.


Please see the attached revision.

Martin
Detect buffer overflow by aggregate and vector stores [PR97027].

Resolves:
PR middle-end/97027 - missing warning on buffer overflow storing a larger scalar into a smaller array

gcc/ChangeLog:

	PR middle-end/97027
	* tree-ssa-strlen.c (handle_assign): New function.
	(maybe_warn_overflow): Add argument.
	(nonzero_bytes_for_type): New function.
	(count_nonzero_bytes): Handle more tree types.  Call
	nonzero_bytes_for_type.
	(count_nonzero_bytes): Handle types.
	(handle_store): Handle stores from function calls.
	(strlen_check_and_optimize_call): Move code to handle_assign.  Call
	it for assignments from function calls.

gcc/testsuite/ChangeLog:

	PR middle-end/97027
	* gcc.dg/Wstringop-overflow-15.c: Remove an xfail.
	* gcc.dg/Wstringop-overflow-47.c: Adjust xfails.
	* gcc.dg/torture/pr69170.c: Avoid valid warnings.
	* gcc.dg/torture/pr70025.c: Prune out a false positive.
	* gcc.dg/vect/pr97769.c: Initialize a loop control variable.

Re: [PATCH] rs6000: Support [u]mul3_highpart for vector

2021-07-14 Thread Segher Boessenkool
On Wed, Jul 14, 2021 at 10:12:46AM +0800, Kewen.Lin wrote:
> on 2021/7/14 上午6:07, Segher Boessenkool wrote:
> > Hi!
> > 
> > On Tue, Jul 13, 2021 at 04:58:42PM +0800, Kewen.Lin wrote:
> >> This patch is to make Power10 newly introduced vector
> >> multiply high (part) instructions exploited in vectorized
> >> loops, it renames existing define_insns as standard pattern
> >> names.  It depends on that patch which enables vectorizer
> >> to recog mul_highpart.
> > 
> > It actually is correct already, it will just not be used yet, right?
> 
> Yes, the names are just not standard.  :)

I meant after this patch is applied :-)

Doesn't change much though -- applying it right now is fine, but you can
wait for the generic code to get in first, to make the new tests not
fail.


Segher


Re: [PATCH] c++: CTAD and forwarding references [PR88252]

2021-07-14 Thread Jason Merrill via Gcc-patches

On 7/14/21 1:52 PM, Patrick Palka wrote:

On Wed, 14 Jul 2021, Jason Merrill wrote:


On 7/14/21 11:26 AM, Patrick Palka wrote:

Here we're incorrectly treating T&& as a forwarding reference during
CTAD even though T is a template parameter of the class template.

This happens because the template parameter T in the out-of-line
definition of the constructor doesn't have the flag
TEMPLATE_TYPE_PARM_FOR_CLASS set, and during duplicate_decls the
redeclaration (which is in terms of this unflagged T) prevails.
To fix this, we could perhaps be more consistent about setting the flag,
but it appears we don't really need the flag to make the determination.

Since the template parameters of an artificial guide consist of the
template parameters of the class template followed by those of the
constructor (if any), it should suffice to look at the index of the
template parameter to determine whether T&& is a forwarding reference or
not.  This patch replaces the TEMPLATE_TYPE_PARM_FOR_CLASS flag with
this approach.
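
A small C++17 sketch of the language rule in question.  The holder template
and the helper function are hypothetical and not taken from the new test.
The constructor is defined in-class, so this shows the intended behaviour
rather than the out-of-line case that triggers the PR.

```cpp
#include <type_traits>

// The implicit deduction guide for this constructor is roughly
//   template <class T, class U> holder(T&&, U&&) -> holder<T>;
// T (index 0) comes from the class template, so T&& is a plain rvalue
// reference.  U (index 1) comes from the constructor template, so U&& is
// a forwarding reference.
template <class T>
struct holder {
  template <class U>
  holder(T &&, U &&) {}
};

inline bool deduces_holder_int() {
  int i = 0;
  // First argument is an rvalue: T is deduced as int.
  // Second argument is an lvalue: accepted because U&& forwards (U = int&).
  holder h(1, i);
  return std::is_same_v<decltype(h), holder<int>>;
}

// By contrast, 'holder bad(i, 1);' should be rejected: T would be deduced
// as int, and int&& cannot bind to the lvalue i.
```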

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/88252

gcc/cp/ChangeLog:

* cp-tree.h (TEMPLATE_TYPE_PARM_FOR_CLASS): Remove.
* pt.c (push_template_decl): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
handling.
(redeclare_class_template): Likewise.
(parm_can_form_fwding_ref_p): Define.
(maybe_adjust_types_for_deduction): Use it instead of
TEMPLATE_TYPE_PARM_FOR_CLASS.  Add tparms parameter.
(unify_one_argument): Pass tparms to
maybe_adjust_types_for_deduction.
(try_one_overload): Likewise.
(unify): Likewise.
(rewrite_template_parm): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
handling.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction96.C: New test.
---
   gcc/cp/cp-tree.h  |  6 --
   gcc/cp/pt.c   | 67 ---
   .../g++.dg/cpp1z/class-deduction96.C  | 34 ++
   3 files changed, 75 insertions(+), 32 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction96.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b1cf44ecdb8..f4bcab5b18d 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -443,7 +443,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
 BLOCK_OUTER_CURLY_BRACE_P (in BLOCK)
 FOLD_EXPR_MODOP_P (*_FOLD_EXPR)
 IF_STMT_CONSTEXPR_P (IF_STMT)
-  TEMPLATE_TYPE_PARM_FOR_CLASS (TEMPLATE_TYPE_PARM)
 DECL_NAMESPACE_INLINE_P (in NAMESPACE_DECL)
 SWITCH_STMT_ALL_CASES_P (in SWITCH_STMT)
 REINTERPRET_CAST_P (in NOP_EXPR)
@@ -5863,11 +5862,6 @@ enum auto_deduction_context
 adc_decomp_type/* Decomposition declaration initializer deduction */
   };
   -/* True if this type-parameter belongs to a class template, used by C++17
-   class template argument deduction.  */
-#define TEMPLATE_TYPE_PARM_FOR_CLASS(NODE) \
-  (TREE_LANG_FLAG_0 (TEMPLATE_TYPE_PARM_CHECK (NODE)))
-
   /* True iff this TEMPLATE_TYPE_PARM represents decltype(auto).  */
   #define AUTO_IS_DECLTYPE(NODE) \
 (TYPE_LANG_FLAG_5 (TEMPLATE_TYPE_PARM_CHECK (NODE)))
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index cf0ce770d52..01ef2984f23 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -154,8 +154,8 @@ static void tsubst_enum (tree, tree, tree);
   static bool check_instantiated_args (tree, tree, tsubst_flags_t);
   static int check_non_deducible_conversion (tree, tree, int, int,
   struct conversion **, bool);
-static int maybe_adjust_types_for_deduction (unification_kind_t, tree*, tree*,
-tree);
+static int maybe_adjust_types_for_deduction (tree, unification_kind_t,
+tree*, tree*, tree);
   static int type_unification_real (tree, tree, tree, const tree *,
  unsigned int, int, unification_kind_t,
  vec **,
@@ -5801,18 +5801,7 @@ push_template_decl (tree decl, bool is_friend)
}
 else if (DECL_IMPLICIT_TYPEDEF_P (decl)
   && CLASS_TYPE_P (TREE_TYPE (decl)))
-   {
- /* Class template, set TEMPLATE_TYPE_PARM_FOR_CLASS.  */
- tree parms = INNERMOST_TEMPLATE_PARMS (current_template_parms);
- for (int i = 0; i < TREE_VEC_LENGTH (parms); ++i)
-   {
- tree t = TREE_VALUE (TREE_VEC_ELT (parms, i));
- if (TREE_CODE (t) == TYPE_DECL)
-   t = TREE_TYPE (t);
- if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
-   TEMPLATE_TYPE_PARM_FOR_CLASS (t) = true;
-   }
-   }
+   /* Class template.  */;
 else if (TREE_CODE (decl) == TYPE_DECL
   && TYPE_DECL_ALIAS_P (decl))
/* alias-declaration */
@@ -6292,9 +6281,6 @@ redeclare_class_template (tree type, tree parms, tree cons)
  gcc_assert (DECL_

Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-14 Thread Segher Boessenkool
Hi!

On Wed, Jul 14, 2021 at 06:26:28PM +0800, guojiufu wrote:
>   PR target/61837

Wrong PR number?

> +@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE 
> (machine_mode @var{mode})
> +This hook takes a @var{mode} which is the original mode of doloop IV.
> +And if the target prefers other mode for doloop IV, this hook returns 
> the
> +preferred mode.
> +For example, on 64bit target, DImode may be preferred than SImode.
> +This hook could return the original mode itself if the target prefer to
> +keep the original mode.
> +The origianl mode and return mode should be MODE_INT.
> +@end deftypefn

(Typo, "original").  That has all the right contents, but needs someone
who is better at English than me to look at it / improve it.

> +/* { dg-final {scan-rtl-dump-not "zero_extend.*doloop" "loop2_doloop"} 
> } */
> +/* { dg-final {scan-rtl-dump-not "reg:SI.*doloop" "loop2_doloop" { 
> target lp64 } } } */

(Don't use format=flowed in your mails, or certainly not in those
containing patches -- it was rewrapped).

If you use .* in scan REs, you should be aware that "." matches newlines
by default, so you can match "reg:SI" on one line and "doloop" on a
later one, in that second one.

You can write

/* { dg-final {scan-rtl-dump-not {(?p)reg:SI.*doloop} "loop2_doloop" { target lp64 } } } */

(note: {} are much more convenient around most REs, you need a lot of
escaping without it) to get "partial newline-sensitive matching", which
is usually what you want (see "man re_syntax" for the details).
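
The pitfall can be sketched outside the testsuite as well.  The example
below uses C++ std::regex, whose default grammar treats "." the other way
round from the Tcl regexps used by the scan directives (it never matches a
newline), so the newline-matching default is emulated with "(.|\n)".  The
dump text and the function names are made up for illustration.

```cpp
#include <regex>
#include <string>

// "reg:SI" appears on the first line, "doloop" only on a later line.
inline std::string sample_dump() {
  return "(set (reg:SI 3) (const_int 0))\n(note 7 doloop)";
}

// Emulates a '.' that also matches newlines: the pattern can start on one
// line and finish on a later one.
inline bool matches_across_lines(const std::string &dump) {
  return std::regex_search(dump, std::regex("reg:SI(.|\n)*doloop"));
}

// Emulates the (?p) behaviour: '.' stops at a newline, so both parts have
// to be on the same line.
inline bool matches_within_line(const std::string &dump) {
  return std::regex_search(dump, std::regex("reg:SI.*doloop"));
}
```

With the sample text, the first function reports a match and the second
does not, so a scan-rtl-dump-not written the first way could fail on
output that is actually fine.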


The generic changes look fine to me (but what do I know about Gimple!)
The rs6000 changes are fine if the rest is approved (and see the
testcase comments).  Thanks!


Segher


[PATCH][committed] middle-end Vect: correct rebase issue

2021-07-14 Thread Tamar Christina via Gcc-patches
Hi All,

The lines being removed have been updated and merged into a new
condition.  But when resolving some conflicts I accidentally
reintroduced them, causing some test failures.

This removes them.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Committed as the changes were previously approved in
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574977.html
but the hunk was misapplied during a rebase.

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.c (vect_recog_dot_prod_pattern):
Remove erroneous line.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-reduc-dot-11.c: Expect pass.
* gcc.dg/vect/vect-reduc-dot-15.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-19.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-21.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
index 5e3cfc925105f576adf332df517727c857a4de0f..0f7cbbb87ef028f166366aea55bc4ef49d2f8e9b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
@@ -9,5 +9,5 @@
 
 #include "vect-reduc-dot-9.c"
 
-/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
index 5a6fd1969ce9403cbefb5b55c71bdd40894fc931..dc48f95a32bf76c54a906ee81ddee99b16aea84a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-15.c
@@ -9,5 +9,5 @@
 
 #include "vect-reduc-dot-9.c"
 
-/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c
index 962b24ec2047b99ebedc116f68e460c6ab6fc1e7..dbeaaec24a1095b7730d9e1262f5a951fd2312fc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c
@@ -49,4 +49,4 @@ main (void)
 __builtin_abort ();
 }
 
-/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
index b5754bf7dde29b5944bbc72919ea03f65b9bd7ad..6d08bf4478be83de86b0975524687a75d025123e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
@@ -49,4 +49,4 @@ main (void)
 __builtin_abort ();
 }
 
-/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 71533e61c934c63dd05a33c8f7159185e9b11a1b..53ced5d08fbf52094eb375d5d4cde179fd741a17 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1039,12 +1039,6 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
: TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
 return NULL;
 
-  /* If there are two widening operations, make sure they agree on
- the sign of the extension.  */
-  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
-  && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
-return NULL;
-
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;


-- 

Re: [PATCH] c++: CTAD and forwarding references [PR88252]

2021-07-14 Thread Patrick Palka via Gcc-patches
On Wed, 14 Jul 2021, Jason Merrill wrote:

> On 7/14/21 11:26 AM, Patrick Palka wrote:
> > Here we're incorrectly treating T&& as a forwarding reference during
> > CTAD even though T is a template parameter of the class template.
> > 
> > This happens because the template parameter T in the out-of-line
> > definition of the constructor doesn't have the flag
> > TEMPLATE_TYPE_PARM_FOR_CLASS set, and during duplicate_decls the
> > > redeclaration (which is in terms of this unflagged T) prevails.
> > To fix this, we could perhaps be more consistent about setting the flag,
> > but it appears we don't really need the flag to make the determination.
> > 
> > Since the template parameters of an artificial guide consist of the
> > template parameters of the class template followed by those of the
> > constructor (if any), it should suffice to look at the index of the
> > template parameter to determine whether T&& is a forwarding reference or
> > not.  This patch replaces the TEMPLATE_TYPE_PARM_FOR_CLASS flag with
> > this approach.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > PR c++/88252
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (TEMPLATE_TYPE_PARM_FOR_CLASS): Remove.
> > * pt.c (push_template_decl): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
> > handling.
> > (redeclare_class_template): Likewise.
> > (parm_can_form_fwding_ref_p): Define.
> > (maybe_adjust_types_for_deduction): Use it instead of
> > TEMPLATE_TYPE_PARM_FOR_CLASS.  Add tparms parameter.
> > (unify_one_argument): Pass tparms to
> > maybe_adjust_types_for_deduction.
> > (try_one_overload): Likewise.
> > (unify): Likewise.
> > (rewrite_template_parm): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
> > handling.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1z/class-deduction96.C: New test.
> > ---
> >   gcc/cp/cp-tree.h  |  6 --
> >   gcc/cp/pt.c   | 67 ---
> >   .../g++.dg/cpp1z/class-deduction96.C  | 34 ++
> >   3 files changed, 75 insertions(+), 32 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction96.C
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index b1cf44ecdb8..f4bcab5b18d 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -443,7 +443,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
> > BLOCK_OUTER_CURLY_BRACE_P (in BLOCK)
> > FOLD_EXPR_MODOP_P (*_FOLD_EXPR)
> > IF_STMT_CONSTEXPR_P (IF_STMT)
> > -  TEMPLATE_TYPE_PARM_FOR_CLASS (TEMPLATE_TYPE_PARM)
> > DECL_NAMESPACE_INLINE_P (in NAMESPACE_DECL)
> > SWITCH_STMT_ALL_CASES_P (in SWITCH_STMT)
> > REINTERPRET_CAST_P (in NOP_EXPR)
> > @@ -5863,11 +5862,6 @@ enum auto_deduction_context
> > adc_decomp_type/* Decomposition declaration initializer deduction */
> >   };
> >   -/* True if this type-parameter belongs to a class template, used by C++17
> > -   class template argument deduction.  */
> > -#define TEMPLATE_TYPE_PARM_FOR_CLASS(NODE) \
> > -  (TREE_LANG_FLAG_0 (TEMPLATE_TYPE_PARM_CHECK (NODE)))
> > -
> >   /* True iff this TEMPLATE_TYPE_PARM represents decltype(auto).  */
> >   #define AUTO_IS_DECLTYPE(NODE) \
> > (TYPE_LANG_FLAG_5 (TEMPLATE_TYPE_PARM_CHECK (NODE)))
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index cf0ce770d52..01ef2984f23 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -154,8 +154,8 @@ static void tsubst_enum (tree, tree, tree);
> >   static bool check_instantiated_args (tree, tree, tsubst_flags_t);
> >   static int check_non_deducible_conversion (tree, tree, int, int,
> >struct conversion **, bool);
> > -static int maybe_adjust_types_for_deduction (unification_kind_t, tree*,
> > tree*,
> > -tree);
> > +static int maybe_adjust_types_for_deduction (tree, unification_kind_t,
> > +tree*, tree*, tree);
> >   static int type_unification_real (tree, tree, tree, const tree *,
> >   unsigned int, int, unification_kind_t,
> >   vec **,
> > @@ -5801,18 +5801,7 @@ push_template_decl (tree decl, bool is_friend)
> > }
> > else if (DECL_IMPLICIT_TYPEDEF_P (decl)
> >&& CLASS_TYPE_P (TREE_TYPE (decl)))
> > -   {
> > - /* Class template, set TEMPLATE_TYPE_PARM_FOR_CLASS.  */
> > - tree parms = INNERMOST_TEMPLATE_PARMS (current_template_parms);
> > - for (int i = 0; i < TREE_VEC_LENGTH (parms); ++i)
> > -   {
> > - tree t = TREE_VALUE (TREE_VEC_ELT (parms, i));
> > - if (TREE_CODE (t) == TYPE_DECL)
> > -   t = TREE_TYPE (t);
> > - if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
> > -   TEMPLATE_TYPE_PARM_FOR_CLASS (t) = true;
> > -   }
> > -   }
> > +   /* Class template.  */;
> > else if (TREE

Repost #2: [PATCH] PR 100170: Fix eq/ne tests on power10.

2021-07-14 Thread Michael Meissner via Gcc-patches
I forgot to add the patch when I reposted this.

PR 100170: Fix eq/ne tests on power10.

This patch updates eq/ne tests in the testsuite to adjust the test if
power10 code generation is used.

2021-07-04  Michael Meissner  

gcc/testsuite/
PR testsuite/100170
* gcc.target/powerpc/ppc-eq0-1.c: Add support for the setbc
instruction.
* gcc.target/powerpc/ppc-ne0-1.c: Update instruction counts on
power10.
---
 gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c | 9 ++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c 
b/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c
index 496a6e340c0..bbdc7e00101 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c
@@ -7,4 +7,4 @@ int foo(int x)
   return x == 0;
 }
 
-/* { dg-final { scan-assembler "cntlzw|isel" } } */
+/* { dg-final { scan-assembler {\mcntlzw|isel|setbc\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c 
b/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c
index 63c4b6087df..34c6de3874d 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c
@@ -2,9 +2,12 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mno-isel" } */
 
-/* { dg-final { scan-assembler-times "addic" 4 } } */
-/* { dg-final { scan-assembler-times "subfe" 1 } } */
-/* { dg-final { scan-assembler-times "addze" 3 } } */
+/* { dg-final { scan-assembler-times {\maddic\M}  4 { target { ! 
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\msubfe\M}  1 { target { ! 
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\maddic\M}  3 { target {   
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-not   {\msubfe\M}{ target {   
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\msetbcr\M} 1 { target {   
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\maddze\M}  3 } } */
 
 long ne0(long a)
 {
-- 
2.31.1


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: avoid early reference to debug-only symbol

2021-07-14 Thread Alexandre Oliva
On Jul 14, 2021, Richard Biener via Gcc-patches  wrote:

> Can you put the above (and maybe your patch) into a PR so it doesn't
> get lost?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101454

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[COMMITTED] Turn hybrid mode off, default to ranger-only mode for EVRP.

2021-07-14 Thread Andrew MacLeod via Gcc-patches
With the integration of the relational oracle, Ranger is now pretty much
at parity in getting all the cases EVRP used to, and runs the testsuite
cleanly.


This patch turns hybrid mode off, running EVRP in ranger-only mode.  It 
requires a tweak to one test case which was XFAILing before, but now 
passes. This will hopefully be transparent to everyone.


With legacy evrp off, we'll see if anything shows up that I have missed 
in my analysis. It's certainly easy enough to turn back on if need be.  I 
will continue to locally monitor hybrid mode to ensure we don't lose any 
ground.


I have bootstrapped this on three architectures (x86_64, aarch64, and
powerpc64), and all run regression-free through the testsuite.


pushed.

Andrew





commit 398572c1544d8b7541862401b985ae7e855cb8fb
Author: Andrew MacLeod 
Date:   Wed Jul 14 12:47:10 2021 -0400

Turn hybrid mode off, default to ranger-only mode for EVRP.

Change the default EVRP mode to ranger-only.

gcc/
* params.opt (param_evrp_mode): Change default.

gcc/testsuite/
* gcc.dg/pr80776-1.c: Remove xfail.

diff --git a/gcc/params.opt b/gcc/params.opt
index 577cd42c173..92b003e38cb 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -131,7 +131,7 @@ Common Joined UInteger Var(param_evrp_sparse_threshold) Init(800) Optimization P
 Maximum number of basic blocks before EVRP uses a sparse cache.
 
 -param=evrp-mode=
-Common Joined Var(param_evrp_mode) Enum(evrp_mode) Init(EVRP_MODE_EVRP_FIRST) Param Optimization
+Common Joined Var(param_evrp_mode) Enum(evrp_mode) Init(EVRP_MODE_RVRP_ONLY) Param Optimization
 --param=evrp-mode=[legacy|ranger|legacy-first|ranger-first|ranger-trace|ranger-debug|trace|debug] Specifies the mode Early VRP should operate in.
 
 Enum
diff --git a/gcc/testsuite/gcc.dg/pr80776-1.c b/gcc/testsuite/gcc.dg/pr80776-1.c
index eca5e805ae2..b9bce62d982 100644
--- a/gcc/testsuite/gcc.dg/pr80776-1.c
+++ b/gcc/testsuite/gcc.dg/pr80776-1.c
@@ -27,5 +27,5 @@ Foo (void)
  Setting these ranges at the definition site, causes VRP to remove the
  unreachable code altogether, leaving the following sprintf unguarded.  This
  causes the bogus warning below.  */
-  sprintf (number, "%d", i); /* { dg-bogus "writing" "" { xfail *-*-* } } */
+  sprintf (number, "%d", i); /* { dg-bogus "writing" "" } */
 }


RE: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Bernd Edlinger 
> Sent: Wednesday, July 14, 2021 4:56 PM
> To: Tamar Christina ; Michael Matz
> 
> Cc: gcc-patches@gcc.gnu.org; Richard Biener 
> Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c earlier
> 
> On 7/14/21 2:47 PM, Tamar Christina wrote:
> > Hi,
> >
> > Ever since this commit
> >
> > commit c9114f2804b91690e030383de15a24e0b738e856
> > Author: Bernd Edlinger 
> > Date:   Fri May 28 06:27:27 2021 +0200
> >
> > Various tools have been having trouble with cross compilation
> > resulting in
> >
> > make[2]: *** No rule to make target '../build-x86_64-build_pc-linux-gnu/libcpp/libcpp.a', needed by 'build/genmatch'.
> >
> > (took a while to track down).  I don't understand this part of the build
> > system well enough to know how to fix this.
> > It looks like `libcpp.a` has special handling for cross compilers which now
> > seems to be broken.
> >
> > I can't reproduce it with our normal cross compiler scripts, which handle
> > the stages on their own, but e.g.
> > https://github.com/crosstool-ng/crosstool-ng does reproduce the failure.
> >
> 
> Sorry for the breakage!
> 
> I do not know this tool at all, but this looks suspicious, as it bypasses
> the dependencies in the top-level Makefile:
> 
> https://github.com/crosstool-ng/crosstool-ng/blob/755850d07ec4e8dc44787d1a0e35fe19507f17f6/scripts/build/cc/gcc.sh#L682-L683
> CT_DoExecLog CFG make ${CT_JOBSFLAGS} configure-gcc configure-libcpp configure-build-libiberty
> CT_DoExecLog ALL make ${CT_JOBSFLAGS} all-libcpp all-build-libiberty ...
> https://github.com/crosstool-ng/crosstool-ng/blob/755850d07ec4e8dc44787d1a0e35fe19507f17f6/scripts/build/cc/gcc.sh#L711-L712
> CT_DoExecLog ALL make ${CT_JOBSFLAGS} -C gcc ${libgcc_rule} \
>   ${repair_cc}
> 
> 
> but the top-level Makefile has also a dependency to all-build-libcpp:
> 
> dependencies = { module=all-gcc; on=all-build-libcpp; };
> dependencies = { module=all-gcc; on=all-libcpp; hard=true; };
> 
> Maybe this just worked by chance: a parallel build started with "make -j"
> might eventually build the build-libcpp dependency, but due to the patch it
> is now needed earlier?

Ah, I didn't notice they bypassed the top-level Makefile; this would explain why
I can't reproduce it with my own scripts.

Thanks! I'll file a bug with them then.

Kind Regards,
Tamar

> 
> 
> Bernd.
> 
> 
> 
> > Any ideas what's going on?
> >
> > Kind Regards,
> > Tamar
> >
> >> -Original Message-
> >> From: Gcc-patches  On Behalf Of
> >> Michael Matz
> >> Sent: Friday, May 28, 2021 4:33 PM
> >> To: Bernd Edlinger 
> >> Cc: gcc-patches@gcc.gnu.org; Richard Biener 
> >> Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c
> >> earlier
> >>
> >> Hello,
> >>
> >> On Fri, 28 May 2021, Bernd Edlinger wrote:
> >>
> > I was wondering, why gimple-match.c and generic-match.c are not
> > built early but always last, which slows down parallel makes
> > significantly.
> >
> > The reason seems to be that generated_files does not mention
> > gimple-match.c and generic-match.c.
> >
> > This comment in Makefile.in says it all:
> >
> > $(ALL_HOST_OBJS) : | $(generated_files)
> >
> > So this patch adds gimple-match.c generic-match.c to generated_files.
> >
> >
> > Tested on x86_64-pc-linux-gnu.
> > Is it OK for trunk?
> 
>  This should help for what I was complaining about in
>  https://gcc.gnu.org/pipermail/gcc/2021-May/235963.html . I build
>  with
>  -j24 and it was stalling on compiling gimple-match.c for me.
>  Looks like insn-attrtab.c is missed too; I saw genattrtab was
>  running last
> >> too.
> 
> >>>
> >>> Yeah, probably insn-automata.c as well, sometimes it is picked up
> >>> early sometimes not. maybe $(simple_generated_c) should be added to
> >>> generated_files, but insn-attrtab.c is yet another exception.
> >>
> >> You can't put files in there that are sometimes slow to generate
> >> (which insn- {attrtab,automata}.c are on some targets), as
> >> _everything_ then waits for them to be created first.
> >>
> >> Ideally there would be a way for gnumake to mark some targets as
> >> "ugh- slow" and back-propagate this to all dependencies so that those
> >> are put in front of the work queue in a parallel make.  Alas,
> >> something like that never came into existence :-/  (When order-only
> >> deps were introduced I got excited, but then came to realize that
> >> that wasn't what was really needed for this case, a "weak" version of
> >> it would be required at least, or better yet a specific facility to
> >> impose a cost with a target)
> >>
> >>
> >> Ciao,
> >> Michael.
> >>
> >>>
> >>>
> >>> Bernd.
> >>>
>  Thanks,
>  Andrew
> 
> >
> >
> > Thanks
> > Bernd.
> >
> >
> > 2021-05-28  Bernd Edlinger  
> >
> > * Makefile.

Re: [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not?

2021-07-14 Thread Segher Boessenkool
On Wed, Jul 14, 2021 at 09:39:42AM +0200, Richard Biener via Gcc-help wrote:
> On Wed, Jul 14, 2021 at 9:00 AM Hongtao Liu via Gcc-patches
>  wrote:
> >
> > On Wed, Jul 14, 2021 at 2:39 PM Matthias Kretz  wrote:
> > >
> > > On Wednesday, 14 July 2021 07:18:29 CEST Hongtao Liu via Gcc-help wrote:
> > > > On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu  wrote:
> > > > > Hi:
> > > > >   The original problem was that some users wanted the cmdline option
> > > > >
> > > > > -ffast-math not to act on intrinsic production code.
> > >
> > > This sounds like the users want intrinsics to map *directly* to the
> > Thanks for the reply.
> > I think the users want the mixed usage of fast-math and no-fast-math.
> > > corresponding instruction. If that's the case such users should use inline
> > > assembly, IMHO. If you compile a TU with -ffast-math then *all* 
> > > floating-point
> > > operations are affected. Yes, more control over where to use fast-math 
> > > and the
> > > ability to mix fast-math and no-fast-math without risking ODR violations 
> > > would
> > > be great. But that's a larger issue, and one that would ideally be solved 
> > > in
> > > WG14/WG21.
> > hmm, guess it would need a lot of work.

Yes.  And the biggest part of that is defining what the actual semantics
of this should be!  Code compiled with -ffast-math is allowed to do
completely *anything* if ever it encounters an infinity or denormal,
and the sign of zeroes can flip randomly, etc.  It cannot interoperate
with code compiled with -fno-fast-math in general, unless much care is
taken.

> -ffast-math decomposes to quite some flag_* and those generally are not
> reflected into the IL but can be different per function (and then
> prevent inlining).

Yeah.  And for most of those sub-flags you have the problems I talked
about above.

> Note some people do like to have their intrinsic code optimized, so there's
> likely conflicting interest here.

If people do not want anything optimised they should use -O0, or write
assembler code instead.  GCC in general optimises where it can, as any
good compiler should.  There are various facilities for preventing
optimisations in much more targeted places of course -- and most of
those do not tell the compiler "do not do X", but they instead say
"here Y happens, and you do not know the details of that", effectively
telling the compiler "hands off!"


Segher


Re: fix typo in attr_fnspec::verify

2021-07-14 Thread Alexandre Oliva
On Jul 13, 2021, Alexandre Oliva  wrote:

> On Jul 13, 2021, Richard Biener  wrote:
>> oops - also worth backporting to affected branches.

> Thanks, I took that as explicit approval and put it in.

> attr fnspec is new in gcc-11, not present in gcc-10, so I'm testing a
> trivial backport, just to be sure...  Will install in gcc-11 when done.

Here's what I've just installed.


fix typo in attr_fnspec::verify

Odd-numbered indices describing argument access sizes in the fnspec
string can only hold 't' or a digit, as tested in the beginning of the
case.  When checking that the size-supplying argument does not have
additional information associated with it, the test that excludes the
't' possibility looks for it at the even position in the fnspec
string.  Oops.

This might yield false positives and negatives if a function has a
fnspec in which an argument uses a 't' access-size, and ('t' - '1')
happens to be the index of an argument described in an fnspec string.
Assuming ASCII encoding, it would take a function with at least 68
arguments described in fnspec.  Still, probably worth fixing.
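
A rough sketch of the indexing mistake, using a deliberately simplified
model rather than the real attr_fnspec class.  The helper names below are
hypothetical and exist only for this illustration, and ASCII is assumed as
in the paragraph above:

```cpp
// Simplified model of one argument's two-character descriptor
// (an assumption for this sketch, not the real class layout):
//   str[idx]      access kind (for example 'w', 'W', 'o', 'O')
//   str[idx + 1]  access size: 't' or a digit naming another argument

// Pre-fix test: looks for 't' at the access-kind position.
constexpr bool size_is_t_before_fix(const char *str, unsigned idx) {
  return str[idx] == 't';
}

// Post-fix test: looks for 't' at the access-size position.
constexpr bool size_is_t_after_fix(const char *str, unsigned idx) {
  return str[idx + 1] == 't';
}

// Argument index that a size character names when it is read as a digit.
constexpr int arg_named_by(char size_char) {
  return size_char - '1';
}
```

With the descriptor "wt", only the second helper recognizes the 't' size.
In ASCII, 't' - '1' is 67, i.e. the 68th argument, which is where the
68-argument figure comes from.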


for  gcc/ChangeLog

* tree-ssa-alias.c (attr_fnspec::verify): Fix index in
non-'t'-sized arg check.

(cherry picked from commit a7098d6ef4e4e799dab8ef925c62b199d707694b)
---
 gcc/tree-ssa-alias.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index ebb3f49c86c66..3e578e5d05f49 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -3868,7 +3868,7 @@ attr_fnspec::verify ()
&& str[idx] != 'w' && str[idx] != 'W'
&& str[idx] != 'o' && str[idx] != 'O')
  err = true;
-   if (str[idx] != 't'
+   if (str[idx + 1] != 't'
/* Size specified is scalar, so it should be described
   by ". " if specified at all.  */
&& (arg_specified_p (str[idx + 1] - '1')


-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: Repost: [PATCH] Fix long double tests when default long double is not IBM.

2021-07-14 Thread Michael Meissner via Gcc-patches
On Wed, Jul 14, 2021 at 11:11:29AM -0500, Bill Schmidt wrote:
> Just for my edification, can you remind me why we need -Wno-psabi?
> What warning are we disabling?  Same question for ieee variant.
> 
> LGTM in any event.  Recommend approval by maintainers...

Unless you configured GCC with glibc 2.32 or newer (such as in Advance
Toolchain 14), when you change the default long double representation, the
compiler gives a warning that you can't depend on GLIBC to have all of the
necessary support for __float128.  Using -Wno-psabi suppresses this warning.
For assembly tests, we don't care if GLIBC supports it or not, but we don't
want the warning.

Note, there is still a hole with Fortran, in that it doesn't support multiple
long double types like C/C++ do, so you can't really change the long double
behavior and expect your program to work.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] c++: CTAD and forwarding references [PR88252]

2021-07-14 Thread Jason Merrill via Gcc-patches

On 7/14/21 11:26 AM, Patrick Palka wrote:

Here we're incorrectly treating T&& as a forwarding reference during
CTAD even though T is a template parameter of the class template.

This happens because the template parameter T in the out-of-line
definition of the constructor doesn't have the flag
TEMPLATE_TYPE_PARM_FOR_CLASS set, and during duplicate_decls the
redeclaration (which is in terms of this unflagged T) prevails.
To fix this, we could perhaps be more consistent about setting the flag,
but it appears we don't really need the flag to make the determination.

Since the template parameters of an artificial guide consist of the
template parameters of the class template followed by those of the
constructor (if any), it should suffice to look at the index of the
template parameter to determine whether T&& is a forwarding reference or
not.  This patch replaces the TEMPLATE_TYPE_PARM_FOR_CLASS flag with
this approach.
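
To make the language rule concrete, here is a minimal standalone sketch.
It is not the testcase from the patch: the names ClassParam, CtorParam and
the two detection traits are made up for illustration, and C++17 or later
is assumed.  It shows T&& on a class template parameter rejecting an
lvalue during CTAD, while U&& on a constructor template parameter still
acts as a forwarding reference:

```cpp
#include <type_traits>
#include <utility>

// T is a template parameter of the class, so T&& in the constructor is a
// plain rvalue reference, including in the implicit deduction guide.
template <class T>
struct ClassParam {
  ClassParam(T&&) {}
};

// U is a template parameter of the constructor itself, so U&& is a
// forwarding reference; T&& still is not.
template <class T>
struct CtorParam {
  template <class U>
  CtorParam(T&&, U&&) {}
};

// Detection helpers: true when CTAD from the given argument types succeeds.
template <class Arg, class = void>
struct class_param_deduces : std::false_type {};
template <class Arg>
struct class_param_deduces<
    Arg, std::void_t<decltype(ClassParam(std::declval<Arg>()))>>
    : std::true_type {};

template <class A, class B, class = void>
struct ctor_param_deduces : std::false_type {};
template <class A, class B>
struct ctor_param_deduces<
    A, B,
    std::void_t<decltype(CtorParam(std::declval<A>(), std::declval<B>()))>>
    : std::true_type {};
```

Under these assumptions, class_param_deduces<int> holds but
class_param_deduces<int&> does not, and ctor_param_deduces<int, int&>
holds because only the second parameter forwards.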

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/88252

gcc/cp/ChangeLog:

* cp-tree.h (TEMPLATE_TYPE_PARM_FOR_CLASS): Remove.
* pt.c (push_template_decl): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
handling.
(redeclare_class_template): Likewise.
(parm_can_form_fwding_ref_p): Define.
(maybe_adjust_types_for_deduction): Use it instead of
TEMPLATE_TYPE_PARM_FOR_CLASS.  Add tparms parameter.
(unify_one_argument): Pass tparms to
maybe_adjust_types_for_deduction.
(try_one_overload): Likewise.
(unify): Likewise.
(rewrite_template_parm): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
handling.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction96.C: New test.
---
  gcc/cp/cp-tree.h  |  6 --
  gcc/cp/pt.c   | 67 ---
  .../g++.dg/cpp1z/class-deduction96.C  | 34 ++
  3 files changed, 75 insertions(+), 32 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction96.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b1cf44ecdb8..f4bcab5b18d 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -443,7 +443,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
BLOCK_OUTER_CURLY_BRACE_P (in BLOCK)
FOLD_EXPR_MODOP_P (*_FOLD_EXPR)
IF_STMT_CONSTEXPR_P (IF_STMT)
-  TEMPLATE_TYPE_PARM_FOR_CLASS (TEMPLATE_TYPE_PARM)
DECL_NAMESPACE_INLINE_P (in NAMESPACE_DECL)
SWITCH_STMT_ALL_CASES_P (in SWITCH_STMT)
REINTERPRET_CAST_P (in NOP_EXPR)
@@ -5863,11 +5862,6 @@ enum auto_deduction_context
adc_decomp_type/* Decomposition declaration initializer deduction */
  };
  
-/* True if this type-parameter belongs to a class template, used by C++17

-   class template argument deduction.  */
-#define TEMPLATE_TYPE_PARM_FOR_CLASS(NODE) \
-  (TREE_LANG_FLAG_0 (TEMPLATE_TYPE_PARM_CHECK (NODE)))
-
  /* True iff this TEMPLATE_TYPE_PARM represents decltype(auto).  */
  #define AUTO_IS_DECLTYPE(NODE) \
(TYPE_LANG_FLAG_5 (TEMPLATE_TYPE_PARM_CHECK (NODE)))
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index cf0ce770d52..01ef2984f23 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -154,8 +154,8 @@ static void tsubst_enum (tree, tree, tree);
  static bool check_instantiated_args (tree, tree, tsubst_flags_t);
  static int check_non_deducible_conversion (tree, tree, int, int,
   struct conversion **, bool);
-static int maybe_adjust_types_for_deduction (unification_kind_t, tree*, tree*,
-tree);
+static int maybe_adjust_types_for_deduction (tree, unification_kind_t,
+tree*, tree*, tree);
  static int type_unification_real (tree, tree, tree, const tree *,
  unsigned int, int, unification_kind_t,
  vec **,
@@ -5801,18 +5801,7 @@ push_template_decl (tree decl, bool is_friend)
}
else if (DECL_IMPLICIT_TYPEDEF_P (decl)
   && CLASS_TYPE_P (TREE_TYPE (decl)))
-   {
- /* Class template, set TEMPLATE_TYPE_PARM_FOR_CLASS.  */
- tree parms = INNERMOST_TEMPLATE_PARMS (current_template_parms);
- for (int i = 0; i < TREE_VEC_LENGTH (parms); ++i)
-   {
- tree t = TREE_VALUE (TREE_VEC_ELT (parms, i));
- if (TREE_CODE (t) == TYPE_DECL)
-   t = TREE_TYPE (t);
- if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
-   TEMPLATE_TYPE_PARM_FOR_CLASS (t) = true;
-   }
-   }
+   /* Class template.  */;
else if (TREE_CODE (decl) == TYPE_DECL
   && TYPE_DECL_ALIAS_P (decl))
/* alias-declaration */
@@ -6292,9 +6281,6 @@ redeclare_class_template (tree type, tree parms, tree 
cons)
  gcc_assert (DECL_CONTEXT (parm) == NULL_TREE);
  DECL_CONTEXT (parm) = tmpl;
}
-
-  if (TREE_CODE (par

Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-14 Thread Jason Merrill via Gcc-patches

On 7/14/21 10:46 AM, Martin Sebor wrote:

On 7/13/21 9:39 PM, Jason Merrill wrote:

On 7/13/21 4:02 PM, Martin Sebor wrote:

On 7/13/21 12:37 PM, Jason Merrill wrote:

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of 
the change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the 
same value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in C++17 
it's implicitly inline instead of static.


I'll remove the static.



But then, do we really want to keep vNULL at all?  It's a weird 
blurring of the object/pointer boundary that is also dependent on 
vec being a thin wrapper around a pointer.  In almost all cases it 
can be replaced with {}; one exception is == comparison, where it 
seems to be testing that the embedded pointer is null, which is a 
weird thing to want to test.


The one use case I know of for vNULL where I can't think of
an equally good substitute is in passing a vec as an argument by
value.  The only way to do that that I can think of is to name
the full vec type (i.e., the specialization) which is more typing
and less generic than vNULL.  I don't use vNULL myself so I wouldn't
miss this trick if it were to be removed but others might feel
differently.


In C++11, it can be replaced by {} in that context as well.
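
For illustration only, a toy version of that replacement.  thin_vec and
is_empty_arg are invented names for this sketch and are not the real vec
API:

```cpp
// Toy stand-in for a vec that is a thin wrapper around a pointer.
template <class T>
struct thin_vec {
  T *data = nullptr;
  constexpr bool empty() const { return data == nullptr; }
};

// Takes the vector by value, as in the use case described above.
constexpr bool is_empty_arg(thin_vec<int> v) { return v.empty(); }

// The caller writes {} instead of a vNULL-like object or the full type name.
constexpr bool braces_give_empty = is_empty_arg({});
```

The braces pick up the parameter type from the function declaration, so
the call site stays as generic as one written with vNULL.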


Cool.  I thought I'd tried { } here but I guess not.




If not, I'm all for getting rid of vNULL but with over 350 uses
of it left, unless there's some clever trick to make the removal
(mostly) effortless and seamless, I'd much rather do it independently
of this initial change. I also don't know if I can commit to making
all this cleanup.


I already have a patch to replace all but one use of vNULL, but I'll 
hold off with it until after your patch.


So what's the next step?  The patch only removes a few uses of vNULL
but doesn't add any.  Is it good to go as is (without the static and
with the additional const changes Richard suggested)?  This patch is
attached to my reply to Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575199.html


As Richard wrote:


The pieces where you change vec<> passing to const vec<>& and the few
where you change vec<> * to const vec<> * are OK - this should make the
rest a smaller piece to review.


Please go ahead and apply those changes and send a new patch with the 
remainder of the changes.


A few other comments:


-  omp_declare_simd_clauses);
+  *omp_declare_simd_clauses);


Instead of doing this indirection in all of the callers, let's change 
c_finish_omp_declare_simd to take a pointer as well, and do the 
indirection in initializing a reference variable at the top of the function.



+sched_init_luids (bbs.to_vec ());
+haifa_init_h_i_d (bbs.to_vec ());


Why are these to_vec changes needed when you are also changing the 
functions to take const&?



-  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo);
+  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo).to_vec ();


Why not use a reference here and in other similar spots?

Jason



Re: Repost: [PATCH] Fix long double tests when default long double is not IBM.

2021-07-14 Thread Bill Schmidt via Gcc-patches

Hi Mike,

On 7/7/21 2:58 PM, Michael Meissner wrote:

[PATCH] Fix long double tests when default long double is not IBM.

This patch adds 3 more selections to target-supports.exp to see if we can force
the compiler to use a particular long double format (IEEE 128-bit, IBM extended
double, 64-bit), and the library support will track the changes for the long
double.  This is needed because two of the tests in the test suite use long
double, and they are actually testing IBM extended double.

This patch also forces the two tests that explicitly require long double
to use the IBM double-double encoding to explicitly run the test.  This
requires GLIBC 2.32 or greater in order to do the switch.
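
As an aside, the formats mentioned above can be told apart by the mantissa
width that <cfloat> reports.  The sketch below is only an illustration and
not part of the patch; the enum and function names are made up, and the
x87 case is included just for completeness on non-PowerPC hosts:

```cpp
#include <cfloat>

enum class LongDoubleFormat {
  Binary64,       // long double is the same as double: 53-bit mantissa
  X87Extended,    // x87 80-bit extended: 64-bit mantissa
  IbmExtended,    // IBM extended double (double-double): 106-bit mantissa
  IeeeBinary128,  // IEEE 128-bit: 113-bit mantissa
  Unknown
};

// Map a mantissa width (as given by LDBL_MANT_DIG) to a format.
constexpr LongDoubleFormat classify_long_double(int mant_dig) {
  return mant_dig == 53    ? LongDoubleFormat::Binary64
         : mant_dig == 64  ? LongDoubleFormat::X87Extended
         : mant_dig == 106 ? LongDoubleFormat::IbmExtended
         : mant_dig == 113 ? LongDoubleFormat::IeeeBinary128
                           : LongDoubleFormat::Unknown;
}

// Format of long double for the current compilation.
constexpr LongDoubleFormat this_build_format =
    classify_long_double(LDBL_MANT_DIG);
```

A test that only makes sense for IBM extended double could compare
this_build_format against LongDoubleFormat::IbmExtended, although the
effective-target checks in the patch are the mechanism actually proposed
here.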

I have run tests on a little endian power9 system with 3 compilers.  There were
no regressions with these patches, and the two tests in the following patches
now work if the default long double is not IBM 128-bit:

 *  One compiler used the default IBM 128-bit format;
 *  One compiler used the IEEE 128-bit format; (and)
 *  One compiler used 64-bit long doubles.

I have also tested compilers on a big endian power8 system with a compiler
defaulting to power8 code generation and another with the default cpu
set.  There were no regressions.

Can I check this patch into the master branch?

2021-07-07  Michael Meissner  

gcc/testsuite/
PR target/70117
* gcc.target/powerpc/pr70117.c: Force the long double type to use
the IBM 128-bit format.
* c-c++-common/dfp/convert-bfp-11.c: Force using IBM 128-bit long
double.  Remove check for 64-bit long double.
* lib/target-supports.exp
(add_options_for_ppc_long_double_override_ibm128): New function.
(check_effective_target_ppc_long_double_override_ibm128): New
function.
(add_options_for_ppc_long_double_override_ieee128): New function.
(check_effective_target_ppc_long_double_override_ieee128): New
function.
(add_options_for_ppc_long_double_override_64bit): New function.
(check_effective_target_ppc_long_double_override_64bit): New
function.
---
  .../c-c++-common/dfp/convert-bfp-11.c |  18 +--
  gcc/testsuite/gcc.target/powerpc/pr70117.c|   6 +-
  gcc/testsuite/lib/target-supports.exp | 107 ++
  3 files changed, 121 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
index 95c433d2c24..35da07d1fa4 100644
--- a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
+++ b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
@@ -1,9 +1,14 @@
-/* { dg-skip-if "" { ! "powerpc*-*-linux*" } } */


I suppose since the compile to check this in the effective target check
is cached, this doesn't hurt much.  Ok.


+/* { dg-require-effective-target dfp } */
+/* { dg-require-effective-target ppc_long_double_override_ibm128 } */
+/* { dg-add-options ppc_long_double_override_ibm128 } */
  
-/* Test decimal float conversions to and from IBM 128-bit long double.

-   Checks are skipped at runtime if long double is not 128 bits.
-   Don't force 128-bit long doubles because runtime support depends
-   on glibc.  */
+/* We force the long double type to be IBM 128-bit because the CONVERT_TO_PINF
+   tests will fail if we use IEEE 128-bit floating point.  This is due to IEEE
+   128-bit having a larger exponent range than IBM 128-bit extended double.  So
+   tests that would generate an infinity with IBM 128-bit will generate a
+   normal number with IEEE 128-bit.  */
+
+/* Test decimal float conversions to and from IBM 128-bit long double.   */
  
  #include "convert.h"
  
@@ -36,9 +41,6 @@ CONVERT_TO_PINF (312, tf, sd, 1.6e+308L, d32)

  int
  main ()
  {
-  if (sizeof (long double) != 16)
-return 0;
-
convert_101 ();
convert_102 ();
  
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70117.c b/gcc/testsuite/gcc.target/powerpc/pr70117.c

index 3bbd2c595e0..8a5fad1dee0 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr70117.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr70117.c
@@ -1,5 +1,7 @@
-/* { dg-do run { target { powerpc*-*-linux* powerpc*-*-darwin* powerpc*-*-aix* rs6000-*-* } } } */
-/* { dg-options "-std=c99 -mlong-double-128 -O2" } */
+/* { dg-do run } */
+/* { dg-require-effective-target ppc_long_double_override_ibm128 } */
+/* { dg-options "-std=c99 -O2" } */
+/* { dg-add-options ppc_long_double_override_ibm128 } */
  
  #include 
  
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp

index 789723fb287..0a392cb0fd5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2360,6 +2360,113 @@ proc check_effective_target_ppc_ieee128_ok { } {
  }]
  }
  
+# Check if we can explicitly override the long double format to use the IBM

+# 128-bit extended double format, and GLIBC supports doing this override by
+# switching the sprintf to handle IBM 128-bit long double.

Re: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Bernd Edlinger
On 7/14/21 2:47 PM, Tamar Christina wrote:
> Hi,
> 
> Ever since this commit 
> 
> commit c9114f2804b91690e030383de15a24e0b738e856
> Author: Bernd Edlinger 
> Date:   Fri May 28 06:27:27 2021 +0200
> 
> Various tools have been having trouble with cross compilation resulting in
> 
> make[2]: *** No rule to make target 
> '../build-x86_64-build_pc-linux-gnu/libcpp/libcpp.a', needed by 
> 'build/genmatch'.
> 
> (took a while to track down).  I don't understand this part of the build 
> system well enough to know how to fix this.
> It looks like `libcpp.a` has special handling for cross compilers which now 
> seems to be broken.
> 
> I can't reproduce it with our normal cross compiler scripts, which handle 
> the stages on their own, but e.g.
> https://github.com/crosstool-ng/crosstool-ng does reproduce the failure.
> 

Sorry for the breakage!

I do not know this tool at all, but this here looks suspicious,
as it bypasses the dependencies in the top-level Makefile:

https://github.com/crosstool-ng/crosstool-ng/blob/755850d07ec4e8dc44787d1a0e35fe19507f17f6/scripts/build/cc/gcc.sh#L682-L683
CT_DoExecLog CFG make ${CT_JOBSFLAGS} configure-gcc configure-libcpp configure-build-libiberty
CT_DoExecLog ALL make ${CT_JOBSFLAGS} all-libcpp all-build-libiberty
...
https://github.com/crosstool-ng/crosstool-ng/blob/755850d07ec4e8dc44787d1a0e35fe19507f17f6/scripts/build/cc/gcc.sh#L711-L712
CT_DoExecLog ALL make ${CT_JOBSFLAGS} -C gcc ${libgcc_rule} \
  ${repair_cc}


but the top-level Makefile also has a dependency on all-build-libcpp:

dependencies = { module=all-gcc; on=all-build-libcpp; };
dependencies = { module=all-gcc; on=all-libcpp; hard=true; };

Maybe this only worked by chance: a parallel build started with "make -j"
might eventually build the build-libcpp dependency, but due to the patch it
is now needed earlier?


Bernd.



> Any ideas what's going on?
> 
> Kind Regards,
> Tamar
> 
>> -Original Message-
>> From: Gcc-patches  On Behalf Of
>> Michael Matz
>> Sent: Friday, May 28, 2021 4:33 PM
>> To: Bernd Edlinger 
>> Cc: gcc-patches@gcc.gnu.org; Richard Biener 
>> Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c earlier
>>
>> Hello,
>>
>> On Fri, 28 May 2021, Bernd Edlinger wrote:
>>
> I was wondering, why gimple-match.c and generic-match.c are not
> built early but always last, which slows down parallel makes
> significantly.
>
> The reason seems to be that generated_files does not mention
> gimple-match.c and generic-match.c.
>
> This comment in Makefile.in says it all:
>
> $(ALL_HOST_OBJS) : | $(generated_files)
>
> So this patch adds gimple-match.c generic-match.c to generated_files.
>
>
> Tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

 This should help for what I was complaining about in
 https://gcc.gnu.org/pipermail/gcc/2021-May/235963.html . I build with
 -j24 and it was stalling on compiling gimple-match.c for me.
 Looks like insn-attrtab.c is missed too; I saw genattrtab was running
 last too.

>>>
>>> Yeah, probably insn-automata.c as well, sometimes it is picked up
>>> early sometimes not. maybe $(simple_generated_c) should be added to
>>> generated_files, but insn-attrtab.c is yet another exception.
>>
>> You can't put files in there that are sometimes slow to generate (which insn-
>> {attrtab,automata}.c are on some targets), as _everything_ then waits for
>> them to be created first.
>>
>> Ideally there would be a way for gnumake to mark some targets as "ugh-
>> slow" and back-propagate this to all dependencies so that those are put in
>> front of the work queue in a parallel make.  Alas, something like that never
>> came into existence :-/  (When order-only deps were introduced I got
>> excited, but then came to realize that that wasn't what was really needed for
>> this case, a "weak" version of it would be required at least, or better yet a
>> specific facility to impose a cost with a target)
>>
>>
>> Ciao,
>> Michael.
>>
>>>
>>>
>>> Bernd.
>>>
 Thanks,
 Andrew

>
>
> Thanks
> Bernd.
>
>
> 2021-05-28  Bernd Edlinger  
>
> * Makefile.in (generated_files): Add gimple-match.c and
> generic-match.c
>>>


Re: [PATCH] driver/101383 - handle -gtoggle in driver

2021-07-14 Thread Joseph Myers
This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

2021-07-14 Thread Craig Topper via Gcc-patches
On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev <
llvm-...@lists.llvm.org> wrote:

> > >
> > Setting excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> > round after each operation could keep the semantics right.
> > And I'll document the behavior difference between soft-fp and
> > AVX512FP16 instruction for exceptions.
> I got some feedback from my colleague who's working on supporting
> _Float16 for llvm.
> The LLVM side wants to set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
> soft-fp so that the code can be more efficient.
> i.e.
> _Float16 a, b, c, d;
> d = a + b + c;
>
> would be transformed to
> float tmp, tmp1, a1, b1, c1;
> a1 = (float) a;
> b1 = (float) b;
> c1 = (float) c;
> tmp = a1 + b1;
> tmp1 = tmp + c1;
> d = (_Float16) tmp1;
>
> so there's only 1 truncation in the end.
>
> If users want to round back after every operation, the code should be
> explicitly written as
> _Float16 a, b, c, d, e;
> e = a + b;
> d = e + c;
>
> That's what Clang does, quote from [1]
>  _Float16 arithmetic will be performed using native half-precision
> support when available on the target (e.g. on ARMv8.2a); otherwise it
> will be performed at a higher precision (currently always float) and
> then truncated down to _Float16. Note that C and C++ allow
> intermediate floating-point operands of an expression to be computed
> with greater precision than is expressible in their type, so Clang may
> avoid intermediate truncations in certain cases; this may lead to
> results that are inconsistent with native arithmetic.
>

Clang for AArch64 promotes each individual operation and rounds immediately
afterwards.  See https://godbolt.org/z/qzGfv6nvo and note the fcvts between
the two fadd operations.  This is implemented in the LLVM backend, where we
can't see what was originally a single expression.
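
To make the difference between the two strategies concrete, here is a
minimal C++ sketch.  It is only an illustration under stated assumptions:
_Float16 is not available on every compiler, so float stands in for the
narrow type and double for the wider evaluation type.  The function names
are made up for this example and do not come from GCC or LLVM.

```cpp
#include <cassert>
#include <cmath>

// Round after every operation (what PROMOTE_TO_FLOAT16 asks for), with
// float playing the role of _Float16 in this sketch.
float sum_round_each_op(float a, float b, float c) {
    volatile float t = a + b;   // volatile store forces rounding to float
    volatile float r = t + c;   // second rounding
    return r;
}

// Evaluate the whole expression in the wider type and truncate once
// (what PROMOTE_TO_FLOAT allows), with double playing the role of float.
float sum_round_once(float a, float b, float c) {
    double wide = static_cast<double>(a) + static_cast<double>(b)
                  + static_cast<double>(c);
    return static_cast<float>(wide);   // single truncation at the end
}

// Half of the spacing between 1.0f and the next larger float.
float half_ulp_of_one() {
    return std::ldexp(1.0f, -24);
}
```

With a = 1.0f and b = c = half_ulp_of_one(), the first function rounds
1 + 2^-24 back down to 1.0f twice, while the second keeps both small terms
and returns the next float above 1.0f.  That is the kind of inconsistency
with native arithmetic that the Clang documentation quoted above warns about.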


>
> and so does arm gcc
> quote from arm.c
>
> /* We can calculate either in 16-bit range and precision or
>32-bit range and precision.  Make that decision based on whether
>we have native support for the ARMv8.2-A 16-bit floating-point
>instructions or not.  */
> return (TARGET_VFP_FP16INST
> ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
>
>
> [1]https://clang.llvm.org/docs/LanguageExtensions.html
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
> ___
> LLVM Developers mailing list
> llvm-...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>


[PATCH] c++: CTAD and forwarding references [PR88252]

2021-07-14 Thread Patrick Palka via Gcc-patches
Here we're incorrectly treating T&& as a forwarding reference during
CTAD even though T is a template parameter of the class template.

This happens because the template parameter T in the out-of-line
definition of the constructor doesn't have the flag
TEMPLATE_TYPE_PARM_FOR_CLASS set, and during duplicate_decls the
redeclaration (which is in terms of this unflagged T) prevails.
To fix this, we could perhaps be more consistent about setting the flag,
but it appears we don't really need the flag to make the determination.

Since the template parameters of an artificial guide consist of the
template parameters of the class template followed by those of the
constructor (if any), it should suffice to look at the index of the
template parameter to determine whether T&& is a forwarding reference or
not.  This patch replaces the TEMPLATE_TYPE_PARM_FOR_CLASS flag with
this approach.
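
As a rough illustration of the rule being implemented (this is not the new
testcase from the patch, and the names Wrapper and
demo_second_arg_is_forwarding are invented here), the sketch below has one
constructor whose implicit guide mixes both kinds of template parameter.  It
assumes C++17.  The constructor is defined in-class, so it does not exercise
the out-of-line case that triggers the PR.

```cpp
#include <cassert>
#include <type_traits>

template <class T>
struct Wrapper {
    bool second_arg_was_lvalue;

    // T belongs to the class template, so T&& is a plain rvalue reference
    // even in the implicit guide.  U belongs to the constructor template,
    // so U&& is a forwarding reference.
    template <class U>
    Wrapper(T&&, U&&)
        : second_arg_was_lvalue(std::is_lvalue_reference<U>::value) {}
};

bool demo_second_arg_is_forwarding() {
    int i = 0;
    // Implicit guide: template<class T, class U> Wrapper(T&&, U&&) -> Wrapper<T>
    // T has index 0 (class parameter), U has index 1 (constructor parameter).
    Wrapper w(1, i);   // T = int from the rvalue 1, U = int& from the lvalue i
    static_assert(std::is_same<decltype(w), Wrapper<int>>::value,
                  "T is deduced as int, not int&&");
    // Wrapper bad(i, i); would be rejected: T&& cannot bind to an lvalue.
    return w.second_arg_was_lvalue;
}
```

In the guide, only the parameters that come after the class template's own
parameters can form forwarding references, which is the index check the
patch description proposes.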

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/88252

gcc/cp/ChangeLog:

* cp-tree.h (TEMPLATE_TYPE_PARM_FOR_CLASS): Remove.
* pt.c (push_template_decl): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
handling.
(redeclare_class_template): Likewise.
(parm_can_form_fwding_ref_p): Define.
(maybe_adjust_types_for_deduction): Use it instead of
TEMPLATE_TYPE_PARM_FOR_CLASS.  Add tparms parameter.
(unify_one_argument): Pass tparms to
maybe_adjust_types_for_deduction.
(try_one_overload): Likewise.
(unify): Likewise.
(rewrite_template_parm): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
handling.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction96.C: New test.
---
 gcc/cp/cp-tree.h  |  6 --
 gcc/cp/pt.c   | 67 ---
 .../g++.dg/cpp1z/class-deduction96.C  | 34 ++
 3 files changed, 75 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction96.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b1cf44ecdb8..f4bcab5b18d 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -443,7 +443,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   BLOCK_OUTER_CURLY_BRACE_P (in BLOCK)
   FOLD_EXPR_MODOP_P (*_FOLD_EXPR)
   IF_STMT_CONSTEXPR_P (IF_STMT)
-  TEMPLATE_TYPE_PARM_FOR_CLASS (TEMPLATE_TYPE_PARM)
   DECL_NAMESPACE_INLINE_P (in NAMESPACE_DECL)
   SWITCH_STMT_ALL_CASES_P (in SWITCH_STMT)
   REINTERPRET_CAST_P (in NOP_EXPR)
@@ -5863,11 +5862,6 @@ enum auto_deduction_context
   adc_decomp_type/* Decomposition declaration initializer deduction */
 };
 
-/* True if this type-parameter belongs to a class template, used by C++17
-   class template argument deduction.  */
-#define TEMPLATE_TYPE_PARM_FOR_CLASS(NODE) \
-  (TREE_LANG_FLAG_0 (TEMPLATE_TYPE_PARM_CHECK (NODE)))
-
 /* True iff this TEMPLATE_TYPE_PARM represents decltype(auto).  */
 #define AUTO_IS_DECLTYPE(NODE) \
   (TYPE_LANG_FLAG_5 (TEMPLATE_TYPE_PARM_CHECK (NODE)))
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index cf0ce770d52..01ef2984f23 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -154,8 +154,8 @@ static void tsubst_enum (tree, tree, tree);
 static bool check_instantiated_args (tree, tree, tsubst_flags_t);
 static int check_non_deducible_conversion (tree, tree, int, int,
   struct conversion **, bool);
-static int maybe_adjust_types_for_deduction (unification_kind_t, tree*, tree*,
-tree);
+static int maybe_adjust_types_for_deduction (tree, unification_kind_t,
+tree*, tree*, tree);
 static int type_unification_real (tree, tree, tree, const tree *,
  unsigned int, int, unification_kind_t,
  vec **,
@@ -5801,18 +5801,7 @@ push_template_decl (tree decl, bool is_friend)
}
   else if (DECL_IMPLICIT_TYPEDEF_P (decl)
   && CLASS_TYPE_P (TREE_TYPE (decl)))
-   {
- /* Class template, set TEMPLATE_TYPE_PARM_FOR_CLASS.  */
- tree parms = INNERMOST_TEMPLATE_PARMS (current_template_parms);
- for (int i = 0; i < TREE_VEC_LENGTH (parms); ++i)
-   {
- tree t = TREE_VALUE (TREE_VEC_ELT (parms, i));
- if (TREE_CODE (t) == TYPE_DECL)
-   t = TREE_TYPE (t);
- if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
-   TEMPLATE_TYPE_PARM_FOR_CLASS (t) = true;
-   }
-   }
+   /* Class template.  */;
   else if (TREE_CODE (decl) == TYPE_DECL
   && TYPE_DECL_ALIAS_P (decl))
/* alias-declaration */
@@ -6292,9 +6281,6 @@ redeclare_class_template (tree type, tree parms, tree cons)
  gcc_assert (DECL_CONTEXT (parm) == NULL_TREE);
  DECL_CONTEXT (parm) = tmpl;
}
-
-  if (TREE_CODE (parm) == TYPE_DECL)
-   TEMPLATE_TYPE_PARM_FOR_CLASS (TREE_TYPE (p

Re: [PATCH] c++: constexpr array reference and value-initialization [PR101371]

2021-07-14 Thread Jason Merrill via Gcc-patches

On 7/14/21 9:56 AM, Marek Polacek wrote:

On Wed, Jul 14, 2021 at 12:15:48AM -0400, Jason Merrill wrote:

On 7/13/21 8:15 PM, Marek Polacek wrote:

This PR gave me a hard time: I saw multiple issues starting with
different revisions.  But ultimately the root cause seems to be
the following, and the attached patch fixes all issues I've found
here.

In cxx_eval_array_reference we create a new constexpr context for the
CP_AGGREGATE_TYPE_P case, but we also have to create it for the
non-aggregate case.


But not for the scalar case, surely?  Other similar places check
AGGREGATE_TYPE_P || VECTOR_TYPE_P, or !SCALAR_TYPE_P.


Yea, I suppose I should avoid doing any extra work for scalars.
  

In this test, we are evaluating

((B *)this)->a = rhs->a

which means that we set ctx.object to ((B *)this)->a.  Then we proceed
to evaluate the initializer, rhs->a.  For *rhs, we eval rhs, a PARM_DECL,
for which we have (const B &) &c.arr[0] in the hash table.  Then
cxx_fold_indirect_ref gives us c.arr[0].  c is evaluated to {.arr={}} so
c.arr is {}.  Now we want c.arr[0], so we end up in cxx_eval_array_reference
and since we're initializing from {}, we call build_value_init which
gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'.  Then we
evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
we take ctx.object instead.  But that is the wrong object, we're not
initializing ((B *)this)->a here.  And so we wound up with an
initializer for A, and then crash in cxx_eval_component_reference:

gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE (whole)));

where DECL_CONTEXT (part) is B (as it should be) but the type of whole
was A.

With that in mind, the fix is straightforward, except that when the
value-init produced an AGGR_INIT_EXPR, we shouldn't set ctx.object so
that

  if (DECL_CONSTRUCTOR_P (fun) && !ctx->object
      && TREE_CODE (t) == AGGR_INIT_EXPR)
    {
      /* We want to have an initialization target for an AGGR_INIT_EXPR.
         If we don't already have one in CTX, use the AGGR_INIT_EXPR_SLOT.  */
      new_ctx.object = AGGR_INIT_EXPR_SLOT (t);

comes into play.


Hmm, setting new_ctx.object to t here looks like it should be the correct
c.arr[0], not ((B*)this)->a.  It was wrong in the current code because we
weren't setting up new_ctx at all, but once that's fixed I don't think you
need special AGGR_INIT_EXPR handling.


If you don't want the special AGGR_INIT_EXPR handling, we could do something
like the following.  That any better?

Full testing in progress.


OK if testing succeeds.


-- >8 --
This PR gave me a hard time: I saw multiple issues starting with
different revisions.  But ultimately the root cause seems to be
the following, and the attached patch fixes all issues I've found
here.

In cxx_eval_array_reference we create a new constexpr context for the
CP_AGGREGATE_TYPE_P case, but we also have to create it for the
non-aggregate case.  In this test, we are evaluating

   ((B *)this)->a = rhs->a

which means that we set ctx.object to ((B *)this)->a.  Then we proceed
to evaluate the initializer, rhs->a.  For *rhs, we eval rhs, a PARM_DECL,
for which we have (const B &) &c.arr[0] in the hash table.  Then
cxx_fold_indirect_ref gives us c.arr[0].  c is evaluated to {.arr={}} so
c.arr is {}.  Now we want c.arr[0], so we end up in cxx_eval_array_reference
and since we're initializing from {}, we call build_value_init which
gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'.  Then we
evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
we take ctx.object instead.  But that is the wrong object, we're not
initializing ((B *)this)->a here.  And so we wound up with an
initializer for A, and then crash in cxx_eval_component_reference:

   gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE (whole)));

where DECL_CONTEXT (part) is B (as it should be) but the type of whole
was A.

So create a new object, if there already was one, and the element type
is not a scalar.

PR c++/101371

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_array_reference): Create a new .object
and .ctor for the non-aggregate non-scalar case too when
value-initializing.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-101371-2.C: New test.
* g++.dg/cpp1y/constexpr-101371.C: New test.
---
  gcc/cp/constexpr.c| 15 +++---
  .../g++.dg/cpp1y/constexpr-101371-2.C | 23 +++
  gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C | 29 +++
  3 files changed, 63 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 39787f3f5d5..31fa5b66865 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3851,16 +3851,23 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, t

Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-14 Thread Martin Sebor via Gcc-patches

On 7/13/21 9:39 PM, Jason Merrill wrote:

On 7/13/21 4:02 PM, Martin Sebor wrote:

On 7/13/21 12:37 PM, Jason Merrill wrote:

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of the 
change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the same 
value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in C++17 
it's implicitly inline instead of static.


I'll remove the static.



But then, do we really want to keep vNULL at all?  It's a weird 
blurring of the object/pointer boundary that is also dependent on vec 
being a thin wrapper around a pointer.  In almost all cases it can be 
replaced with {}; one exception is == comparison, where it seems to 
be testing that the embedded pointer is null, which is a weird thing 
to want to test.


The one use case I know of for vNULL where I can't think of
an equally good substitute is in passing a vec as an argument by
value.  The only way to do that that I can think of is to name
the full vec type (i.e., the specialization) which is more typing
and less generic than vNULL.  I don't use vNULL myself so I wouldn't
miss this trick if it were to be removed but others might feel
differently.


In C++11, it can be replaced by {} in that context as well.
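
A small sketch of that point, using stand-in types rather than GCC's real
vec.h: thin_vec, null_vec_t and null_vec below are hypothetical names that
only mimic the shape of vec, vnull and vNULL.

```cpp
#include <cassert>

// Hypothetical stand-in for vec<T>: a thin wrapper around a pointer.
template <class T>
struct thin_vec {
    T* data = nullptr;
    bool is_empty() const { return data == nullptr; }
};

// Hypothetical stand-in for vnull: converts to an empty thin_vec<T>
// for any T, so callers need not name the specialization.
struct null_vec_t {
    template <class T>
    constexpr operator thin_vec<T>() const { return thin_vec<T>{}; }
};
constexpr null_vec_t null_vec{};

bool takes_vec_by_value(thin_vec<int> v) { return v.is_empty(); }

// Both calls pass an empty vector without spelling out thin_vec<int>.
bool pass_null_object()  { return takes_vec_by_value(null_vec); }
bool pass_empty_braces() { return takes_vec_by_value({}); }
```

The braces form needs no helper object at all, which is why {} can stand in
for the vNULL-style argument in this context.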


Cool.  I thought I'd tried { } here but I guess not.




If not, I'm all for getting rid of vNULL but with over 350 uses
of it left, unless there's some clever trick to make the removal
(mostly) effortless and seamless, I'd much rather do it independently
of this initial change. I also don't know if I can commit to making
all this cleanup.


I already have a patch to replace all but one use of vNULL, but I'll 
hold off with it until after your patch.


So what's the next step?  The patch only removes a few uses of vNULL
but doesn't add any.  Is it good to go as is (without the static and
with the additional const changes Richard suggested)?  This patch is
attached to my reply to Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575199.html

Martin



Somewhat relatedly, use of vec variables or fields seems almost 
always a mistake, as they need explicit .release() that could be 
automatic with auto_vec, and is easy to forget.  For instance, the 
recursive call in get_all_loop_exits returns a vec that is never 
released.  And I see a couple of leaks in the C++ front end as well.
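
The leak pattern described here can be sketched with two toy types.  These
are not GCC's vec and auto_vec; manual_buf, auto_buf and live_buffers are
hypothetical names used only to show why a destructor-based release is
harder to get wrong.

```cpp
#include <cassert>
#include <cstdlib>

int live_buffers = 0;   // allocations that have not been released yet

// Hypothetical stand-in for vec<T>: storage must be released by hand.
struct manual_buf {
    int* data = nullptr;
    void create(std::size_t n) {
        data = static_cast<int*>(std::malloc(n * sizeof(int)));
        if (data) ++live_buffers;
    }
    void release() {
        if (data) { std::free(data); data = nullptr; --live_buffers; }
    }
};

// Hypothetical stand-in for auto_vec<T>: releases in its destructor.
struct auto_buf : manual_buf {
    auto_buf() = default;
    auto_buf(const auto_buf&) = delete;
    auto_buf& operator=(const auto_buf&) = delete;
    ~auto_buf() { release(); }
};

// Returns how many buffers are still live after the scope ends.
int live_after_scope_manual() {
    int before = live_buffers;
    int* leaked = nullptr;
    {
        manual_buf b;
        b.create(4);
        leaked = b.data;   // kept only so this demo can clean up afterwards
        // b.release() was forgotten here
    }
    int still_live = live_buffers - before;
    if (leaked) { std::free(leaked); --live_buffers; }
    return still_live;
}

int live_after_scope_auto() {
    int before = live_buffers;
    {
        auto_buf b;
        b.create(4);
    }                      // destructor releases the storage
    return live_buffers - before;
}
```

The manual version leaves one buffer live when release() is forgotten, while
the destructor-based version leaves none.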


I agree.  The challenge I ran into with changing vec fields is with
code that uses the vec member as a reference to auto_vec.  This is
the case in gcc/ipa-prop.h, for instance.  Those instances could
be changed to auto_vec references or pointers but again it's a more
intrusive change than the simple replacements I was planning to make
in this first iteration.

So in summary, I agree with the changes you suggest.  Given their
scope I'd prefer not to make them in the same patch, and rather make
them at some point in the future when I or someone else has the time
and energy.  I'm running out.


Oh, absolutely.

Jason





Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-14 Thread Martin Sebor via Gcc-patches

On 7/12/21 5:02 AM, Richard Biener wrote:

On Wed, Jul 7, 2021 at 4:37 PM Martin Sebor  wrote:


On 7/7/21 1:28 AM, Richard Biener wrote:

On Tue, Jul 6, 2021 at 5:06 PM Martin Sebor  wrote:


Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573968.html

Any questions/suggestions on the final patch or is it okay to commit?


I don't remember seeing one (aka saying "bootstrapped/tested, OK to commit?"
or so) - and the link above doesn't have one.

So, can you re-post it please?


The patch is attached to the email above with the following text
at the end:

Attached is a revised patch with these changes (a superset of
those I sent in response to Jason's question), tested on x86_64.

I've also attached it to this reply.


Thanks - I was confused about the pipermail way of referencing attachments ...

The pieces where you change vec<> passing to const vec<>& and the few
where you change vec<> * to const vec<> * are OK - this should make the
rest a smaller piece to review.  In general const correctness changes should
be considered obvious (vec<> to const vec<>& passing isn't quite obvious
so I acked the cases explicitly).


Okay.



I think the vec<> -> vec<>& cases would either benefit from constification
of callers that make using const vec<>& not possible or from a change to
pass array_slice<> (not array_slice<>&), noting that the vec<> contents
are mutated but the vec<> size does not change.


I've reviewed the patch and found a handful of instances I had missed
where the vec& could be made const.   The rest of the vec<> -> vec<>&
changes are all to functions that modify the vec, either directly or
by passing it to other functions that do.  (If you see some you want
me to double-check let me know.)

As a reminder, there may still be APIs where an existing by-value or
by-reference vec could be made const vec& if they don't have to be
touched for other reasons (i.e., passing an auto_vec as an argument).
Those should also be reviewed at some point.



Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of the change.


Ack.

Attached is an updated patch with a few of the vec& -> const vec&
changes and the removal of the static specifier on vNULL.

Martin
Disable implicit conversion from auto_vec to vec.

gcc/c-family/ChangeLog:

	* c-common.c (c_build_shufflevector): Adjust to vec change.
	* c-common.h (c_build_shufflevector): Same.

gcc/c/ChangeLog:

	* c-parser.c (c_finish_omp_declare_simd): Adjust to vec change.
	(c_parser_omp_declare_simd): Same.
	* c-tree.h (c_build_function_call_vec): Same.
	* c-typeck.c (c_build_function_call_vec): Same.

gcc/ChangeLog:

	* cfgloop.h (single_likely_exit): Adjust to vec change.
	* cfgloopanal.c (single_likely_exit): Same.
	* cgraph.h (struct cgraph_node): Same.
	* cgraphclones.c (cgraph_node::create_virtual_clone): Same.
	* dominance.c (prune_bbs_to_update_dominators): Same.
	(iterate_fix_dominators): Same.
	* dominance.h (iterate_fix_dominators): Same.
	* genautomata.c (merge_states): Same.
	* genextract.c (VEC_char_to_string): Same.
	* genmatch.c (dt_node::gen_kids_1): Same.
	(walk_captures): Same.
	* gimple-ssa-store-merging.c (check_no_overlap): Same.
	* gimple.c (gimple_build_call_vec): Same.
	(gimple_build_call_internal_vec): Same.
	(gimple_build_switch): Same.
	(sort_case_labels): Same.
	(preprocess_case_label_vec_for_gimple): Same.
	* gimple.h (gimple_build_call_vec): Same.
	(gimple_build_call_internal_vec): Same.
	(gimple_build_switch): Same.
	(sort_case_labels): Same.
	(preprocess_case_label_vec_for_gimple): Same.
	* haifa-sched.c (calc_priorities): Same.
	(haifa_sched_init): Same.
	(sched_init_luids): Same.
	(haifa_init_h_i_d): Same.
	* ipa-cp.c (ipa_get_indirect_edge_target_1): Same.
	(adjust_callers_for_value_intersection): Same.
	(find_more_scalar_values_for_callers_subset): Same.
	(find_more_contexts_for_caller_subset): Same.
	(find_aggregate_values_for_callers_subset): Same.
	(copy_useful_known_contexts): Same.
	* ipa-fnsummary.c (remap_edge_summaries): Same.
	(remap_freqcounting_predicate): Same.
	* ipa-inline.c (add_new_edges_to_heap): Same.
	* ipa-predicate.c (predicate::remap_after_inlining): Same.
	* ipa-predicate.h:
	* ipa-prop.c (ipa_find_agg_cst_for_param): Same.
	* ipa-prop.h (ipa_find_agg_cst_for_param): Same.
	* ira-build.c (ira_loop_tree_body_rev_postorder): Same.
	* read-rtl.c (apply_iterators): Same.
	* rtl.h (native_decode_rtx): Same.
	(native_decode_vector_rtx): Same.
	* sched-int.h (sched_init_luids): Same.
	(haifa_init_h_i_d): Same.
	* simplify-rtx.c (native_decode_vector_rtx): Same.
	(native_decode_rtx): Same.
	* tree-call-cdce.c (gen_shrink_wrap_conditions): Same.
	(shrink_wrap_one_built_in_call_with_conds): Same.
	(shrink_wrap_conditional_dead_built_in_calls): Same.
	* tree-data-ref.c (create_runtime_alias_checks): Same.
	(compute_all_dependences): Same.
	* tree-data-ref.h (compute_all_dependences): Same.
	(create_runtime_alias_checks): Same.
	(index_in_loop_nest): Same.

Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-14 Thread Qing Zhao via Gcc-patches
Hi, Richard,

> On Jul 14, 2021, at 2:14 AM, Richard Biener  
> wrote:
> 
> On Wed, Jul 14, 2021 at 1:17 AM Qing Zhao  wrote:
>> 
>> Hi, Kees,
>> 
>> I took a look at the kernel testing case you attached in the previous email, 
>> and found the testing failed with the following case:
>> 
>> #define INIT_STRUCT_static_all  = { .one = arg->one,\
>>.two = arg->two,\
>>.three = arg->three,\
>>.four = arg->four,  \
>>}
>> 
>> i.e, when the structure type auto variable has been explicitly initialized 
>> in the source code.  -ftrivial-auto-var-init in the 4th version
>> does not initialize the paddings for such variables.
>> 
>> But in the previous version of the patches ( 2 or 3), 
>> -ftrivial-auto-var-init initializes the paddings for such variables.
>> 
>> I intended to remove this part of the code from the 4th version of the patch 
>> since the implementation for initializing such paddings is completely 
>> different from
>> the initializing of the whole structure as a whole with memset in this 
>> version of the implementation.
>> 
>> If we really need this functionality, I will add another separate patch for 
>> this additional functionality, but not with this patch.
>> 
>> Richard, what’s your comment and suggestions on this?
> 
> I think this can be addressed in the gimplifier by adjusting
> gimplify_init_constructor to clear
> the object before the initialization (if it's not done via aggregate
> copying).  

I did this in the previous versions of the patch like the following:

@@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
  /* If a single access to the target must be ensured and all elements
 are zero, then it's optimal to clear whatever their number.  */
  cleared = true;
+   else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
+&& !TREE_STATIC (object)
+&& type_has_padding (type))
+ /* If the user requests to initialize automatic variables with
+paddings inside the type, we should initialize the paddings too.
+C guarantees that brace-init with fewer initializers than members
+aggregate will initialize the rest of the aggregate as-if it were
+static initialization.  In turn static initialization guarantees
+that pad is initialized to zero bits.
+So, it's better to clear the whole record under such situation.  */
+ cleared = true;
else
  cleared = false;

Then the paddings are also initialized to zeroes with this option. (Even for 
-ftrivial-auto-var-init=pattern).

Is the above change Okay? (With this change, when 
-ftrivial-auto-var-init=pattern, the paddings for the
structure variables that have explicit initializer will be ZEROed, not 0xFE)
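
For readers unfamiliar with where these paddings come from, here is a small
sketch.  The struct is loosely modelled on the "small_hole" name from the
kernel self-test quoted below; its exact field types are an assumption made
for this example, and the helper names are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout for illustration only; not copied from the kernel test.
struct small_hole {
    std::uint64_t one;
    char          two;
    // the compiler typically inserts padding here so 'three' is aligned
    std::int32_t  three;
    std::uint64_t four;
};

// Bytes between the end of 'two' and the start of 'three'.
std::size_t hole_size() {
    return offsetof(small_hole, three)
           - (offsetof(small_hole, two) + sizeof(char));
}

// All bytes of the object that belong to no member.
std::size_t total_padding() {
    return sizeof(small_hole)
           - (sizeof(std::uint64_t) + sizeof(char)
              + sizeof(std::int32_t) + sizeof(std::uint64_t));
}
```

An initializer that names every member stores only into the members, so the
hole keeps whatever was on the stack unless the whole object is cleared
first, which is what setting cleared = true in the hunk above arranges.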

> The clearing
> could be done via .DEFERRED_INIT.

You mean to add additional calls to .DEFERRED_INIT for each individual padding 
of the structure in “gimplify_init_constructor"?
Then  later during RTL expand, expand these calls the same as other calls?
> 
> Note that I think .DEFERRED_INIT can be elided for variables that do
> not have their address
> taken - otherwise we'll also have to worry about aggregate copy
> initialization and SRA
> decomposing the copy, initializing only the used parts.

Please explain this a little bit more.

Thanks.

Qing
> 
> Richard.
> 
>> Thanks.
>> 
>> Qing
>> 
>>> On Jul 13, 2021, at 4:29 PM, Kees Cook  wrote:
>>> 
>>> On Mon, Jul 12, 2021 at 08:28:55PM +, Qing Zhao wrote:
> On Jul 12, 2021, at 12:56 PM, Kees Cook  wrote:
> On Wed, Jul 07, 2021 at 05:38:02PM +, Qing Zhao wrote:
>> This is the 4th version of the patch for the new security feature for 
>> GCC.
> 
> It looks like padding initialization has regressed to where things were
> in version 1[1] (it was, however, working in version 2[2]). I'm seeing
> these failures again in the kernel self-test:
> 
> test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
> test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
> test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
> test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
> test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
> test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
 
 Are the above failures for -ftrivial-auto-var-init=zero or 
 -ftrivial-auto-var-init=pattern?  Or both?
>>> 
>>> Yes, I was only testing =zero (the kernel test handles =pattern as well:
>>> it doesn't explicitly test for 0x00). I've verified with =pattern now,
>>> too.
>>> 
 For the current implementation, I believe that all paddings should be 
 initialized wi

Re: GCC 11.1.1 Status Report (2021-07-06)

2021-07-14 Thread H.J. Lu via Gcc-patches
On Tue, Jul 6, 2021 at 12:00 AM Richard Biener  wrote:
>
>
> Status
> ======
>
> The GCC 11 branch is open for regression and documentation fixes.
> It's time for a GCC 11.2 release and we are aiming for a release
> candidate in about two weeks which would result in the GCC 11.2
> release about three months after GCC 11.1.
>
> Two weeks give you ample time to care for important regressions
> and backporting of fixes.  Please also look out for issues on
> non-primary/secondary targets.
>
>
> Quality Data
> ============
>
> Priority          #   Change from last report
> --------        ---   -----------------------
> P1
> P2              272   +  20
> P3               94   +  56
> P4              210   +   2
> P5               24   -   1
> --------        ---   -----------------------
> Total P1-P3     366   +  76
> Total           600   +  79
>
>
> Previous Report
> ===============
>
> https://gcc.gnu.org/pipermail/gcc/2021-April/235923.html

Hi,

I'd like to backport this regression fix:

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=cc11b924bfe7752edbba052ca71653f46a60887a

to GCC 11 for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101395

Thanks.

-- 
H.J.


Re: [PATCH] c++: constexpr array reference and value-initialization [PR101371]

2021-07-14 Thread Marek Polacek via Gcc-patches
On Wed, Jul 14, 2021 at 12:15:48AM -0400, Jason Merrill wrote:
> On 7/13/21 8:15 PM, Marek Polacek wrote:
> > This PR gave me a hard time: I saw multiple issues starting with
> > different revisions.  But ultimately the root cause seems to be
> > the following, and the attached patch fixes all issues I've found
> > here.
> > 
> > In cxx_eval_array_reference we create a new constexpr context for the
> > CP_AGGREGATE_TYPE_P case, but we also have to create it for the
> > non-aggregate case.
> 
> But not for the scalar case, surely?  Other similar places check
> AGGREGATE_TYPE_P || VECTOR_TYPE_P, or !SCALAR_TYPE_P.

Yea, I suppose I should avoid doing any extra work for scalars.
 
> > In this test, we are evaluating
> > 
> >((B *)this)->a = rhs->a
> > 
> > which means that we set ctx.object to ((B *)this)->a.  Then we proceed
> > to evaluate the initializer, rhs->a.  For *rhs, we eval rhs, a PARM_DECL,
> > for which we have (const B &) &c.arr[0] in the hash table.  Then
> > cxx_fold_indirect_ref gives us c.arr[0].  c is evaluated to {.arr={}} so
> > c.arr is {}.  Now we want c.arr[0], so we end up in cxx_eval_array_reference
> > and since we're initializing from {}, we call build_value_init which
> > gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'.  Then we
> > evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
> > we take ctx.object instead.  But that is the wrong object, we're not
> > initializing ((B *)this)->a here.  And so we wound up with an
> > initializer for A, and then crash in cxx_eval_component_reference:
> > 
> >gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE 
> > (whole)));
> > 
> > where DECL_CONTEXT (part) is B (as it should be) but the type of whole
> > was A.
> > 
> > With that in mind, the fix is straightforward, except that when the
> > value-init produced an AGGR_INIT_EXPR, we shouldn't set ctx.object so
> > that
> > 
> > 2508   if (DECL_CONSTRUCTOR_P (fun) && !ctx->object
> > 2509   && TREE_CODE (t) == AGGR_INIT_EXPR)
> > 2510 {
> > 2511   /* We want to have an initialization target for an 
> > AGGR_INIT_EXPR.
> > 2512  If we don't already have one in CTX, use the 
> > AGGR_INIT_EXPR_SLOT.  */
> > 2513   new_ctx.object = AGGR_INIT_EXPR_SLOT (t);
> > 
> > comes into play.
> 
> Hmm, setting new_ctx.object to t here looks like it should be the correct
> c.arr[0], not ((B*)this)->a.  It was wrong in the current code because we
> weren't setting up new_ctx at all, but once that's fixed I don't think you
> need special AGGR_INIT_EXPR handling.

If you don't want the special AGGR_INIT_EXPR handling, we could do something
like the following.  That any better?

Full testing in progress.

-- >8 --
This PR gave me a hard time: I saw multiple issues starting with
different revisions.  But ultimately the root cause seems to be
the following, and the attached patch fixes all issues I've found
here.

In cxx_eval_array_reference we create a new constexpr context for the
CP_AGGREGATE_TYPE_P case, but we also have to create it for the
non-aggregate case.  In this test, we are evaluating

  ((B *)this)->a = rhs->a

which means that we set ctx.object to ((B *)this)->a.  Then we proceed
to evaluate the initializer, rhs->a.  For *rhs, we eval rhs, a PARM_DECL,
for which we have (const B &) &c.arr[0] in the hash table.  Then
cxx_fold_indirect_ref gives us c.arr[0].  c is evaluated to {.arr={}} so
c.arr is {}.  Now we want c.arr[0], so we end up in cxx_eval_array_reference
and since we're initializing from {}, we call build_value_init which
gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'.  Then we
evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
we take ctx.object instead.  But that is the wrong object, we're not
initializing ((B *)this)->a here.  And so we wound up with an
initializer for A, and then crash in cxx_eval_component_reference:

  gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE (whole)));

where DECL_CONTEXT (part) is B (as it should be) but the type of whole
was A.

So create a new object, if there already was one, and the element type
is not a scalar.

PR c++/101371

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_array_reference): Create a new .object
and .ctor for the non-aggregate non-scalar case too when
value-initializing.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-101371-2.C: New test.
* g++.dg/cpp1y/constexpr-101371.C: New test.
---
 gcc/cp/constexpr.c| 15 +++---
 .../g++.dg/cpp1y/constexpr-101371-2.C | 23 +++
 gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C | 29 +++
 3 files changed, 63 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 39787f3f5d5..31fa5b66865 100644
--- a/gcc/cp/constex

RE: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, July 14, 2021 2:19 PM
> To: Tamar Christina 
> Cc: Michael Matz ; Bernd Edlinger
> ; Richard Biener ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c earlier
> 
> On Wed, Jul 14, 2021 at 3:15 PM Richard Biener
>  wrote:
> >
> > On Wed, Jul 14, 2021 at 3:12 PM Richard Biener
> >  wrote:
> > >
> > > On Wed, Jul 14, 2021 at 2:48 PM Tamar Christina via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > Ever since this commit
> > > >
> > > > commit c9114f2804b91690e030383de15a24e0b738e856
> > > > Author: Bernd Edlinger 
> > > > Date:   Fri May 28 06:27:27 2021 +0200
> > > >
> > > > Various tools have been having trouble with cross compilation
> > > > resulting in
> > > >
> > > > make[2]: *** No rule to make target
> > > > '../build-x86_64-build_pc-linux-gnu/libcpp/libcpp.a', needed by
> > > > 'build/genmatch'.
> > > >
> > > > (took a while to track down).  I don't understand this part of the
> > > > build system well enough to know how to fix this.
> > > > It looks like `libcpp.a` has special handling for cross compilers
> > > > which now seems to be broken.
> > > >
> > > > I can't reproduce it with our normal cross compiler scripts. Which
> > > > handles the stages on its own, but e.g.
> > > > https://github.com/crosstool-ng/crosstool-ng does reproduce the
> > > > failure.
> > > >
> > > > Any ideas what's going on?
> > >
> > > There should be a dependence of all-gcc to all-build-libcpp,
> > > Makefile.def has
> > >
> > > dependencies = { module=all-gcc; on=all-build-libcpp; };
> > >
> > > so how come build-libcpp is not built when gcc/ starts?
> >
> > Ah, I guess (gcc/Makefile.in):
> >
> > CPPLIB = ../libcpp/libcpp.a
> > ...
> > # For stage1 and when cross-compiling use the build libcpp which is
> > # built with NLS disabled.  For stage2+ use the host library and
> > # its dependencies.
> > ifeq ($(build_objdir),$(build_libobjdir))
> > BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
> > else
> > BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY)
> >
> > is not properly reflected in above dependences.  Not sure how to fix
> > that though.
> 
> I guess add
> 
>  dependencies = { module=all-gcc; on=all-libcpp; };
> 

Isn't that already there? I see

dependencies = { module=all-gcc; on=all-libcpp; hard=true; };

already in Makefile.def

Regards,
Tamar

> like it is done for libiberty.
> 
> > > > Kind Regards,
> > > > Tamar
> > > >
> > > > > -Original Message-
> > > > > From: Gcc-patches  On Behalf Of
> > > > > Michael Matz
> > > > > Sent: Friday, May 28, 2021 4:33 PM
> > > > > To: Bernd Edlinger 
> > > > > Cc: gcc-patches@gcc.gnu.org; Richard Biener 
> > > > > Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c
> > > > > earlier
> > > > >
> > > > > Hello,
> > > > >
> > > > > On Fri, 28 May 2021, Bernd Edlinger wrote:
> > > > >
> > > > > > >> I was wondering, why gimple-match.c and generic-match.c are
> > > > > > >> not built early but always last, which slows down parallel
> > > > > > >> makes significantly.
> > > > > > >>
> > > > > > >> The reason seems to be that generated_files does not
> > > > > > >> mention gimple-match.c and generic-match.c.
> > > > > > >>
> > > > > > >> This comment in Makefile.in says it all:
> > > > > > >>
> > > > > > >> $(ALL_HOST_OBJS) : | $(generated_files)
> > > > > > >>
> > > > > > >> So this patch adds gimple-match.c generic-match.c to
> > > > > > >> generated_files.
> > > > > > >>
> > > > > > >>
> > > > > > >> Tested on x86_64-pc-linux-gnu.
> > > > > > >> Is it OK for trunk?
> > > > > > >
> > > > > > > This should help for what I was complaining about in
> > > > > > > https://gcc.gnu.org/pipermail/gcc/2021-May/235963.html . I
> > > > > > > build with
> > > > > > > -j24 and it was stalling on compiling gimple-match.c for me.
> > > > > > > Looks like insn-attrtab.c is missed too; I saw genattrtab
> > > > > > > was running last too.
> > > > > > >
> > > > > >
> > > > > > Yeah, probably insn-automata.c as well, sometimes it is picked
> > > > > > up early sometimes not. maybe $(simple_generated_c) should be
> > > > > > added to generated_files, but insn-attrtab.c is yet another
> > > > > > exception.
> > > > >
> > > > > You can't put files in there that are sometimes slow to generate
> > > > > (which insn- {attrtab,automata}.c are on some targets), as
> > > > > _everything_ then waits for them to be created first.
> > > > >
> > > > > Ideally there would be a way for gnumake to mark some targets as
> > > > > "ugh- slow" and back-propagate this to all dependencies so that
> > > > > those are put in front of the work queue in a parallel make.
> > > > > Alas, something like that never came into existence :-/  (When
> > > > > order-only deps were introduced I got excited, but then came to
> > > > > realize that that wasn't what was really needed for this case, a
> > > > > "weak" version of it would be required at least, or better yet a
> > > > > specific facility to impose a cost with a target)

Re: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 14, 2021 at 3:15 PM Richard Biener
 wrote:
>
> On Wed, Jul 14, 2021 at 3:12 PM Richard Biener
>  wrote:
> >
> > On Wed, Jul 14, 2021 at 2:48 PM Tamar Christina via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > >
> > > Ever since this commit
> > >
> > > commit c9114f2804b91690e030383de15a24e0b738e856
> > > Author: Bernd Edlinger 
> > > Date:   Fri May 28 06:27:27 2021 +0200
> > >
> > > Various tools have been having trouble with cross compilation resulting in
> > >
> > > make[2]: *** No rule to make target 
> > > '../build-x86_64-build_pc-linux-gnu/libcpp/libcpp.a', needed by 
> > > 'build/genmatch'.
> > >
> > > (took a while to track down).  I don't understand this part of the build 
> > > system well enough to know how to fix this.
> > > It looks like `libcpp.a` has special handling for cross compilers which 
> > > now seems to be broken.
> > >
> > > I can't reproduce it with our normal cross compiler scripts. Which 
> > > handles the stages on its own, but e.g.
> > > https://github.com/crosstool-ng/crosstool-ng does reproduce the failure.
> > >
> > > Any ideas what's going on?
> >
> > There should be a dependence of all-gcc to all-build-libcpp, Makefile.def 
> > has
> >
> > dependencies = { module=all-gcc; on=all-build-libcpp; };
> >
> > so how come build-libcpp is not built when gcc/ starts?
>
> Ah, I guess (gcc/Makefile.in):
>
> CPPLIB = ../libcpp/libcpp.a
> ...
> # For stage1 and when cross-compiling use the build libcpp which is
> # built with NLS disabled.  For stage2+ use the host library and
> # its dependencies.
> ifeq ($(build_objdir),$(build_libobjdir))
> BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
> else
> BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY)
>
> is not properly reflected in above dependences.  Not sure how to fix
> that though.

I guess add

 dependencies = { module=all-gcc; on=all-libcpp; };

like it is done for libiberty.

> > > Kind Regards,
> > > Tamar
> > >
> > > > -Original Message-
> > > > From: Gcc-patches  On Behalf Of
> > > > Michael Matz
> > > > Sent: Friday, May 28, 2021 4:33 PM
> > > > To: Bernd Edlinger 
> > > > Cc: gcc-patches@gcc.gnu.org; Richard Biener 
> > > > Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c earlier
> > > >
> > > > Hello,
> > > >
> > > > On Fri, 28 May 2021, Bernd Edlinger wrote:
> > > >
> > > > > >> I was wondering, why gimple-match.c and generic-match.c are not
> > > > > >> built early but always last, which slows down parallel makes
> > > > > >> significantly.
> > > > > >>
> > > > > >> The reason seems to be that generated_files does not mention
> > > > > >> gimple-match.c and generic-match.c.
> > > > > >>
> > > > > >> This comment in Makefile.in says it all:
> > > > > >>
> > > > > >> $(ALL_HOST_OBJS) : | $(generated_files)
> > > > > >>
> > > > > >> So this patch adds gimple-match.c generic-match.c to 
> > > > > >> generated_files.
> > > > > >>
> > > > > >>
> > > > > >> Tested on x86_64-pc-linux-gnu.
> > > > > >> Is it OK for trunk?
> > > > > >
> > > > > > This should help for what I was complaining about in
> > > > > > https://gcc.gnu.org/pipermail/gcc/2021-May/235963.html . I build
> > > > > > with
> > > > > > -j24 and it was stalling on compiling gimple-match.c for me.
> > > > > > Looks like insn-attrtab.c is missed too; I saw genattrtab was
> > > > > > running last too.
> > > > > >
> > > > >
> > > > > Yeah, probably insn-automata.c as well, sometimes it is picked up
> > > > > early sometimes not. maybe $(simple_generated_c) should be added to
> > > > > generated_files, but insn-attrtab.c is yet another exception.
> > > >
> > > > You can't put files in there that are sometimes slow to generate (which 
> > > > insn-
> > > > {attrtab,automata}.c are on some targets), as _everything_ then waits 
> > > > for
> > > > them to be created first.
> > > >
> > > > Ideally there would be a way for gnumake to mark some targets as "ugh-
> > > > slow" and back-propagate this to all dependencies so that those are put 
> > > > in
> > > > front of the work queue in a parallel make.  Alas, something like that 
> > > > never
> > > > came into existence :-/  (When order-only deps were introduced I got
> > > > excited, but then came to realize that that wasn't what was really 
> > > > needed for
> > > > this case, a "weak" version of it would be required at least, or better 
> > > > yet a
> > > > specific facility to impose a cost with a target)
> > > >
> > > >
> > > > Ciao,
> > > > Michael.
> > > >
> > > > >
> > > > >
> > > > > Bernd.
> > > > >
> > > > > > Thanks,
> > > > > > Andrew
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >> Thanks
> > > > > >> Bernd.
> > > > > >>
> > > > > >>
> > > > > >> 2021-05-28  Bernd Edlinger  
> > > > > >>
> > > > > >> * Makefile.in (generated_files): Add gimple-match.c and
> > > > > >> generic-match.c
> > > > >


Re: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 14, 2021 at 3:12 PM Richard Biener
 wrote:
>
> On Wed, Jul 14, 2021 at 2:48 PM Tamar Christina via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > Ever since this commit
> >
> > commit c9114f2804b91690e030383de15a24e0b738e856
> > Author: Bernd Edlinger 
> > Date:   Fri May 28 06:27:27 2021 +0200
> >
> > Various tools have been having trouble with cross compilation resulting in
> >
> > make[2]: *** No rule to make target 
> > '../build-x86_64-build_pc-linux-gnu/libcpp/libcpp.a', needed by 
> > 'build/genmatch'.
> >
> > (took a while to track down).  I don't understand this part of the build 
> > system well enough to know how to fix this.
> > It looks like `libcpp.a` has special handling for cross compilers which now 
> > seems to be broken.
> >
> > I can't reproduce it with our normal cross compiler scripts. Which handles 
> > the stages on its own, but e.g.
> > https://github.com/crosstool-ng/crosstool-ng does reproduce the failure.
> >
> > Any ideas what's going on?
>
> There should be a dependence of all-gcc to all-build-libcpp, Makefile.def has
>
> dependencies = { module=all-gcc; on=all-build-libcpp; };
>
> so how come build-libcpp is not built when gcc/ starts?

Ah, I guess (gcc/Makefile.in):

CPPLIB = ../libcpp/libcpp.a
...
# For stage1 and when cross-compiling use the build libcpp which is
# built with NLS disabled.  For stage2+ use the host library and
# its dependencies.
ifeq ($(build_objdir),$(build_libobjdir))
BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
else
BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY)

is not properly reflected in above dependences.  Not sure how to fix
that though.

> > Kind Regards,
> > Tamar
> >
> > > -Original Message-
> > > From: Gcc-patches  On Behalf Of
> > > Michael Matz
> > > Sent: Friday, May 28, 2021 4:33 PM
> > > To: Bernd Edlinger 
> > > Cc: gcc-patches@gcc.gnu.org; Richard Biener 
> > > Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c earlier
> > >
> > > Hello,
> > >
> > > On Fri, 28 May 2021, Bernd Edlinger wrote:
> > >
> > > > >> I was wondering, why gimple-match.c and generic-match.c are not
> > > > >> built early but always last, which slows down parallel makes
> > > > >> significantly.
> > > > >>
> > > > >> The reason seems to be that generated_files does not mention
> > > > >> gimple-match.c and generic-match.c.
> > > > >>
> > > > >> This comment in Makefile.in says it all:
> > > > >>
> > > > >> $(ALL_HOST_OBJS) : | $(generated_files)
> > > > >>
> > > > >> So this patch adds gimple-match.c generic-match.c to generated_files.
> > > > >>
> > > > >>
> > > > >> Tested on x86_64-pc-linux-gnu.
> > > > >> Is it OK for trunk?
> > > > >
> > > > > This should help for what I was complaining about in
> > > > > https://gcc.gnu.org/pipermail/gcc/2021-May/235963.html . I build
> > > > > with
> > > > > -j24 and it was stalling on compiling gimple-match.c for me.
> > > > > Looks like insn-attrtab.c is missed too; I saw genattrtab was
> > > > > running last too.
> > > > >
> > > >
> > > > Yeah, probably insn-automata.c as well, sometimes it is picked up
> > > > early sometimes not. maybe $(simple_generated_c) should be added to
> > > > generated_files, but insn-attrtab.c is yet another exception.
> > >
> > > You can't put files in there that are sometimes slow to generate (which 
> > > insn-
> > > {attrtab,automata}.c are on some targets), as _everything_ then waits for
> > > them to be created first.
> > >
> > > Ideally there would be a way for gnumake to mark some targets as "ugh-
> > > slow" and back-propagate this to all dependencies so that those are put in
> > > front of the work queue in a parallel make.  Alas, something like that 
> > > never
> > > came into existence :-/  (When order-only deps were introduced I got
> > > excited, but then came to realize that that wasn't what was really needed 
> > > for
> > > this case, a "weak" version of it would be required at least, or better 
> > > yet a
> > > specific facility to impose a cost with a target)
> > >
> > >
> > > Ciao,
> > > Michael.
> > >
> > > >
> > > >
> > > > Bernd.
> > > >
> > > > > Thanks,
> > > > > Andrew
> > > > >
> > > > >>
> > > > >>
> > > > >> Thanks
> > > > >> Bernd.
> > > > >>
> > > > >>
> > > > >> 2021-05-28  Bernd Edlinger  
> > > > >>
> > > > >> * Makefile.in (generated_files): Add gimple-match.c and
> > > > >> generic-match.c
> > > >


Re: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 14, 2021 at 2:48 PM Tamar Christina via Gcc-patches
 wrote:
>
> Hi,
>
> Ever since this commit
>
> commit c9114f2804b91690e030383de15a24e0b738e856
> Author: Bernd Edlinger 
> Date:   Fri May 28 06:27:27 2021 +0200
>
> Various tools have been having trouble with cross compilation resulting in
>
> make[2]: *** No rule to make target 
> '../build-x86_64-build_pc-linux-gnu/libcpp/libcpp.a', needed by 
> 'build/genmatch'.
>
> (took a while to track down).  I don't understand this part of the build 
> system well enough to know how to fix this.
> It looks like `libcpp.a` has special handling for cross compilers which now 
> seems to be broken.
>
> I can't reproduce it with our normal cross compiler scripts. Which handles 
> the stages on its own, but e.g.
> https://github.com/crosstool-ng/crosstool-ng does reproduce the failure.
>
> Any ideas what's going on?

There should be a dependence of all-gcc to all-build-libcpp, Makefile.def has

dependencies = { module=all-gcc; on=all-build-libcpp; };

so how come build-libcpp is not built when gcc/ starts?

> Kind Regards,
> Tamar
>
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Michael Matz
> > Sent: Friday, May 28, 2021 4:33 PM
> > To: Bernd Edlinger 
> > Cc: gcc-patches@gcc.gnu.org; Richard Biener 
> > Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c earlier
> >
> > Hello,
> >
> > On Fri, 28 May 2021, Bernd Edlinger wrote:
> >
> > > >> I was wondering, why gimple-match.c and generic-match.c are not
> > > >> built early but always last, which slows down parallel makes
> > > >> significantly.
> > > >>
> > > >> The reason seems to be that generated_files does not mention
> > > >> gimple-match.c and generic-match.c.
> > > >>
> > > >> This comment in Makefile.in says it all:
> > > >>
> > > >> $(ALL_HOST_OBJS) : | $(generated_files)
> > > >>
> > > >> So this patch adds gimple-match.c generic-match.c to generated_files.
> > > >>
> > > >>
> > > >> Tested on x86_64-pc-linux-gnu.
> > > >> Is it OK for trunk?
> > > >
> > > > This should help for what I was complaining about in
> > > > https://gcc.gnu.org/pipermail/gcc/2021-May/235963.html . I build
> > > > with
> > > > -j24 and it was stalling on compiling gimple-match.c for me.
> > > > Looks like insn-attrtab.c is missed too; I saw genattrtab was
> > > > running last too.
> > > >
> > >
> > > Yeah, probably insn-automata.c as well, sometimes it is picked up
> > > early sometimes not. maybe $(simple_generated_c) should be added to
> > > generated_files, but insn-attrtab.c is yet another exception.
> >
> > You can't put files in there that are sometimes slow to generate (which 
> > insn-
> > {attrtab,automata}.c are on some targets), as _everything_ then waits for
> > them to be created first.
> >
> > Ideally there would be a way for gnumake to mark some targets as "ugh-
> > slow" and back-propagate this to all dependencies so that those are put in
> > front of the work queue in a parallel make.  Alas, something like that never
> > came into existence :-/  (When order-only deps were introduced I got
> > excited, but then came to realize that that wasn't what was really needed 
> > for
> > this case, a "weak" version of it would be required at least, or better yet 
> > a
> > specific facility to impose a cost with a target)
> >
> >
> > Ciao,
> > Michael.
> >
> > >
> > >
> > > Bernd.
> > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > >>
> > > >>
> > > >> Thanks
> > > >> Bernd.
> > > >>
> > > >>
> > > >> 2021-05-28  Bernd Edlinger  
> > > >>
> > > >> * Makefile.in (generated_files): Add gimple-match.c and
> > > >> generic-match.c
> > >


*Ping**2 [Patch] Fortran: Fix bind(C) character length checks

2021-07-14 Thread Burnus, Tobias
Ping**2

On July 8, 2021 I wrote:

*Ping*

I intend to incorporate Sandra's suggestions, except for the
beginning-of-line spacing, which is needed to avoid exceeding the
80-character line limit. I did not include an updated patch, as just pinging
is easier on a mobile during vacation :-)

Thanks,

Tobias

Loosemore, Sandra wrote:

On 7/1/21 11:08 AM, Tobias Burnus wrote:
> Hi all,
>
> this patch came up when discussing Sandra's TS29113 patch internally.
> There is presumably also some overlap with José's patches.
>
> This patch tries to rectify the BIND(C) CHARACTER handling on the
> diagnostic side, only. That is: what to accept and what
> to reject for which Fortran standard.
>
>
> The rules are:
>
> * [F2003-F2018] Interoperable is character(len=1)
>→ F2018, 18.3.1  Interoperability of intrinsic types
>(General, unchanged)
>
> * Fortran 2008: In some cases, const-length chars are
>permitted as well:
>→ F2018, 18.3.4  Interoperability of scalar variables
>→ F2018, 18.3.5  Interoperability of array variables
>→ F2018, 18.3.6  Interoperability of procedures and procedure interfaces
>   [= F2008, 15.3.{4,5,6}
> For global vars with bind(C), 18.3.4 + 18.3.5 applies directly (TODO:
> Add support, not in this patch)
> For passed-by ref dummy arguments, 18.3.4 + 18.3.5 are referenced in
> - F2008: R1229  proc-language-binding-spec is language-binding-spec
>   C1255 (R1229) 
> - F2018, C1554
>
> While it is not very clearly spelt out, I regard 'char parm[4]'
> interoperable with 'character(len=4) :: a', 'character(len=2) :: b(2)'
> and 'character(len=1) :: c(4)' for both global variables and for
> dummy arguments.
>
> * Fortran 2018/TS29113:  Uses additionally CFI array descriptor
>- allocatable, pointer:  must be len=:
>- nonallocatable/nonpointer: len=* → implies array descriptor also
>  for assumed-size/explicit-size/scalar arguments.
>    - Everything that is already passed by an array descriptor has no
>      further restrictions: assumed-shape, assumed-rank, i.e. len= seems
>      to be fine as well
> → 18.3.6 under item (5) bullet point 2 and 3 plus (6).
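
As an aside on the 'char parm[4]' reading above, the C side of such an
interface could look roughly like the sketch below. This is an illustration
only and not taken from the patch: the helper name and FIXED_LEN are
hypothetical, and it assumes the Fortran actual argument is a
character(len=4) passed by reference. The one fact it relies on is that a
fixed-length Fortran character value is blank-padded and carries no NUL
terminator, so the C callee has to work from the declared length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed length, matching character(len=4) on the Fortran
   side.  */
enum { FIXED_LEN = 4 };

/* Copy a blank-padded buffer without NUL terminator into a C string,
   dropping trailing blanks.  DST must have room for FIXED_LEN + 1
   characters.  Returns the trimmed length.  */
size_t
fixed_to_cstr (const char src[FIXED_LEN], char dst[FIXED_LEN + 1])
{
  size_t n = FIXED_LEN;

  /* Skip the blank padding at the end.  */
  while (n > 0 && src[n - 1] == ' ')
    n--;

  memcpy (dst, src, n);
  dst[n] = '\0';
  return n;
}
```

A bind(C) routine receiving 'char parm[4]' could call fixed_to_cstr (parm,
buf) before handing the text to code that expects a NUL-terminated string.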
>
>
> I hope I got the conditions right. I also fixed an issue with
> character(len=5) :: str – the code in trans-expr.c did crash for
> scalars  (decl.c did not check any constraints for arrays).
> I believe the condition is wrong and for len= no descriptor
> is used.
>
> Any comments, remarks?

I gave this patch a try on my TS 29113 last night.  Changing the error
messages kind of screwed up my list of FAILs, but I did see that it also
caught some invalid character arguments in
interoperability/typecodes-scalar.f90 and
interoperability/typecodes-scalar-ext.f90 (which are already broken by 2
other major gfortran bugs I still need to file PRs for).  :-S

I haven't tried to review the patch WRT correctness with the
requirements of the standard yet, but I have a few nits about error
messages

> +   /* F2018, 18.3.6 (6).  */
> +   if (!sym->ts.deferred)
> + {
> +   gfc_error ("Allocatable and pointer character dummy "
> +  "argument %qs at %L must have deferred length "
> +  "as procedure %qs is BIND(C)", sym->name,
> +  &sym->declared_at, sym->ns->proc_name->name);
> +   retval = false;
> + }

This is the error the two aforementioned test cases started giving, but
the message is confusing and doesn't read well (it was a pointer dummy, not
"allocatable and pointer").  Maybe just s/and/or/, or customize the
message depending on which one it is?

> +   gfc_error ("Character dummy argument %qs at %L must be "
> +  "of constant length or assumed length, "
> +  "unless it has assumed-shape or assumed-rank, "
> +  "as procedure %qs has the BIND(C) attribute",
> +  sym->name, &sym->declared_at,
> +  sym->ns->proc_name->name);

I don't think either "assumed-shape" or "assumed-rank" should be
hyphenated in this context unless that exact hyphenation is a term of
art in the Fortran standard or other technical documentation.  In normal
English, adjective phrases are usually only hyphenated when they appear
immediately before the noun they modify; "assumed-shape array", but "an
array with assumed shape".

> +   else if (!gfc_notify_std (GFC_STD_F2018,
> + "Character dummy argument %qs at %L"
> + " with nonconstant length as "
> + "procedure %qs is BIND(C)",
> + sym->name, &sym->declared_at,
> + sym->ns->proc_name->name))
> + retval = false;
> + }

Elsewhere the conventi

RE: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-07-14 Thread Tamar Christina via Gcc-patches
Hi,

Ever since this commit 

commit c9114f2804b91690e030383de15a24e0b738e856
Author: Bernd Edlinger 
Date:   Fri May 28 06:27:27 2021 +0200

Various tools have been having trouble with cross compilation resulting in

make[2]: *** No rule to make target 
'../build-x86_64-build_pc-linux-gnu/libcpp/libcpp.a', needed by 
'build/genmatch'.

(took a while to track down).  I don't understand this part of the build system 
well enough to know how to fix this.
It looks like `libcpp.a` has special handling for cross compilers which now 
seems to be broken.

I can't reproduce it with our normal cross compiler scripts. Which handles the 
stages on its own, but e.g.
https://github.com/crosstool-ng/crosstool-ng does reproduce the failure.

Any ideas what's going on?

Kind Regards,
Tamar

> -Original Message-
> From: Gcc-patches  On Behalf Of
> Michael Matz
> Sent: Friday, May 28, 2021 4:33 PM
> To: Bernd Edlinger 
> Cc: gcc-patches@gcc.gnu.org; Richard Biener 
> Subject: Re: [PATCH] Generate gimple-match.c and generic-match.c earlier
> 
> Hello,
> 
> On Fri, 28 May 2021, Bernd Edlinger wrote:
> 
> > >> I was wondering, why gimple-match.c and generic-match.c are not
> > >> built early but always last, which slows down parallel makes
> > >> significantly.
> > >>
> > >> The reason seems to be that generated_files does not mention
> > >> gimple-match.c and generic-match.c.
> > >>
> > >> This comment in Makefile.in says it all:
> > >>
> > >> $(ALL_HOST_OBJS) : | $(generated_files)
> > >>
> > >> So this patch adds gimple-match.c generic-match.c to generated_files.
> > >>
> > >>
> > >> Tested on x86_64-pc-linux-gnu.
> > >> Is it OK for trunk?
> > >
> > > This should help for what I was complaining about in
> > > https://gcc.gnu.org/pipermail/gcc/2021-May/235963.html . I build
> > > with
> > > -j24 and it was stalling on compiling gimple-match.c for me.
> > > Looks like insn-attrtab.c is missed too; I saw genattrtab was
> > > running last too.
> > >
> >
> > Yeah, probably insn-automata.c as well, sometimes it is picked up
> > early sometimes not. maybe $(simple_generated_c) should be added to
> > generated_files, but insn-attrtab.c is yet another exception.
> 
> You can't put files in there that are sometimes slow to generate (which insn-
> {attrtab,automata}.c are on some targets), as _everything_ then waits for
> them to be created first.
> 
> Ideally there would be a way for gnumake to mark some targets as
> "ugh-slow" and back-propagate this to all dependencies so that those are
> put in front of the work queue in a parallel make.  Alas, something like
> that never came into existence :-/  (When order-only deps were introduced I
> got excited, but then came to realize that that wasn't what was really
> needed for this case, a "weak" version of it would be required at least, or
> better yet a specific facility to impose a cost with a target)
> 
> 
> Ciao,
> Michael.
> 
> >
> >
> > Bernd.
> >
> > > Thanks,
> > > Andrew
> > >
> > >>
> > >>
> > >> Thanks
> > >> Bernd.
> > >>
> > >>
> > >> 2021-05-28  Bernd Edlinger  
> > >>
> > >> * Makefile.in (generated_files): Add gimple-match.c and
> > >> generic-match.c
> >


Re: ping-2: [PATCH] c-family: Add more predefined macros for math flags

2021-07-14 Thread H.J. Lu via Gcc-patches
On Wed, Jul 14, 2021 at 12:32 AM Matthias Kretz  wrote:
>
> OK?
>
> On Wednesday, 30 June 2021 10:59:28 CEST Matthias Kretz wrote:
> > Library code, especially in headers, sometimes needs to know how the
> > compiler interprets / optimizes floating-point types and operations.
> > This information can be used for additional optimizations or for
> > ensuring correctness. This change makes -freciprocal-math,
> > -fno-signed-zeros, -fno-trapping-math, -fassociative-math, and
> > -frounding-math report their state via corresponding pre-defined macros.
> >
> > Signed-off-by: Matthias Kretz 
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/associative-math-1.c: New test.
> >   * gcc.dg/associative-math-2.c: New test.
> >   * gcc.dg/no-signed-zeros-1.c: New test.
> >   * gcc.dg/no-signed-zeros-2.c: New test.
> >   * gcc.dg/no-trapping-math-1.c: New test.
> >   * gcc.dg/no-trapping-math-2.c: New test.
> >   * gcc.dg/reciprocal-math-1.c: New test.
> >   * gcc.dg/reciprocal-math-2.c: New test.
> >   * gcc.dg/rounding-math-1.c: New test.
> >   * gcc.dg/rounding-math-2.c: New test.
> >
> > gcc/c-family/ChangeLog:
> >
> >   * c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Define or
> >   undefine __RECIPROCAL_MATH__, __NO_SIGNED_ZEROS__,
> >   __NO_TRAPPING_MATH__, __ASSOCIATIVE_MATH__, and
> >   __ROUNDING_MATH__ according to the new optimization flags.
> >
> > gcc/ChangeLog:
> >
> >   * cppbuiltin.c (define_builtin_macros_for_compilation_flags):
> >   Define __RECIPROCAL_MATH__, __NO_SIGNED_ZEROS__,
> >   __NO_TRAPPING_MATH__, __ASSOCIATIVE_MATH__, and
> >   __ROUNDING_MATH__ according to their corresponding flags.
> >   * doc/cpp.texi: Document __RECIPROCAL_MATH__,
> >   __NO_SIGNED_ZEROS__, __NO_TRAPPING_MATH__, __ASSOCIATIVE_MATH__,
> >   and __ROUNDING_MATH__.
> > ---
> >  gcc/c-family/c-cppbuiltin.c   | 25 +++
> >  gcc/cppbuiltin.c  | 10 +
> >  gcc/doc/cpp.texi  | 18 
> >  gcc/testsuite/gcc.dg/associative-math-1.c | 17 +++
> >  gcc/testsuite/gcc.dg/associative-math-2.c | 17 +++
> >  gcc/testsuite/gcc.dg/no-signed-zeros-1.c  | 17 +++
> >  gcc/testsuite/gcc.dg/no-signed-zeros-2.c  | 17 +++
> >  gcc/testsuite/gcc.dg/no-trapping-math-1.c | 17 +++
> >  gcc/testsuite/gcc.dg/no-trapping-math-2.c | 17 +++
> >  gcc/testsuite/gcc.dg/reciprocal-math-1.c  | 17 +++
> >  gcc/testsuite/gcc.dg/reciprocal-math-2.c  | 17 +++
> >  gcc/testsuite/gcc.dg/rounding-math-1.c| 17 +++
> >  gcc/testsuite/gcc.dg/rounding-math-2.c| 17 +++
> >  13 files changed, 223 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/associative-math-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/associative-math-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/no-signed-zeros-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/no-signed-zeros-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/no-trapping-math-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/no-trapping-math-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/reciprocal-math-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/reciprocal-math-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/rounding-math-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/rounding-math-2.c
>
>

Hi Hongtao,

Can this be used to address

https://gcc.gnu.org/pipermail/gcc/2021-July/236778.html

-- 
H.J.
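
For readers following the thread, here is a minimal sketch of how header code might consume the macros proposed in the quoted patch. The macro names are taken from the quoted ChangeLog; whether a given compiler actually defines them is an assumption, and the helper functions `is_negative_zero` and `scale` are hypothetical names invented for illustration. When a macro is not defined, the code falls back to the strict behaviour.

```cpp
#include <cmath>

// Hypothetical helper. __NO_SIGNED_ZEROS__ is the macro name proposed in the
// quoted patch; if the compiler does not define it, the #else branch is used.
inline bool is_negative_zero(double x)
{
#ifdef __NO_SIGNED_ZEROS__
  // The sign of zero was declared irrelevant, so never report a negative zero.
  (void) x;
  return false;
#else
  return x == 0.0 && std::signbit(x);
#endif
}

// Hypothetical helper. With __RECIPROCAL_MATH__ (also from the quoted patch)
// the caller accepts x * (1/d) in place of x / d, so a precomputed reciprocal
// can be used; otherwise perform the exact division.
inline double scale(double x, double d, double inv_d)
{
#ifdef __RECIPROCAL_MATH__
  (void) d;
  return x * inv_d;
#else
  (void) inv_d;
  return x / d;
#endif
}
```

A caller would pass both the divisor and its reciprocal, e.g. `scale(8.0, 2.0, 0.5)`, and let the compilation flags decide which one is used.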


Re: [PATCH] [i386] Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.

2021-07-14 Thread H.J. Lu via Gcc-patches
On Tue, Jul 13, 2021 at 9:35 PM Hongtao Liu  wrote:
>
> On Wed, Jul 14, 2021 at 10:34 AM liuhongt  wrote:
> >
> > By optimizing vector movement to broadcast in ix86_expand_vector_move
> > during pass_expand, pass_reload/LRA can automatically generate an avx512
> > embedded broadcast, pass_cpb is not needed.
> >
> > > Even in the absence of avx512f, broadcast from memory is still
> > > slightly faster than loading the entire vector from memory, so always
> > > enable broadcast.
> >
> > benchmark:
> > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast
> >
> > The performance diff
> >
> > strategy: cycles
> > memory  : 1046611188
> > memory  : 1255420817
> > memory  : 1044720793
> > memory  : 1253414145
> > average : 1097868397
> >
> > broadcast   : 1044430688
> > broadcast   : 1044477630
> > broadcast   : 1253554603
> > broadcast   : 1044561934
> > average : 1096756213
> >
> > > However, broadcast has a larger code size.
> >
> > the size diff
> >
> > > size broadcast.o
> > >    text    data     bss     dec     hex filename
> > >     137       0       0     137      89 broadcast.o
> > >
> > > size memory.o
> > >    text    data     bss     dec     hex filename
> > >     115       0       0     115      73 memory.o
> >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-expand.c
> > (ix86_broadcast_from_integer_constant): Rename to ..
> > (ix86_broadcast_from_constant): .. this, and extend it to
> > handle float mode.
> > (ix86_expand_vector_move): Extend to float mode.
> > * config/i386/i386-features.c
> > (replace_constant_pool_with_broadcast): Remove.
> > (remove_partial_avx_dependency_gate): Ditto.
> > (constant_pool_broadcast): Ditto.
> > (class pass_constant_pool_broadcast): Ditto.
> > (make_pass_constant_pool_broadcast): Ditto.
> > (remove_partial_avx_dependency): Adjust gate.
> > * config/i386/i386-passes.def: Remove pass_constant_pool_broadcast.
> > * config/i386/i386-protos.h
> > (make_pass_constant_pool_broadcast): Remove.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.
> > ---
> >  gcc/config/i386/i386-expand.c |  29 +++-
> >  gcc/config/i386/i386-features.c   | 157 +-
> >  gcc/config/i386/i386-passes.def   |   1 -
> >  gcc/config/i386/i386-protos.h |   1 -
> >  .../gcc.target/i386/fuse-caller-save-xmm.c|   2 +-
> >  5 files changed, 26 insertions(+), 164 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index 69ea79e6123..ba870145acd 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -453,8 +453,10 @@ ix86_expand_move (machine_mode mode, rtx operands[])
> >emit_insn (gen_rtx_SET (op0, op1));
> >  }
> >
> > +/* OP is a memref of CONST_VECTOR, return scalar constant mem
> > +   if CONST_VECTOR is a vec_duplicate, else return NULL.  */
> >  static rtx
> > -ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
> > +ix86_broadcast_from_constant (machine_mode mode, rtx op)
> >  {
> >int nunits = GET_MODE_NUNITS (mode);
> >if (nunits < 2)
> > @@ -462,7 +464,8 @@ ix86_broadcast_from_integer_constant (machine_mode 
> > mode, rtx op)
> >
> >/* Don't use integer vector broadcast if we can't move from GPR to SSE
> >   register directly.  */
> > -  if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
> > +  if (!TARGET_INTER_UNIT_MOVES_TO_VEC
> > +  && INTEGRAL_MODE_P (mode))
> >  return nullptr;
> >
> >/* Convert CONST_VECTOR to a non-standard SSE constant integer
> > @@ -470,12 +473,17 @@ ix86_broadcast_from_integer_constant (machine_mode 
> > mode, rtx op)
> >if (!(TARGET_AVX2
> > || (TARGET_AVX
> > && (GET_MODE_INNER (mode) == SImode
> > -   || GET_MODE_INNER (mode) == DImode)))
> > +   || GET_MODE_INNER (mode) == DImode))
> > +   || FLOAT_MODE_P (mode))
> >|| standard_sse_constant_p (op, mode))
> >  return nullptr;
> >
> > -  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.  */
> > -  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT)
> > +  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.
> > + We can still put 64-bit integer constant in memory when
> > + avx512 embed broadcast is available.  */
> > +  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT
> > +  && (!TARGET_AVX512F
> > + || (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL)))
> >  return nullptr;
> >
> >if (GET_MODE_INNER (mode) == TImode)
> > @@ -561,17 +569,20 @@ ix86_expand_vector_move (machine_mode mode, rtx 
> > operands[])
> >
> >if (can_create_pseudo_p ()
> >&& GET_MODE_SIZE (mode) >= 16
> > -  && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> > +  

PING^1 [PATCH v2] x86: Check AVX512 without mask instructions

2021-07-14 Thread H.J. Lu via Gcc-patches
On Fri, Jun 25, 2021 at 5:39 AM H.J. Lu  wrote:
>
> On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak  wrote:
> >
> > On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu  wrote:
> > >
> > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
> > >  wrote:
> > > >
> > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu  wrote:
> > > > >
> > > > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > > > are enabled, compiler is free to use them in these functions.  Add
> > > > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > > > to avoid vector instructions.
> > > >
> > > > These functions are intended to be inlined, so how does target
> > > > attribute affect inlining?
> > > I guess w/ -O0. they may not be inlined, that's why H.J adds those
> > > attributes to those functions.
> >
> > The problem is not with these functions, but with surrounding checks
> > for cpuid features. These checks are implemented with logic
> > instructions, and nothing prevents RA from allocating mask registers,
> > and consequently mask insn is emitted. Regarding mentioned functions,
> > cpuid insn pattern has four GPR single-reg constraints, so mask
> > registers can't be allocated here.
> >
> > > pr96814.dump:
> > > 0804aa40 :
> > >  804aa40: 8d 4c 24 04  lea0x4(%esp),%ecx
> > > ...
> > >  804aa63: 6a 07push   $0x7
> > >  804aa65: e8 e0 e7 ff ffcall   804924a <__get_cpuid_count>
> > >
> > > Also we need to add a target attribute to avx512f_os_support (), and
> > > that would be enough to fix the AVX512 part.
> > >
> > > Moreover, all check functions in below files may also need to deal with:
> > > adx-check.h
> > > aes-avx-check.h
> > > aes-check.h
> > > amx-check.h
> > > attr-nocf-check-1a.c
> > > attr-nocf-check-3a.c
> > > avx2-check.h
> > > avx2-vpop-check.h
> > > avx512bw-check.h
> > > avx512-check.h
> > > avx512dq-check.h
> > > avx512er-check.h
> > > avx512f-check.h
> > > avx512vl-check.h
> > > avx-check.h
> > > bmi2-check.h
> > > bmi-check.h
> > > cf_check-1.c
> > > cf_check-2.c
> > > cf_check-3.c
> > > cf_check-4.c
> > > cf_check-5.c
> > > f16c-check.h
> > > fma4-check.h
> > > fma-check.h
> > > isa-check.h
> > > lzcnt-check.h
> > > m128-check.h
> > > m256-check.h
> > > m512-check.h
> > > mmx-3dnow-check.h
> > > mmx-check.h
> > > pclmul-avx-check.h
> > > pclmul-check.h
> > > pr39315-check.c
> > > rtm-check.h
> > > sha-check.h
> > > spellcheck-options-1.c
> > > spellcheck-options-2.c
> > > spellcheck-options-3.c
> > > spellcheck-options-4.c
> > > spellcheck-options-5.c
> > > sse2-check.h
> > > sse3-check.h
> > > sse4_1-check.h
> > > sse4_2-check.h
> > > sse4a-check.h
> > > sse-check.h
> > > ssse3-check.h
> > > stack-check-11.c
> > > stack-check-12.c
> > > stack-check-17.c
> > > stack-check-18.c
> > > stack-check-19.c
> > > xop-check.h
> >
> > True, but this would just paper over the real problem. Now, it is
> > expected that the user decorates the function that checks CPUID
> > features with the target attribute. I'm not sure if this is OK.
> >
> > Uros.
>
> CPUID functions are used to detect CPU features.  If mask instructions
> are enabled, compiler is free to use them in these functions.  Disable
> AVX512F in AVX512 check with target pragma to avoid mask instructions.
>
> OK for master?
>

PING:

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573717.html


-- 
H.J.
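
As an illustration of the first approach discussed in the quoted thread (the target attribute on the CPUID helper), here is a rough sketch. It is not the code from the patch: `cpu_has_avx512f` and the `GPR_ONLY` and `HAVE_X86_CPUID` macros are hypothetical names, the attribute is only applied when building with GCC for x86, and the sketch checks the CPUID bit only, not OS support for the AVX512 state. Note that the v2 patch being pinged takes a different route and disables AVX512F with a target pragma instead.

```cpp
// Only GCC on x86 is assumed to provide <cpuid.h> and the
// "general-regs-only" target attribute; elsewhere the helper returns false.
#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define GPR_ONLY __attribute__ ((target ("general-regs-only")))
#define HAVE_X86_CPUID 1
#else
#define GPR_ONLY
#define HAVE_X86_CPUID 0
#endif

// Hypothetical feature check.  With the attribute, the compiler must keep
// the bit tests in general-purpose registers, so no vector or mask
// instruction can be emitted here even when AVX512 is enabled globally.
GPR_ONLY static bool cpu_has_avx512f()
{
#if HAVE_X86_CPUID
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  // CPUID.(EAX=7,ECX=0):EBX bit 16 reports AVX512F.
  return (ebx & (1u << 16)) != 0;
#else
  return false;
#endif
}
```

As Uros points out in the quoted text, this only protects the decorated function; the checks surrounding the call in the caller would need the same treatment.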


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-14 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> Hi Richard,
>
> on 2021/7/14 4:38 PM, Richard Sandiford wrote:
>> "Kewen.Lin"  writes:
>>> gcc/ChangeLog:
>>>
>>> * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>>> * internal-fn.def (IFN_MULH): New internal function.
>>> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>>> recog normal multiply highpart as IFN_MULH.
>> 
>> LGTM FWIW, although:
>> 
>
> Thanks for the review!
>
>>> @@ -2030,8 +2048,7 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
>>>/* Check for target support.  */
>>>tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
>>>if (!new_vectype
>>> -  || !direct_internal_fn_supported_p
>>> -   (ifn, new_vectype, OPTIMIZE_FOR_SPEED))
>>> +  || !direct_internal_fn_supported_p (ifn, new_vectype, 
>>> OPTIMIZE_FOR_SPEED))
>>>  return NULL;
>>>  
>>>/* The IR requires a valid vector type for the cast result, even though
>>> @@ -2043,8 +2060,8 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
>>>/* Generate the IFN_MULHRS call.  */
>>>tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
>>>tree new_ops[2];
>>> -  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type,
>>> -  unprom_mult, new_vectype);
>>> +  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type, 
>>> unprom_mult,
>>> +  new_vectype);
>>>gcall *mulhrs_stmt
>>>  = gimple_build_call_internal (ifn, 2, new_ops[0], new_ops[1]);
>>>gimple_call_set_lhs (mulhrs_stmt, new_var);
>> 
>> …these changes look like formatting only.  (I guess it's down to whether
>> or not the 80th column should be kept free for an “end of line+1” cursor.)
>> 
>
> Yeah, just for formatting, the formatting tool (clang-format) reformatted
> them.  Thanks for the information on "end of line+1" cursor, I didn't know
> that before.  I guess you prefer me to keep the original format?  If so I
> will remove them when committing it.  I was thinking whether I should change
> field ColumnLimit of my .clang-format to 79 to avoid this kind of case to
> be caught by formatting tool again.  Hope reviewers won't nit-pick the exact
> 80 column cases then. :)

TBH, 79 vs. 80 isn't normally something I'd worry about when reviewing
new code.  But I know in the past people have asked for 79 to be used
for the “end+1” reason, so I don't think we should “fix” existing code
that honours the 79 limit so that it no longer does, especially when the
lines surrounding the code aren't changing.

There's also a risk of yo-yo-ing if someone else is using clang-format
and does have the limit set to 79 columns.

So yeah, I think it'd be better to commit without the two hunks above.

Thanks,
Richard
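
For context, a small sketch of what a "multiply highpart" computes, which is the scalar shape the IFN_MULH pattern in this thread is about as far as the ChangeLog describes it: widen both operands, multiply, shift right by the element width, and narrow again. The function names `mulhu32` and `mulhu32_array` are made up for illustration; whether a particular loop is actually recognized depends on the vectorizer and the target, which this sketch does not claim to show.

```cpp
#include <cstddef>
#include <cstdint>

// High 32 bits of the 64-bit product of two unsigned 32-bit values.
inline std::uint32_t mulhu32(std::uint32_t a, std::uint32_t b)
{
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}

// The same operation applied element-wise; a loop of this form is the kind
// of candidate a highpart-multiply pattern would be looking for.
inline void mulhu32_array(const std::uint32_t *a, const std::uint32_t *b,
                          std::uint32_t *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = mulhu32(a[i], b[i]);
}
```

For example, `mulhu32(0x80000000u, 2u)` yields 1, because the full product is 2^32.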


Re: [PATCH 4/4] pass location to md_asm_adjust

2021-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 14, 2021 at 10:21 AM Trevor Saunders  wrote:
>
> So the hook can use it as the location of diagnostics.
>
> bootstrapped and regtested on x86_64-linux-gnu, also tested one make all-gcc
> for each affected cpu, ok?

OK.

Richard.

> Trev
>
> gcc/ChangeLog:
>
> * cfgexpand.c (expand_asm_loc): Adjust.
> (expand_asm_stmt): Likewise.
> * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Likewise.
> * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
> * config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
> * config/avr/avr.c (avr_md_asm_adjust): Likewise.
> * config/cris/cris.c (cris_md_asm_adjust): Likewise.
> * config/i386/i386.c (ix86_md_asm_adjust): Likewise.
> * config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise.
> * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
> * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
> * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
> * config/s390/s390.c (s390_md_asm_adjust): Likewise.
> * config/vax/vax.c (vax_md_asm_adjust): Likewise.
> * config/visium/visium.c (visium_md_asm_adjust): Likewise.
> * doc/tm.texi: Regenerate.
> * target.def: Add location argument to md_asm_adjust.
> ---
>  gcc/cfgexpand.c  | 9 +
>  gcc/config/arm/aarch-common-protos.h | 3 ++-
>  gcc/config/arm/aarch-common.c| 8 
>  gcc/config/arm/arm.c | 4 ++--
>  gcc/config/avr/avr.c | 3 ++-
>  gcc/config/cris/cris.c   | 4 ++--
>  gcc/config/i386/i386.c   | 8 
>  gcc/config/mn10300/mn10300.c | 2 +-
>  gcc/config/nds32/nds32.c | 3 ++-
>  gcc/config/pdp11/pdp11.c | 4 ++--
>  gcc/config/rs6000/rs6000.c   | 2 +-
>  gcc/config/s390/s390.c   | 2 +-
>  gcc/config/vax/vax.c | 5 +++--
>  gcc/config/visium/visium.c   | 4 ++--
>  gcc/doc/tm.texi  | 5 +++--
>  gcc/target.def   | 5 +++--
>  16 files changed, 39 insertions(+), 32 deletions(-)
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 46f2960c491..88fd7014941 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -2897,7 +2897,8 @@ expand_asm_loc (tree string, int vol, location_t locus)
>
>if (targetm.md_asm_adjust)
> targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
> -  constraints, clobber_rvec, clobbered_regs);
> +  constraints, clobber_rvec, clobbered_regs,
> +  locus);
>
>asm_op = body;
>nclobbers = clobber_rvec.length ();
> @@ -3074,8 +3075,7 @@ expand_asm_stmt (gasm *stmt)
>return;
>  }
>
> -  /* There are some legacy diagnostics in here, and also avoids an extra
> - parameter to targetm.md_asm_adjust.  */
> +  /* There are some legacy diagnostics in here.  */
>save_input_location s_i_l(locus);
>
>unsigned noutputs = gimple_asm_noutputs (stmt);
> @@ -3456,7 +3456,8 @@ expand_asm_stmt (gasm *stmt)
>if (targetm.md_asm_adjust)
>  after_md_seq
> = targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
> -constraints, clobber_rvec, clobbered_regs);
> +constraints, clobber_rvec, clobbered_regs,
> +locus);
>
>/* Do not allow the hook to change the output and input count,
>   lest it mess up the operand numbering.  */
> diff --git a/gcc/config/arm/aarch-common-protos.h 
> b/gcc/config/arm/aarch-common-protos.h
> index b6171e8668d..6be5fb1e083 100644
> --- a/gcc/config/arm/aarch-common-protos.h
> +++ b/gcc/config/arm/aarch-common-protos.h
> @@ -147,6 +147,7 @@ struct cpu_cost_table
>  rtx_insn *arm_md_asm_adjust (vec &outputs, vec & /*inputs*/,
>  vec & /*input_modes*/,
>  vec &constraints,
> -vec &clobbers, HARD_REG_SET 
> &clobbered_regs);
> +vec &clobbers, HARD_REG_SET &clobbered_regs,
> +location_t loc);
>
>  #endif /* GCC_AARCH_COMMON_PROTOS_H */
> diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
> index 0dbdc56f542..67343fe4025 100644
> --- a/gcc/config/arm/aarch-common.c
> +++ b/gcc/config/arm/aarch-common.c
> @@ -534,7 +534,7 @@ rtx_insn *
>  arm_md_asm_adjust (vec &outputs, vec & /*inputs*/,
>vec & /*input_modes*/,
>vec &constraints, vec & /*clobbers*/,
> -  HARD_REG_SET & /*clobbered_regs*/)
> +  HARD_REG_SET & /*clobbered_regs*/, location_t loc)
>  {
>bool saw_asm_flag = false;
>
> @@ -547,7 +547,7 @@ arm_md_asm_adjust (vec &outputs, vec & 
> /*inputs*/,
>con += 4;
>if (strchr (con, ',') != NULL)
> {
> -

Re: [PATCH 3/4] use diagnostic location in diagnostic_report_current_function

2021-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 14, 2021 at 10:21 AM Trevor Saunders  wrote:
>
> It appears that input_location was used here before the diagnostic's location
> was available, and was never updated when the other part of the header, which
> does use the diagnostic's location, was added.  This makes the two consistent.
>
> bootstrapped and regtested on x86_64-linux-gnu, ok?

OK.

Thanks,
Richard.

> Trev
>
> gcc/ChangeLog:
>
> * tree-diagnostic.c (diagnostic_report_current_function): Use the
> diagnostic's location, not input_location.
> ---
>  gcc/tree-diagnostic.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-diagnostic.c b/gcc/tree-diagnostic.c
> index 8bb214b2cf5..705da94637d 100644
> --- a/gcc/tree-diagnostic.c
> +++ b/gcc/tree-diagnostic.c
> @@ -36,9 +36,9 @@ void
>  diagnostic_report_current_function (diagnostic_context *context,
> diagnostic_info *diagnostic)
>  {
> -  diagnostic_report_current_module (context, diagnostic_location 
> (diagnostic));
> -  lang_hooks.print_error_function (context, LOCATION_FILE (input_location),
> -  diagnostic);
> +  location_t loc = diagnostic_location (diagnostic);
> +  diagnostic_report_current_module (context, loc);
> +  lang_hooks.print_error_function (context, LOCATION_FILE (loc), diagnostic);
>  }
>
>  static void
> --
> 2.20.1
>


Re: [PATCH 2/4] use error_at and warning_at in cfgexpand.c

2021-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 14, 2021 at 10:20 AM Trevor Saunders  wrote:
>
> bootstrapped and regtested on x86_64-linux-gnu, ok?

OK.

Thanks,
Richard.

> Trev
>
> gcc/ChangeLog:
>
> * cfgexpand.c (tree_conflicts_with_clobbers_p): Pass location to
> diagnostics.
> (expand_asm_stmt): Likewise.
> ---
>  gcc/cfgexpand.c | 35 ++-
>  1 file changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index fea8c837c80..46f2960c491 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -2954,7 +2954,8 @@ check_operand_nalternatives (const vec 
> &constraints)
> variable definition for error, NULL_TREE for ok.  */
>
>  static bool
> -tree_conflicts_with_clobbers_p (tree t, HARD_REG_SET *clobbered_regs)
> +tree_conflicts_with_clobbers_p (tree t, HARD_REG_SET *clobbered_regs,
> +   location_t loc)
>  {
>/* Conflicts between asm-declared register variables and the clobber
>   list are not allowed.  */
> @@ -2962,9 +2963,8 @@ tree_conflicts_with_clobbers_p (tree t, HARD_REG_SET 
> *clobbered_regs)
>
>if (overlap)
>  {
> -  error ("% specifier for variable %qE conflicts with "
> -"% clobber list",
> -DECL_NAME (overlap));
> +  error_at (loc, "% specifier for variable %qE conflicts with "
> +   "% clobber list", DECL_NAME (overlap));
>
>/* Reset registerness to stop multiple errors emitted for a single
>  variable.  */
> @@ -3087,7 +3087,7 @@ expand_asm_stmt (gasm *stmt)
>/* ??? Diagnose during gimplification?  */
>if (ninputs + noutputs + nlabels > MAX_RECOG_OPERANDS)
>  {
> -  error ("more than %d operands in %", MAX_RECOG_OPERANDS);
> +  error_at (locus, "more than %d operands in %", 
> MAX_RECOG_OPERANDS);
>return;
>  }
>
> @@ -3140,7 +3140,8 @@ expand_asm_stmt (gasm *stmt)
>   if (j == -2)
> {
>   /* ??? Diagnose during gimplification?  */
> - error ("unknown register name %qs in %", regname);
> + error_at (locus, "unknown register name %qs in %",
> +   regname);
>   error_seen = true;
> }
>   else if (j == -4)
> @@ -3205,7 +3206,8 @@ expand_asm_stmt (gasm *stmt)
> && HARD_REGISTER_P (DECL_RTL (output_tvec[j]))
> && output_hregno == REGNO (DECL_RTL (output_tvec[j])))
>   {
> -   error ("invalid hard register usage between output operands");
> +   error_at (locus, "invalid hard register usage between output "
> + "operands");
> error_seen = true;
>   }
>
> @@ -3231,16 +3233,16 @@ expand_asm_stmt (gasm *stmt)
> if (i == match
> && output_hregno != input_hregno)
>   {
> -   error ("invalid hard register usage between output "
> -  "operand and matching constraint operand");
> +   error_at (locus, "invalid hard register usage between "
> + "output operand and matching constraint 
> operand");
> error_seen = true;
>   }
> else if (early_clobber_p
>  && i != match
>  && output_hregno == input_hregno)
>   {
> -   error ("invalid hard register usage between "
> -  "earlyclobber operand and input operand");
> +   error_at (locus, "invalid hard register usage between "
> + "earlyclobber operand and input operand");
> error_seen = true;
>   }
>   }
> @@ -3319,7 +3321,7 @@ expand_asm_stmt (gasm *stmt)
>
>   if (! allows_reg && !MEM_P (op))
> {
> - error ("output number %d not directly addressable", i);
> + error_at (locus, "output number %d not directly addressable", 
> i);
>   error_seen = true;
> }
>   if ((! allows_mem && MEM_P (op) && GET_MODE (op) != BLKmode)
> @@ -3415,9 +3417,8 @@ expand_asm_stmt (gasm *stmt)
>   if (allows_reg && TYPE_MODE (type) != BLKmode)
> op = force_reg (TYPE_MODE (type), op);
>   else if (!allows_mem)
> -   warning (0, "% operand %d probably does not match "
> -"constraints",
> -i + noutputs);
> +   warning_at (locus, 0, "% operand %d probably does not match 
> "
> +   "constraints", i + noutputs);
>   else if (MEM_P (op))
> {
>   /* We won't recognize either volatile memory or memory
> @@ -3471,10 +3472,10 @@ expand_asm_stmt (gasm *stmt)
>
>bool clobber_conflict_found = 0;
>for (i = 0; i < noutputs; ++i)
> -if (tree_conflic

Re: [PATCH 1/4] force decls to be allocated through build_decl to initialize them

2021-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 14, 2021 at 10:20 AM Trevor Saunders  wrote:
>
> Prior to this commit all calls to build_decl used input_location, even if
> only temporarily, until build_decl reset the location to something else that
> it was told was the proper location.  To avoid using the global we need the
> caller to pass in the location it wants; however, that's not possible with
> make_node since it makes other types of nodes.  So we force all callers who
> wish to make a decl to go through build_decl, which already takes a location
> argument.  To avoid changing behavior this just explicitly passes in
> input_location to build_decl for callers of make_node that create a decl;
> however, it would seem that in many of these cases the location of the decl
> being copied might be a better location.
>
> bootstrapped and regtested on x86_64-linux-gnu, ok?

I think all decls that eventually end up DECL_ARTIFICIAL should use
UNKNOWN_LOCATION instead of input_location.

I'm not sure if I like the (transitional) extra arg to make_node, I suppose
we could hide make_node by declaring it in tree-raw.h or so or by
guarding the decl with NEED_MAKE_NODE.  There's nothing inherently
wrong with calling make_node.  So what I mean with transitional is that
with this change we should simply set the location to UNKNOWN_LOCATION
(aka zero, which it already is), not input_location, in make_node.

Richard.

> Trev
>
> gcc/ChangeLog:
>
> * cfgexpand.c (avoid_deep_ter_for_debug): Call build_decl not
> make_node.
> (expand_gimple_basic_block): Likewise.
> * ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
> Likewise.
> (ipa_param_body_adjustments::reset_debug_stmts): Likewise.
> * omp-simd-clone.c (ipa_simd_modify_stmt_ops): Likewise.
> * stor-layout.c (start_bitfield_representative): Likewise.
> * tree-inline.c (remap_ssa_name): Likewise.
> (tree_function_versioning): Likewise.
> * tree-into-ssa.c (rewrite_debug_stmt_uses): Likewise.
> * tree-nested.c (lookup_field_for_decl): Likewise.
> (get_chain_field): Likewise.
> (create_field_for_decl): Likewise.
> (get_nl_goto_field): Likewise.
> (finalize_nesting_tree_1): Likewise.
> * tree-ssa-ccp.c (optimize_atomic_bit_test_and): Likewise.
> * tree-ssa-loop-ivopts.c (remove_unused_ivs): Likewise.
> * tree-ssa-phiopt.c (spaceship_replacement): Likewise.
> * tree-ssa-reassoc.c (make_new_ssa_for_def): Likewise.
> * tree-ssa.c (insert_debug_temp_for_var_def): Likewise.
> * tree-streamer-in.c (streamer_alloc_tree): Adjust.
> * tree.c (make_node): Add argument to specify the caller.
> (build_decl): Move initialization from make_node.
> * tree.h (enum make_node_caller): New enum.
> (make_node): Adjust prototype.
> * varasm.c (make_debug_expr_from_rtl): Call build_decl.
>
> gcc/cp/ChangeLog:
>
> * constraint.cc (build_type_constraint): Call build_decl not 
> make_node.
> * cp-gimplify.c (cp_genericize_r): Likewise.
> * parser.c (cp_parser_introduction_list): Likewise.
> * module.cc (trees_in::start): Adjust.
>
> gcc/fortran/ChangeLog:
>
> * trans-decl.c (generate_namelist_decl): Call build_decl not 
> make_node.
> * trans-types.c (gfc_get_array_descr_info): Likewise.
>
> gcc/objc/ChangeLog:
>
> * objc-act.c (objc_add_property_declaration): Call build_decl not
> make_node.
> (maybe_make_artificial_property_decl): Likewise.
> (objc_build_keyword_decl): Likewise.
> (build_method_decl): Likewise.
> ---
>  gcc/cfgexpand.c  |  8 
>  gcc/cp/constraint.cc |  2 +-
>  gcc/cp/cp-gimplify.c |  5 +++--
>  gcc/cp/module.cc |  2 +-
>  gcc/cp/parser.c  |  6 ++
>  gcc/fortran/trans-decl.c |  5 ++---
>  gcc/fortran/trans-types.c|  4 ++--
>  gcc/ipa-param-manipulation.c |  8 
>  gcc/objc/objc-act.c  | 16 ++--
>  gcc/omp-simd-clone.c |  4 ++--
>  gcc/stor-layout.c|  2 +-
>  gcc/tree-inline.c| 13 +++--
>  gcc/tree-into-ssa.c  |  4 ++--
>  gcc/tree-nested.c| 24 ++--
>  gcc/tree-ssa-ccp.c   |  4 ++--
>  gcc/tree-ssa-loop-ivopts.c   |  4 ++--
>  gcc/tree-ssa-phiopt.c|  8 
>  gcc/tree-ssa-reassoc.c   |  4 ++--
>  gcc/tree-ssa.c   |  4 ++--
>  gcc/tree-streamer-in.c   |  2 +-
>  gcc/tree.c   | 35 ++-
>  gcc/tree.h   | 13 -
>  gcc/varasm.c | 12 ++--
>  23 files changed, 96 insertions(+), 93 deletions(-)
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 3edd53c37dc..fea8c837c80 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -4342,10 +4342,10 @@ avoid_deep_ter_for_debug (gimple *stmt, int d

[committed] libstdc++: Add noexcept-specifier to basic_string_view(It, End)

2021-07-14 Thread Jonathan Wakely via Gcc-patches
This adds a conditional noexcept to the C++20 constructor. The
std::to_address call cannot throw, so only taking the difference of the
two iterators can throw.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/string_view (basic_string_view(It, End)): Add
noexcept-specifier.
* testsuite/21_strings/basic_string_view/cons/char/range.cc:
Check noexcept-specifier. Also check construction without CTAD.

Tested powerpc64le-linux. Committed to trunk.

commit f9c2ce1dae270d8d5dc261a57a21f96a1da5ea2d
Author: Jonathan Wakely 
Date:   Wed Jul 14 11:03:17 2021

libstdc++: Add noexcept-specifier to basic_string_view(It, End)

This adds a conditional noexcept to the C++20 constructor. The
std::to_address call cannot throw, so only taking the difference of the
two iterators can throw.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/string_view (basic_string_view(It, End)): Add
noexcept-specifier.
* testsuite/21_strings/basic_string_view/cons/char/range.cc:
Check noexcept-specifier. Also check construction without CTAD.

diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index 4ea72c6cef2..d8cbee9bee0 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -144,6 +144,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  && (!convertible_to<_End, size_type>)
constexpr
basic_string_view(_It __first, _End __last)
+   noexcept(noexcept(__last - __first))
: _M_len(__last - __first), _M_str(std::to_address(__first))
{ }
 
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/char/range.cc 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/char/range.cc
index 39fcaf52925..7376e2fa3f4 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/char/range.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/char/range.cc
@@ -15,24 +15,34 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++2a" }
-// { dg-do run { target c++2a } }
+// { dg-options "-std=gnu++20" }
+// { dg-do run { target c++20 } }
 
 #include 
 #include 
 #include 
+#include 
 
 constexpr char str[] = "abcdefg";
-constexpr std::basic_string_view s(std::begin(str), std::cend(str) - 1);
+constexpr std::basic_string_view s(std::begin(str), std::cend(str) - 1);
 static_assert( s == str );
 static_assert( s.data() == str );
+constexpr std::basic_string_view ctad(std::begin(str), std::cend(str) - 1);
+static_assert( ctad == s );
+
+// The standard does not require this constructor to have a noexcept-specifier.
+static_assert( noexcept(std::basic_string_view(str, str)) );
+using I = __gnu_test::contiguous_iterator_wrapper;
+static_assert( ! noexcept(std::basic_string_view(I{}, I{})) );
 
 void
 test01()
 {
   std::vector v{'a', 'b', 'c'};
-  std::basic_string_view s(v.begin(), v.end());
+  std::basic_string_view s(v.begin(), v.end());
   VERIFY( s.data() == v.data() );
+  std::basic_string_view ctad(v.begin(), v.end());
+  VERIFY( ctad == s );
 }
 
 int


Re: contracts library support (was Re: [PATCH] PING implement pre-c++20 contracts)

2021-07-14 Thread Jonathan Wakely via Gcc-patches
On Wed, 14 Jul 2021 at 04:56, Jason Merrill  wrote:
>
> On 7/12/21 3:58 PM, Jonathan Wakely wrote:
> > On Mon, 5 Jul 2021 at 20:07, Jason Merrill  wrote:
> >>
> >> On 6/26/21 10:23 AM, Andrew Sutton wrote:
> >>>
> >>> I ended up taking over this work from Jeff (CC'd on his existing email
> >>> address). I scraped all the contracts changes into one big patch
> >>> against master. See attached. The ChangeLog.contracts files list the
> >>> sum of changes for the patch, not the full history of the work.
> >>
> >> Jonathan, can you advise where the library support should go?
> >>
> >> In N4820  was part of the language-support clause, which makes
> >> sense, but it uses string_view, which brings in a lot of the rest of the
> >> library.  Did LWG talk about this when contracts went in?  How are
> >> freestanding implementations expected to support contracts?
> >
> > I don't recall that being discussed, but I think I was in another room
> > for much of the contracts review.
> >
> > If necessary we could make the std::char_traits specialization
> > available freestanding, without the primary template (or the other
> > specializations). But since C++20 std::string_view also depends on
> > quite a lot of ranges, which depends on iterators, which is not
> > freestanding. Some of those dependencies were added more recently than
> > contracts was reviewed and then yanked out, so maybe wasn't considered
> > a big problem back then. In any case, depending on std::string_view
> > (even without the rest of std::basic_string_view) is not currently
> > possible for freestanding.
>
> I guess I'll change string_view to const char* for now.

I think that's best. Making std::string_view usable would take some work.

> >> I imagine the header should be  for now.
> >
> > Agreed.
>
> And the type std::experimental::??::contract_violation.  Maybe
> contracts_v1 for the inline namespace?

LGTM

> Did you have any thoughts about the violation handler?  Is it OK to add
> a default definition to the library, in the above namespace?

I'd rather not have any std::experimental::* symbols go into the DSO.
For std::experimental::filesystem we added libstdc++fs.a, with no
corresponding .so library, which users need to link to explicitly to
use that TS. Would something like libstdc++contracts.a work here? Is
it just one symbol?

Aside: Ulrich Drepper suggested recently that the driver should have
been updated to automatically add -lstdc++fs so that using
 was seamless, as the archive contents
wouldn't be used unless something in the program referred to the
symbols in it.

Is just using std::terminate as the handler viable? Or if we're sure
contracts in some form will go into the IS eventually, and the
signature won't change, we could just add it in __cxxabiv1:: as you
suggested earlier.



Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-14 Thread Jonathan Wakely via Gcc-patches
On Wed, 14 Jul 2021 at 04:39, Jason Merrill  wrote:
>
> On 7/13/21 4:02 PM, Martin Sebor wrote:
> > On 7/13/21 12:37 PM, Jason Merrill wrote:
> >> On 7/13/21 10:08 AM, Jonathan Wakely wrote:
> >>> On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:
>  Somebody with more C++ knowledge than me needs to approve the
>  vec.h changes - I don't feel competent to assess all effects of the
>  change.
> >>>
> >>> They look OK to me except for:
> >>>
> >>> -extern vnull vNULL;
> >>> +static constexpr vnull vNULL{ };
> >>>
> >>> Making vNULL have static linkage can make it an ODR violation to use
> >>> vNULL in templates and inline functions, because different
> >>> instantiations will refer to a different "vNULL" in each translation
> >>> unit.
> >>
> >> The ODR says this is OK because it's a literal constant with the same
> >> value (6.2/12.2.1).
> >>
> >> But it would be better without the explicit 'static'; then in C++17
> >> it's implicitly inline instead of static.
> >
> > I'll remove the static.
> >
> >>
> >> But then, do we really want to keep vNULL at all?  It's a weird
> >> blurring of the object/pointer boundary that is also dependent on vec
> >> being a thin wrapper around a pointer.  In almost all cases it can be
> >> replaced with {}; one exception is == comparison, where it seems to be
> >> testing that the embedded pointer is null, which is a weird thing to
> >> want to test.
> >
> > The one use case I know of for vNULL where I can't think of
> > an equally good substitute is in passing a vec as an argument by
> > value.  The only way to do that that I can think of is to name
> > the full vec type (i.e., the specialization) which is more typing
> > and less generic than vNULL.  I don't use vNULL myself so I wouldn't
> > miss this trick if it were to be removed but others might feel
> > differently.
>
> In C++11, it can be replaced by {} in that context as well.

Or if people don't like that, you could add a constructor taking
std::nullptr_t and an equality comparison with std::nullptr_t and then
use nullptr instead of vNULL.

I think just using {} for an empty, value-initialized vec makes more
sense though.
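
A minimal sketch of the two alternatives mentioned above, assuming a thin pointer wrapper. thin_vec and is_null_vec are hypothetical names used only for illustration; this is not GCC's vec.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical thin wrapper around a pointer; not GCC's vec.
template <typename T>
struct thin_vec
{
  T* data_ = nullptr;

  constexpr thin_vec() = default;
  // Lets callers write nullptr where they would have written vNULL.
  constexpr thin_vec(std::nullptr_t) noexcept : data_(nullptr) { }
  constexpr explicit thin_vec(T* p) noexcept : data_(p) { }

  // Equality comparison with nullptr tests the embedded pointer.
  friend constexpr bool operator==(const thin_vec& v, std::nullptr_t) noexcept
  { return v.data_ == nullptr; }
  friend constexpr bool operator!=(const thin_vec& v, std::nullptr_t) noexcept
  { return v.data_ != nullptr; }
};

// A function taking the wrapper by value; callers need not name the type.
constexpr bool is_null_vec(thin_vec<int> v) { return v == nullptr; }

// Both spellings produce an empty, value-initialized argument.
static_assert(is_null_vec({}), "{} yields an empty vec");
static_assert(is_null_vec(nullptr), "nullptr yields an empty vec");

inline void example_usage()
{
  int x = 0;
  thin_vec<int> v(&x);
  assert(v != nullptr);
  assert(!is_null_vec(v));
}
```

Either call-site spelling avoids naming the specialization; the {} form needs no extra constructor at all.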



[PATCH] tree-optimization/101445 - fix negative stride SLP vect with gaps

2021-07-14 Thread Richard Biener
The following fixes the IV adjustment for the gap in a negative
stride SLP vectorization.  The adjustment was in the wrong direction;
it is now fixed as in the patch.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-07-14  Richard Biener  

PR tree-optimization/101445
* tree-vect-stmts.c (vectorizable_load): Do the gap adjustment
of the IV in the correct direction for negative stride
accesses.

* gcc.dg/vect/pr101445.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr101445.c | 28 
 gcc/tree-vect-stmts.c|  6 ++
 2 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101445.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr101445.c 
b/gcc/testsuite/gcc.dg/vect/pr101445.c
new file mode 100644
index 000..f8a6e9ce6f7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr101445.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+int a[35] = { 1, 1, 3 };
+
+void __attribute__((noipa))
+foo ()
+{
+  for (int b = 4; b >= 0; b--)
+{
+  int tem = a[b * 5 + 3 + 1];
+  a[b * 5 + 3] = tem;
+  a[b * 5 + 2] = tem;
+  a[b * 5 + 1] = tem;
+  a[b * 5 + 0] = tem;
+}
+}
+
+int main()
+{
+  check_vect ();
+  foo ();
+  for (int d = 0; d < 25; d++)
+if (a[d] != 0)
+  __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index e590f34d75d..3980f0918b2 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9759,6 +9759,9 @@ vectorizable_load (vec_info *vinfo,
  poly_wide_int bump_val
= (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
   * group_gap_adj);
+ if (tree_int_cst_sgn
+   (vect_dr_behavior (vinfo, dr_info)->step) == -1)
+   bump_val = -bump_val;
  tree bump = wide_int_to_tree (sizetype, bump_val);
  dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
 gsi, stmt_info, bump);
@@ -9772,6 +9775,9 @@ vectorizable_load (vec_info *vinfo,
  poly_wide_int bump_val
= (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
   * group_gap_adj);
+ if (tree_int_cst_sgn
+   (vect_dr_behavior (vinfo, dr_info)->step) == -1)
+   bump_val = -bump_val;
  tree bump = wide_int_to_tree (sizetype, bump_val);
  dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi,
 stmt_info, bump);
-- 
2.26.2
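
As a reading aid for the two hunks above, the sign flip can be sketched with plain integers. gap_bump_bytes is a hypothetical helper, not a GCC function; the real code operates on a poly_wide_int and tests the sign of the data reference step with tree_int_cst_sgn.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a GCC API.  Returns the signed byte offset used
// to skip a gap of gap_elems elements: positive when the access walks
// forward, negated when the step is negative.
constexpr std::int64_t
gap_bump_bytes(std::int64_t elem_size, std::int64_t gap_elems,
               std::int64_t step)
{
  return step < 0 ? -(elem_size * gap_elems) : elem_size * gap_elems;
}

static_assert(gap_bump_bytes(4, 2, 20) == 8, "forward stride bumps up");
static_assert(gap_bump_bytes(4, 2, -20) == -8, "negative stride bumps down");

inline void example_usage()
{
  // With 4-byte elements and a gap of one element, a negative step moves
  // the pointer back by 4 bytes instead of forward.
  assert(gap_bump_bytes(4, 1, -4) == -4);
  assert(gap_bump_bytes(4, 1, 4) == 4);
}
```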


Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-14 Thread guojiufu via Gcc-patches

On 2021-07-14 12:40, guojiufu via Gcc-patches wrote:
Updated the patch as below:
Thanks for comments.

gcc/ChangeLog:

2021-07-13  Jiufu Guo  

PR target/61837
* config/rs6000/rs6000.c (TARGET_PREFERRED_DOLOOP_MODE): New hook.
(rs6000_preferred_doloop_mode): New hook.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook preferred_doloop_mode.
* target.def (preferred_doloop_mode): New hook.
* targhooks.c (default_preferred_doloop_mode): New hook.
* targhooks.h (default_preferred_doloop_mode): New hook.
* tree-ssa-loop-ivopts.c (compute_doloop_base_on_mode): New function.
(add_iv_candidate_for_doloop): Call targetm.preferred_doloop_mode
and compute_doloop_base_on_mode.

gcc/testsuite/ChangeLog:

2021-07-13  Jiufu Guo  

PR target/61837
* gcc.target/powerpc/pr61837.c: New test.
---
 gcc/config/rs6000/rs6000.c | 11 
 gcc/doc/tm.texi| 10 
 gcc/doc/tm.texi.in |  2 +
 gcc/target.def | 14 +
 gcc/targhooks.c|  8 +++
 gcc/targhooks.h|  1 +
 gcc/testsuite/gcc.target/powerpc/pr61837.c | 20 +++
 gcc/tree-ssa-loop-ivopts.c | 67 +-
 8 files changed, 131 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr61837.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9a5db63d0ef..3bdf0cb97a3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1700,6 +1700,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =

 #undef TARGET_DOLOOP_COST_FOR_ADDRESS
 #define TARGET_DOLOOP_COST_FOR_ADDRESS 10

+#undef TARGET_PREFERRED_DOLOOP_MODE
+#define TARGET_PREFERRED_DOLOOP_MODE rs6000_preferred_doloop_mode
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV 
rs6000_atomic_assign_expand_fenv


@@ -27867,6 +27870,14 @@ rs6000_predict_doloop_p (struct loop *loop)
   return true;
 }

+/* Implement TARGET_PREFERRED_DOLOOP_MODE. */
+
+static machine_mode
+rs6000_preferred_doloop_mode (machine_mode)
+{
+  return word_mode;
+}
+
 /* Implement TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P.  */

 static bool
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2a41ae5fba1..4fb516169dc 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11984,6 +11984,16 @@ By default, the RTL loop optimizer does not use 
a present doloop pattern for

 loops containing function calls or branch on table instructions.
 @end deftypefn

+@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE 
(machine_mode @var{mode})

+This hook takes a @var{mode} which is the original mode of doloop IV.
+And if the target prefers other mode for doloop IV, this hook returns 
the

+preferred mode.
+For example, on 64bit target, DImode may be preferred than SImode.
+This hook could return the original mode itself if the target prefer to
+keep the original mode.
+The original mode and return mode should be MODE_INT.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_LEGITIMATE_COMBINED_INSN (rtx_insn 
*@var{insn})
 Take an instruction in @var{insn} and return @code{false} if the 
instruction

 is not appropriate as a combination of two or more instructions.  The
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f881cdabe9e..38215149a92 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7917,6 +7917,8 @@ to by @var{ce_info}.

 @hook TARGET_INVALID_WITHIN_DOLOOP

+@hook TARGET_PREFERRED_DOLOOP_MODE
+
 @hook TARGET_LEGITIMATE_COMBINED_INSN

 @hook TARGET_CAN_FOLLOW_JUMP
diff --git a/gcc/target.def b/gcc/target.def
index c009671c583..1b6c9872807 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4454,6 +4454,20 @@ loops containing function calls or branch on 
table instructions.",

  const char *, (const rtx_insn *insn),
  default_invalid_within_doloop)

+/* Returns the machine mode which the target prefers for doloop IV.  */
+DEFHOOK
+(preferred_doloop_mode,
+ "This hook takes a @var{mode} which is the original mode of doloop 
IV.\n\
+And if the target prefers another mode for doloop IV, this hook returns 
the\n\

+preferred mode.\n\
+For example, on 64bit target, DImode may be preferred than SImode.\n\
+This hook could return the original mode itself if the target prefer 
to\n\

+keep the original mode.\n\
+The original mode and return mode should be MODE_INT.",
+ machine_mode,
+ (machine_mode mode),
+ default_preferred_doloop_mode)
+
 /* Returns true for a legitimate combined insn.  */
 DEFHOOK
 (legitimate_combined_insn,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 44a1facedcf..eb5190910dc 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -660,6 +660,14 @@ default_predict_doloop_p (class loop *loop 
ATTRIBUTE_UNUSED)

   return false;
 }

+/* By default, just use the input MODE itself.  */
+
+machine_mode
+

Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-14 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2021/7/14 下午4:38, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> gcc/ChangeLog:
>>
>>  * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>>  * internal-fn.def (IFN_MULH): New internal function.
>>  * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>>  recog normal multiply highpart as IFN_MULH.
> 
> LGTM FWIW, although:
> 

Thanks for the review!

>> @@ -2030,8 +2048,7 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
>>/* Check for target support.  */
>>tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
>>if (!new_vectype
>> -  || !direct_internal_fn_supported_p
>> -(ifn, new_vectype, OPTIMIZE_FOR_SPEED))
>> +  || !direct_internal_fn_supported_p (ifn, new_vectype, 
>> OPTIMIZE_FOR_SPEED))
>>  return NULL;
>>  
>>/* The IR requires a valid vector type for the cast result, even though
>> @@ -2043,8 +2060,8 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
>>/* Generate the IFN_MULHRS call.  */
>>tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
>>tree new_ops[2];
>> -  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type,
>> -   unprom_mult, new_vectype);
>> +  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type, 
>> unprom_mult,
>> +   new_vectype);
>>gcall *mulhrs_stmt
>>  = gimple_build_call_internal (ifn, 2, new_ops[0], new_ops[1]);
>>gimple_call_set_lhs (mulhrs_stmt, new_var);
> 
> …these changes look like formatting only.  (I guess it's down to whether
> or not the 80th column should be kept free for an “end of line+1” cursor.)
> 

Yeah, these are formatting-only changes; the formatting tool (clang-format)
reformatted them.  Thanks for the information on the "end of line+1" cursor, I
didn't know that before.  I guess you prefer me to keep the original format?
If so I will remove them when committing it.  I was wondering whether I should
change the ColumnLimit field of my .clang-format to 79 so that the formatting
tool does not flag this kind of case again.  Hope reviewers won't nit-pick the
exact 80 column cases then. :)

BR,
Kewen


Re: [PATCH] PR fortran/100949 - [9/10/11/12 Regression] ICE in gfc_conv_expr_present, at fortran/trans-expr.c:1975

2021-07-14 Thread Thomas Koenig via Gcc-patches

Hi Harald,


we rather shouldn't consider a presence check for a non-dummy variable.

Regtested on x86_64-pc-linux-gnu.  OK for all affected branches?


OK for all.

Thanks for the patch!

Best regards

Thomas


RE: [PATCH AArch64]Use stable sort in generating ldp/stp

2021-07-14 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of bin.cheng via
> Gcc-patches
> Sent: 14 July 2021 10:19
> To: GCC Patches 
> Subject: [PATCH AArch64]Use stable sort in generating ldp/stp
> 
> Hi,
> Like previous patch, this is found when I was playing with stx::simd.
> It's an obvious change as described in commit summary.  Also the dead
> store in the code should be optimized away, but I guess there is no
> guarantee, so here is a simple patch fixing it.
> 
> 
> Is it OK?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> bin


[PATCH AArch64]Use stable sort in generating ldp/stp

2021-07-14 Thread bin.cheng via Gcc-patches
Hi,
Like the previous patch, this was found when I was playing with stx::simd.
It's an obvious change, as described in the commit summary.  Also, the dead
store in the code should be optimized away, but I guess there is no
guarantee, so here is a simple patch fixing it.


Is it OK?

Thanks,
bin

0002-AArch64-use-stable-sorting-in-generating-ldp-stp.patch
Description: Binary data


0001-Don-t-skip-prologue-instructions-as-it-could-affect-.patch

2021-07-14 Thread bin.cheng via Gcc-patches
Hi,
I ran into a wrong-code bug in code with deep template instantiation when
working on sdx::simd.
The root cause, as described in the commit summary, is that we skip prologue
insns in init_alias_analysis.
This simple patch fixes the issue; however, it's hard to reduce a test case
because of the heavy use of templates.
Bootstrapped and tested on x86_64.  Is it OK?

Thanks,
bin

0001-Don-t-skip-prologue-instructions-as-it-could-affect-.patch
Description: Binary data


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-14 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> gcc/ChangeLog:
>
>   * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>   * internal-fn.def (IFN_MULH): New internal function.
>   * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>   recog normal multiply highpart as IFN_MULH.

LGTM FWIW, although:

> @@ -2030,8 +2048,7 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
>/* Check for target support.  */
>tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
>if (!new_vectype
> -  || !direct_internal_fn_supported_p
> - (ifn, new_vectype, OPTIMIZE_FOR_SPEED))
> +  || !direct_internal_fn_supported_p (ifn, new_vectype, 
> OPTIMIZE_FOR_SPEED))
>  return NULL;
>  
>/* The IR requires a valid vector type for the cast result, even though
> @@ -2043,8 +2060,8 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
>/* Generate the IFN_MULHRS call.  */
>tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
>tree new_ops[2];
> -  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type,
> -unprom_mult, new_vectype);
> +  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type, 
> unprom_mult,
> +new_vectype);
>gcall *mulhrs_stmt
>  = gimple_build_call_internal (ifn, 2, new_ops[0], new_ops[1]);
>gimple_call_set_lhs (mulhrs_stmt, new_var);

…these changes look like formatting only.  (I guess it's down to whether
or not the 80th column should be kept free for an “end of line+1” cursor.)

Thanks,
Richard
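
For context, the scalar idiom usually meant by "multiply highpart" can be sketched as below for the unsigned 32-bit case. The function names are hypothetical, and whether the vectorizer recognizes this exact source form is an assumption, not something the thread confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// High 32 bits of the 64-bit product of two unsigned 32-bit values.
constexpr std::uint32_t umul_highpart(std::uint32_t a, std::uint32_t b)
{
  return static_cast<std::uint32_t>(
    (static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b)) >> 32);
}

// A loop of the widen, multiply, shift, narrow shape that a highpart
// multiply pattern is meant to cover.
inline void umul_highpart_loop(const std::uint32_t* a, const std::uint32_t* b,
                               std::uint32_t* out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = umul_highpart(a[i], b[i]);
}

// 0x80000000 * 4 == 0x2'0000'0000, whose high half is 2.
static_assert(umul_highpart(0x80000000u, 4u) == 2u, "");
// (2^32 - 1)^2 == 2^64 - 2^33 + 1, whose high half is 2^32 - 2.
static_assert(umul_highpart(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu, "");
```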


[PATCH 4/4] pass location to md_asm_adjust

2021-07-14 Thread Trevor Saunders
So the hook can use it as the location of diagnostics.

Bootstrapped and regtested on x86_64-linux-gnu; also tested one make all-gcc
for each affected CPU.  OK?

Trev

gcc/ChangeLog:

* cfgexpand.c (expand_asm_loc): Adjust.
(expand_asm_stmt): Likewise.
* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Likewise.
* config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
* config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
* config/avr/avr.c (avr_md_asm_adjust): Likewise.
* config/cris/cris.c (cris_md_asm_adjust): Likewise.
* config/i386/i386.c (ix86_md_asm_adjust): Likewise.
* config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise.
* config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
* config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
* config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
* config/s390/s390.c (s390_md_asm_adjust): Likewise.
* config/vax/vax.c (vax_md_asm_adjust): Likewise.
* config/visium/visium.c (visium_md_asm_adjust): Likewise.
* doc/tm.texi: Regenerate.
* target.def: Add location argument to md_asm_adjust.
---
 gcc/cfgexpand.c  | 9 +
 gcc/config/arm/aarch-common-protos.h | 3 ++-
 gcc/config/arm/aarch-common.c| 8 
 gcc/config/arm/arm.c | 4 ++--
 gcc/config/avr/avr.c | 3 ++-
 gcc/config/cris/cris.c   | 4 ++--
 gcc/config/i386/i386.c   | 8 
 gcc/config/mn10300/mn10300.c | 2 +-
 gcc/config/nds32/nds32.c | 3 ++-
 gcc/config/pdp11/pdp11.c | 4 ++--
 gcc/config/rs6000/rs6000.c   | 2 +-
 gcc/config/s390/s390.c   | 2 +-
 gcc/config/vax/vax.c | 5 +++--
 gcc/config/visium/visium.c   | 4 ++--
 gcc/doc/tm.texi  | 5 +++--
 gcc/target.def   | 5 +++--
 16 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 46f2960c491..88fd7014941 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2897,7 +2897,8 @@ expand_asm_loc (tree string, int vol, location_t locus)
 
   if (targetm.md_asm_adjust)
targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
-  constraints, clobber_rvec, clobbered_regs);
+  constraints, clobber_rvec, clobbered_regs,
+  locus);
 
   asm_op = body;
   nclobbers = clobber_rvec.length ();
@@ -3074,8 +3075,7 @@ expand_asm_stmt (gasm *stmt)
   return;
 }
 
-  /* There are some legacy diagnostics in here, and also avoids an extra
- parameter to targetm.md_asm_adjust.  */
+  /* There are some legacy diagnostics in here.  */
   save_input_location s_i_l(locus);
 
   unsigned noutputs = gimple_asm_noutputs (stmt);
@@ -3456,7 +3456,8 @@ expand_asm_stmt (gasm *stmt)
   if (targetm.md_asm_adjust)
 after_md_seq
= targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
-constraints, clobber_rvec, clobbered_regs);
+constraints, clobber_rvec, clobbered_regs,
+locus);
 
   /* Do not allow the hook to change the output and input count,
  lest it mess up the operand numbering.  */
diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h
index b6171e8668d..6be5fb1e083 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -147,6 +147,7 @@ struct cpu_cost_table
 rtx_insn *arm_md_asm_adjust (vec &outputs, vec & /*inputs*/,
 vec & /*input_modes*/,
 vec &constraints,
-vec &clobbers, HARD_REG_SET &clobbered_regs);
+vec &clobbers, HARD_REG_SET &clobbered_regs,
+location_t loc);
 
 #endif /* GCC_AARCH_COMMON_PROTOS_H */
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 0dbdc56f542..67343fe4025 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -534,7 +534,7 @@ rtx_insn *
 arm_md_asm_adjust (vec &outputs, vec & /*inputs*/,
   vec & /*input_modes*/,
   vec &constraints, vec & /*clobbers*/,
-  HARD_REG_SET & /*clobbered_regs*/)
+  HARD_REG_SET & /*clobbered_regs*/, location_t loc)
 {
   bool saw_asm_flag = false;
 
@@ -547,7 +547,7 @@ arm_md_asm_adjust (vec &outputs, vec & /*inputs*/,
   con += 4;
   if (strchr (con, ',') != NULL)
{
- error ("alternatives not allowed in % flag output");
+ error_at (loc, "alternatives not allowed in % flag output");
  continue;
}
 
@@ -608,7 +608,7 @@ arm_md_asm_adjust (vec &outputs, vec & /*inputs*/,
  mode = CC_Vmode, code = NE;

[PATCH 3/4] use diagnostic location in diagnostic_report_current_function

2021-07-14 Thread Trevor Saunders
It appears that input_location was used here before the diagnostic's location
was available, and was never updated when the other part of the header, which
does use the diagnostic's location, was added.  This makes the two consistent.

bootstrapped and regtested on x86_64-linux-gnu, ok?

Trev

gcc/ChangeLog:

* tree-diagnostic.c (diagnostic_report_current_function): Use the
diagnostic's location, not input_location.
---
 gcc/tree-diagnostic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-diagnostic.c b/gcc/tree-diagnostic.c
index 8bb214b2cf5..705da94637d 100644
--- a/gcc/tree-diagnostic.c
+++ b/gcc/tree-diagnostic.c
@@ -36,9 +36,9 @@ void
 diagnostic_report_current_function (diagnostic_context *context,
diagnostic_info *diagnostic)
 {
-  diagnostic_report_current_module (context, diagnostic_location (diagnostic));
-  lang_hooks.print_error_function (context, LOCATION_FILE (input_location),
-  diagnostic);
+  location_t loc = diagnostic_location (diagnostic);
+  diagnostic_report_current_module (context, loc);
+  lang_hooks.print_error_function (context, LOCATION_FILE (loc), diagnostic);
 }
 
 static void
-- 
2.20.1



[PATCH 2/4] use error_at and warning_at in cfgexpand.c

2021-07-14 Thread Trevor Saunders
bootstrapped and regtested on x86_64-linux-gnu, ok?

Trev

gcc/ChangeLog:

* cfgexpand.c (tree_conflicts_with_clobbers_p): Pass location to
diagnostics.
(expand_asm_stmt): Likewise.
---
 gcc/cfgexpand.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index fea8c837c80..46f2960c491 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2954,7 +2954,8 @@ check_operand_nalternatives (const vec 
&constraints)
variable definition for error, NULL_TREE for ok.  */
 
 static bool
-tree_conflicts_with_clobbers_p (tree t, HARD_REG_SET *clobbered_regs)
+tree_conflicts_with_clobbers_p (tree t, HARD_REG_SET *clobbered_regs,
+   location_t loc)
 {
   /* Conflicts between asm-declared register variables and the clobber
  list are not allowed.  */
@@ -2962,9 +2963,8 @@ tree_conflicts_with_clobbers_p (tree t, HARD_REG_SET 
*clobbered_regs)
 
   if (overlap)
 {
-  error ("% specifier for variable %qE conflicts with "
-"% clobber list",
-DECL_NAME (overlap));
+  error_at (loc, "% specifier for variable %qE conflicts with "
+   "% clobber list", DECL_NAME (overlap));
 
   /* Reset registerness to stop multiple errors emitted for a single
 variable.  */
@@ -3087,7 +3087,7 @@ expand_asm_stmt (gasm *stmt)
   /* ??? Diagnose during gimplification?  */
   if (ninputs + noutputs + nlabels > MAX_RECOG_OPERANDS)
 {
-  error ("more than %d operands in %", MAX_RECOG_OPERANDS);
+  error_at (locus, "more than %d operands in %", MAX_RECOG_OPERANDS);
   return;
 }
 
@@ -3140,7 +3140,8 @@ expand_asm_stmt (gasm *stmt)
  if (j == -2)
{
  /* ??? Diagnose during gimplification?  */
- error ("unknown register name %qs in %", regname);
+ error_at (locus, "unknown register name %qs in %",
+   regname);
  error_seen = true;
}
  else if (j == -4)
@@ -3205,7 +3206,8 @@ expand_asm_stmt (gasm *stmt)
&& HARD_REGISTER_P (DECL_RTL (output_tvec[j]))
&& output_hregno == REGNO (DECL_RTL (output_tvec[j])))
  {
-   error ("invalid hard register usage between output operands");
+   error_at (locus, "invalid hard register usage between output "
+ "operands");
error_seen = true;
  }
 
@@ -3231,16 +3233,16 @@ expand_asm_stmt (gasm *stmt)
if (i == match
&& output_hregno != input_hregno)
  {
-   error ("invalid hard register usage between output "
-  "operand and matching constraint operand");
+   error_at (locus, "invalid hard register usage between "
+ "output operand and matching constraint operand");
error_seen = true;
  }
else if (early_clobber_p
 && i != match
 && output_hregno == input_hregno)
  {
-   error ("invalid hard register usage between "
-  "earlyclobber operand and input operand");
+   error_at (locus, "invalid hard register usage between "
+ "earlyclobber operand and input operand");
error_seen = true;
  }
  }
@@ -3319,7 +3321,7 @@ expand_asm_stmt (gasm *stmt)
 
  if (! allows_reg && !MEM_P (op))
{
- error ("output number %d not directly addressable", i);
+ error_at (locus, "output number %d not directly addressable", i);
  error_seen = true;
}
  if ((! allows_mem && MEM_P (op) && GET_MODE (op) != BLKmode)
@@ -3415,9 +3417,8 @@ expand_asm_stmt (gasm *stmt)
  if (allows_reg && TYPE_MODE (type) != BLKmode)
op = force_reg (TYPE_MODE (type), op);
  else if (!allows_mem)
-   warning (0, "% operand %d probably does not match "
-"constraints",
-i + noutputs);
+   warning_at (locus, 0, "% operand %d probably does not match "
+   "constraints", i + noutputs);
  else if (MEM_P (op))
{
  /* We won't recognize either volatile memory or memory
@@ -3471,10 +3472,10 @@ expand_asm_stmt (gasm *stmt)
 
   bool clobber_conflict_found = 0;
   for (i = 0; i < noutputs; ++i)
-if (tree_conflicts_with_clobbers_p (output_tvec[i], &clobbered_regs))
+if (tree_conflicts_with_clobbers_p (output_tvec[i], &clobbered_regs, 
locus))
clobber_conflict_found = 1;
   for (i = 0; i < ninputs - ninout; ++i)
-if (tree_conflicts_with_clobbers_p (input_tvec[i], &clobbered_regs))
+if (tree_confli

[PATCH 1/4] force decls to be allocated through build_decl to initialize them

2021-07-14 Thread Trevor Saunders
Prior to this commit all calls to build_decl used input_location, even if only
temporarily, until build_decl reset the location to something else that it was
told was the proper location.  To avoid using the global we need the caller to
pass in the location it wants; however, that's not possible with make_node since
it makes other types of nodes.  So we force all callers who wish to make a decl
to go through build_decl, which already takes a location argument.  To avoid
changing behavior this just explicitly passes input_location to build_decl
for callers of make_node that create a decl; however, it would seem that in many
of these cases the location of the decl being copied might be a better
location.
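
The design point in the paragraph above can be sketched with toy types. Everything here (location_sketch, node_sketch, make_node_sketch, build_decl_sketch) is hypothetical and only illustrates the idea of a generic allocator versus a decl builder that requires a location; it is not GCC's tree API.

```cpp
#include <cassert>

// Hypothetical stand-ins for location_t and tree nodes.
using location_sketch = unsigned;
constexpr location_sketch unknown_location_sketch = 0;

enum class node_kind { expr, decl };

struct node_sketch
{
  node_kind kind;
  location_sketch loc;
};

// Generic allocator: it builds every kind of node, so it has no natural
// place for a location parameter and reads no global either.
constexpr node_sketch make_node_sketch(node_kind kind)
{
  return node_sketch{kind, unknown_location_sketch};
}

// Decl-specific builder: the caller must say which location it wants.
constexpr node_sketch build_decl_sketch(location_sketch loc)
{
  return node_sketch{node_kind::decl, loc};
}

static_assert(build_decl_sketch(42).loc == 42, "location comes from caller");
static_assert(build_decl_sketch(42).kind == node_kind::decl, "");

inline void example_usage()
{
  assert(make_node_sketch(node_kind::expr).loc == unknown_location_sketch);
  assert(build_decl_sketch(7).loc == 7);
}
```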

bootstrapped and regtested on x86_64-linux-gnu, ok?

Trev

gcc/ChangeLog:

* cfgexpand.c (avoid_deep_ter_for_debug): Call build_decl not
make_node.
(expand_gimple_basic_block): Likewise.
* ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
Likewise.
(ipa_param_body_adjustments::reset_debug_stmts): Likewise.
* omp-simd-clone.c (ipa_simd_modify_stmt_ops): Likewise.
* stor-layout.c (start_bitfield_representative): Likewise.
* tree-inline.c (remap_ssa_name): Likewise.
(tree_function_versioning): Likewise.
* tree-into-ssa.c (rewrite_debug_stmt_uses): Likewise.
* tree-nested.c (lookup_field_for_decl): Likewise.
(get_chain_field): Likewise.
(create_field_for_decl): Likewise.
(get_nl_goto_field): Likewise.
(finalize_nesting_tree_1): Likewise.
* tree-ssa-ccp.c (optimize_atomic_bit_test_and): Likewise.
* tree-ssa-loop-ivopts.c (remove_unused_ivs): Likewise.
* tree-ssa-phiopt.c (spaceship_replacement): Likewise.
* tree-ssa-reassoc.c (make_new_ssa_for_def): Likewise.
* tree-ssa.c (insert_debug_temp_for_var_def): Likewise.
* tree-streamer-in.c (streamer_alloc_tree): Adjust.
* tree.c (make_node): Add argument to specify the caller.
(build_decl): Move initialization from make_node.
* tree.h (enum make_node_caller): New enum.
(make_node): Adjust prototype.
* varasm.c (make_debug_expr_from_rtl): Call build_decl.

gcc/cp/ChangeLog:

* constraint.cc (build_type_constraint): Call build_decl not make_node.
* cp-gimplify.c (cp_genericize_r): Likewise.
* parser.c (cp_parser_introduction_list): Likewise.
* module.cc (trees_in::start): Adjust.

gcc/fortran/ChangeLog:

* trans-decl.c (generate_namelist_decl): Call build_decl not make_node.
* trans-types.c (gfc_get_array_descr_info): Likewise.

gcc/objc/ChangeLog:

* objc-act.c (objc_add_property_declaration): Call build_decl not
make_node.
(maybe_make_artificial_property_decl): Likewise.
(objc_build_keyword_decl): Likewise.
(build_method_decl): Likewise.
---
 gcc/cfgexpand.c  |  8 
 gcc/cp/constraint.cc |  2 +-
 gcc/cp/cp-gimplify.c |  5 +++--
 gcc/cp/module.cc |  2 +-
 gcc/cp/parser.c  |  6 ++
 gcc/fortran/trans-decl.c |  5 ++---
 gcc/fortran/trans-types.c|  4 ++--
 gcc/ipa-param-manipulation.c |  8 
 gcc/objc/objc-act.c  | 16 ++--
 gcc/omp-simd-clone.c |  4 ++--
 gcc/stor-layout.c|  2 +-
 gcc/tree-inline.c| 13 +++--
 gcc/tree-into-ssa.c  |  4 ++--
 gcc/tree-nested.c| 24 ++--
 gcc/tree-ssa-ccp.c   |  4 ++--
 gcc/tree-ssa-loop-ivopts.c   |  4 ++--
 gcc/tree-ssa-phiopt.c|  8 
 gcc/tree-ssa-reassoc.c   |  4 ++--
 gcc/tree-ssa.c   |  4 ++--
 gcc/tree-streamer-in.c   |  2 +-
 gcc/tree.c   | 35 ++-
 gcc/tree.h   | 13 -
 gcc/varasm.c | 12 ++--
 23 files changed, 96 insertions(+), 93 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 3edd53c37dc..fea8c837c80 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -4342,10 +4342,10 @@ avoid_deep_ter_for_debug (gimple *stmt, int depth)
  tree &vexpr = deep_ter_debug_map->get_or_insert (use);
  if (vexpr != NULL)
continue;
- vexpr = make_node (DEBUG_EXPR_DECL);
+ vexpr = build_decl (input_location, DEBUG_EXPR_DECL, nullptr,
+ TREE_TYPE (use));
  gimple *def_temp = gimple_build_debug_bind (vexpr, use, g);
  DECL_ARTIFICIAL (vexpr) = 1;
- TREE_TYPE (vexpr) = TREE_TYPE (use);
  SET_DECL_MODE (vexpr, TYPE_MODE (TREE_TYPE (use)));
  gimple_stmt_iterator gsi = gsi_for_stmt (g);
  gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT);
@@ -5899,14 +5899,14 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
   temporary.  */
gimple *debugst

[PATCH] gcov: Fix use of profile info section

2021-07-14 Thread Sebastian Huber
If the -fprofile-info-section option is used, then the gcov information is
registered in a linker set.  This is done by
build_gcov_info_var_registration().  The compiler-generated object placed in
the section was not marked as referenced, so once optimization was enabled,
this object was optimized away.  Mark it as referenced.

gcc/
* coverage.c (build_gcov_info_var_registration): Mark the object placed
in the linker set as referenced so that it does not get optimized away.
---
 gcc/coverage.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/coverage.c b/gcc/coverage.c
index dfc8108d5d83..ac9a9fdad228 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "auto-profile.h"
 #include "profile.h"
 #include "diagnostic.h"
+#include "varasm.h"
 
 #include "gcov-io.c"
 
@@ -1121,6 +1122,7 @@ build_gcov_info_var_registration (tree gcov_info_type)
   DECL_NAME (var) = get_identifier (name_buf);
   get_section (profile_info_section, SECTION_UNNAMED, NULL);
   set_decl_section_name (var, profile_info_section);
+  mark_decl_referenced (var);
   DECL_INITIAL (var) = build_fold_addr_expr (gcov_info_var);
   varpool_node::finalize_decl (var);
 }
-- 
2.26.2
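
The failure mode described in the patch above can be illustrated in plain C, outside the compiler. This is only a sketch under stated assumptions: an ELF target linked with GNU ld (or a compatible linker) that defines `__start_<name>` and `__stop_<name>` symbols for sections whose names are valid C identifiers. The names `demo_info`, `demo_set`, `DEMO_REGISTER` and `demo_count_registered` are hypothetical and do not come from GCC or gcov. The `used` attribute plays the role that `mark_decl_referenced` plays for the compiler-generated variable: nothing in the program names the entries, so they have to be kept alive explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type standing in for the per-object gcov info.  */
struct demo_info
{
  const char *name;
};

static const struct demo_info info_a = { "a.c" };
static const struct demo_info info_b = { "b.c" };

/* Place a pointer to OBJ in the section "demo_set".  No code refers to the
   pointer variable by name, so without the "used" attribute an optimizing
   compiler may drop it and the set would end up empty.  */
#define DEMO_REGISTER(obj)                                      \
  static const struct demo_info *const obj##_entry              \
    __attribute__ ((used, section ("demo_set"))) = &obj

DEMO_REGISTER (info_a);
DEMO_REGISTER (info_b);

/* Assumption: the linker provides these bounds for the "demo_set" section.  */
extern const struct demo_info *const __start_demo_set[];
extern const struct demo_info *const __stop_demo_set[];

/* Number of entries that survived into the linker set.  */
size_t
demo_count_registered (void)
{
  uintptr_t begin = (uintptr_t) __start_demo_set;
  uintptr_t end = (uintptr_t) __stop_demo_set;
  return (size_t) ((end - begin) / sizeof (__start_demo_set[0]));
}
```

With both registrations kept, `demo_count_registered ()` should report two entries. Removing `used` from the macro and compiling with optimization is one way to observe the entries disappearing, although the exact behaviour depends on the compiler version.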



Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-14 Thread Kewen.Lin via Gcc-patches
on 2021/7/14 2:38 PM, Richard Biener wrote:
> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
>>
>> on 2021/7/13 8:42 PM, Richard Biener wrote:
>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  wrote:

 Hi Richi,

 Thanks for the comments!

 on 2021/7/13 5:35 PM, Richard Biener wrote:
> On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> When I added support for the multiply highpart instructions newly
>> introduced in Power10, I noticed that the vectorizer currently
>> doesn't try to vectorize the multiply highpart pattern.  I hope
>> this isn't intentional?
>>
>> This patch extends the existing mulhs pattern handling
>> to cover multiply highpart.  An alternative would be to
>> recognize the mul_highpart operation in a general place that applies to
>> scalar code when the target supports the optab for the scalar
>> operation, based on the assumption that a target which
>> supports the vector version of multiply highpart should also have the
>> scalar version.  However, I noticed that the function can_mult_highpart_p
>> can check/handle mult_highpart well even without mul_highpart
>> optab support, so I think recognizing this pattern in the vectorizer
>> is better.  Is it on the right track?
>
> I think it's on the right track, using IFN_LAST is a bit awkward
> in case yet another case pops up so maybe you can use
> a code_helper instance instead which unifies tree_code,
> builtin_code and internal_fn?
>

 If a new requirement comes up that doesn't have/introduce an IFN
 but does have an existing tree_code, can we add one more field
 of type tree_code, so that on the IFN_LAST path we can check the
 different requirements under a guard on that tree_code variable?

> I also notice that can_mult_highpart_p will return true if
> only vec_widen_[us]mult_{even,odd,hi,lo} are available,
> but then the result might be less optimal (or even not
> handled later)?
>

 I think it will always be handled?  The expander calls

 rtx
 expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
   rtx target, bool uns_p)

 which will further check with can_mult_highpart_p.

 For the below case,

 #define SHT_CNT 16

 __attribute__ ((noipa)) void
 test ()
 {
   for (int i = 0; i < N; i++)
 sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> 16;
 }
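
 The snippet above leaves N, SI and the arrays undefined. A self-contained variant, given only as a sketch, might look like the following. The 32-bit/16-bit type choices, the value of N and the helper `highpart_demo` are assumptions for illustration, not part of the posted test case, and `noinline` is used in place of the GCC-specific `noipa`. It relies on `>>` of a negative signed value being an arithmetic shift, as GCC implements it.

```c
#include <stdint.h>

#define N 32
#define SHT_CNT 16

typedef int32_t SI;             /* assumed wide type: 32-bit signed */

int16_t sh_a[N], sh_b[N], sh_c[N];

/* Multiply-highpart loop: keep the upper 16 bits of each 32-bit product.  */
__attribute__ ((noinline)) void
test (void)
{
  for (int i = 0; i < N; i++)
    sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> SHT_CNT;
}

/* Hypothetical helper: fill the inputs with A and B, run the loop, and
   return one result element.  */
int16_t
highpart_demo (int16_t a, int16_t b)
{
  for (int i = 0; i < N; i++)
    {
      sh_a[i] = a;
      sh_b[i] = b;
    }
  test ();
  return sh_c[N - 1];
}
```

 For example, 0x4000 * 0x4000 is 0x10000000, so the high half is 0x1000.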

 Without this patch, it uses widen_mult as below:

   vect__1.5_19 = MEM  [(short int *)&sh_a + 
 ivtmp.18_24 * 1];
   vect__3.8_14 = MEM  [(short int *)&sh_b + 
 ivtmp.18_24 * 1];
   vect_patt_22.9_13 = WIDEN_MULT_LO_EXPR ;
   vect_patt_22.9_9 = WIDEN_MULT_HI_EXPR ;
   vect__6.10_25 = vect_patt_22.9_13 >> 16;
   vect__6.10_26 = vect_patt_22.9_9 >> 16;
   vect__7.11_27 = VEC_PACK_TRUNC_EXPR ;
   MEM  [(short int *)&sh_c + ivtmp.18_24 * 1] = 
 vect__7.11_27;

 .L2:
 lxvx 33,7,9
 lxvx 32,8,9
 vmulosh 13,0,1// widen mult
 vmulesh 0,0,1
 xxmrglw 33,32,45  // merge
 xxmrghw 32,32,45
 vsraw 1,1,12  // shift
 vsraw 0,0,12
 vpkuwum 0,0,1 // pack
 stxvx 32,10,9
 addi 9,9,16
 bdnz .L2


 With this patch, it ends up with:

   vect__1.5_14 = MEM  [(short int *)&sh_a + 
 ivtmp.17_24 * 1];
   vect__3.8_8 = MEM  [(short int *)&sh_b + 
 ivtmp.17_24 * 1];
   vect_patt_21.9_25 = vect__3.8_8 h* vect__1.5_14;
   MEM  [(short int *)&sh_c + ivtmp.17_24 * 1] = 
 vect_patt_21.9_25;
>>>
>>> Yes, so I'm curious what it ends up with/without the patch on x86_64 which
>>> can do vec_widen_[us]mult_{even,odd} but not [us]mul_highpart.
>>>
>>
>> For test case:
>>
>> ```
>> #define N 32
>> typedef signed int bigType;
>> typedef signed short smallType;
>> #define SH_CNT 16
>>
>> extern smallType small_a[N], small_b[N], small_c[N];
>>
>> __attribute__((noipa)) void test_si(int n) {
>>   for (int i = 0; i < n; i++)
>> small_c[i] = ((bigType)small_a[i] * (bigType)small_b[i]) >> SH_CNT;
>> }
>>
>> ```
>>
>> on x86_64, with option set: -mfpmath=sse -msse2 -O2 -ftree-vectorize
>>
>> 1) without this patch, the pattern isn't recognized, the IR looks like:
>>
>>[local count: 94607391]:
>>   bnd.5_34 = niters.4_25 >> 3;
>>   _13 = (sizetype) bnd.5_34;
>>   _29 = _13 * 16;
>>
>>[local count: 378429566]:
>>   # ivtmp.34_4 = PHI 
>>   vect__1.10_40 = MEM  [(short int *)&small_a + 
>> ivtmp.34_4 * 1];
>>   vect__3.13_43 = MEM  [(short int *)&small_b + 
>> ivtmp.34_4 * 1];
>>   vect_patt_18.14_44 = WIDEN_MULT_LO_EXPR ;
>>   vect_patt_18.14_45 = WIDEN_MULT_HI_EXPR ;
>>   vect__6.15_46 = vect_patt_18.14_44 >> 16;
>>   vect__6.15_47 = vect_patt_18.14_45 >> 16;
>>   vect__7.16_48 = VEC_PACK_TRUNC_EXPR ;
>>   MEM  [(short int
