Re: [PATCH v2] MIPS: IPL is 8bit in Cause register if TARGET_MCU

2022-03-15 Thread YunQiang Su

On 2022/2/12 16:47, Maciej W. Rozycki wrote:

On Fri, 11 Feb 2022, Jeff Law wrote:


If the MIPS MCU extension is enabled, the IPL field in the Cause register
is expanded to 8 bits instead of 6 bits.

gcc/ChangeLog:

* config/mips/mips.cc (mips_expand_prologue):
  IPL is 8bit for MCU ASE.

OK


  But this is still wrong AFAICT.



Yes, you are right.


  The mask is applied to the CP0 Status register according to the comment,
but the layout of the interrupt bit-field is different between the CP0
Status and the CP0 Cause registers, so you can't just extract it from one
of the two registers and directly apply to the other.



Since our case has 128 interrupts, I didn't notice this problem.


  I would like to know how this code has been verified.


I have now sent a new version, tested with 256 interrupts.

Please see v3.



   Maciej
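Maciej's objection (the interrupt field cannot simply be extracted from one register and applied to the other) can be illustrated with a minimal C sketch of generic bit-field extract/insert helpers. The helper names are hypothetical and this is not the mips.cc prologue code; the field positions and widths used in the checks are placeholders, not the architectural Cause/Status layouts. Source position, destination position and width are separate parameters precisely because the two layouts need not match.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low WIDTH bits (WIDTH may be 0..32).  */
static uint32_t
low_mask (unsigned width)
{
  return width >= 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1u;
}

/* Return the WIDTH-bit field of VAL that starts at bit POS.  */
static uint32_t
extract_field (uint32_t val, unsigned pos, unsigned width)
{
  return (val >> pos) & low_mask (width);
}

/* Return DST with its WIDTH-bit field at POS replaced by the low WIDTH
   bits of FIELD; every other bit of DST is preserved.  */
static uint32_t
insert_field (uint32_t dst, unsigned pos, unsigned width, uint32_t field)
{
  uint32_t mask = low_mask (width) << pos;
  return (dst & ~mask) | ((field << pos) & mask);
}

/* Hypothetical helper: copy a priority level from a "cause"-style word
   (field at SRC_POS) into a "status"-style word (field at DST_POS).  */
static uint32_t
copy_level (uint32_t cause, unsigned src_pos,
            uint32_t status, unsigned dst_pos, unsigned width)
{
  return insert_field (status, dst_pos, width,
                       extract_field (cause, src_pos, width));
}

static void
check_examples (void)
{
  /* Placeholder layout: level 0xff at bit 10 of the source word.  */
  uint32_t cause = UINT32_C (0xff) << 10;
  /* Copied through a 6-bit field, the level is truncated to 0x3f.  */
  assert (copy_level (cause, 10, 0, 10, 6) == (UINT32_C (0x3f) << 10));
  /* With an 8-bit field and a different destination position it survives.  */
  assert (copy_level (cause, 10, 0, 12, 8) == (UINT32_C (0xff) << 12));
  /* Bits outside the destination field are left alone.  */
  assert (insert_field (UINT32_MAX, 10, 6, 0) == ~(UINT32_C (0x3f) << 10));
}
```

The first check shows the failure mode a too-narrow field causes: a level of 0xff silently becomes 0x3f rather than producing an error.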




RE: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA.

2022-03-15 Thread Roger Sayle

Hi Richard and Marc,
Many thanks for both your feedback on my patch for PR 101895.
Here's version 2 of this patch, incorporating all of the suggested improvements.
The one minor complication is that the :s qualifier doesn't automatically
recognize that a capture already has two (or N) uses in a pattern,
so I have to manually confirm that there are no other uses of the mult
using num_imm_uses.

This revision has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?

2022-03-15  Roger Sayle  
Marc Glisse  
Richard Biener  

gcc/ChangeLog
PR tree-optimization/101895
* match.pd (vec_same_elem_p): Handle CONSTRUCTOR_EXPR def.
(plus (vec_perm (mult ...) ...) ...): New reordering simplification.

gcc/testsuite/ChangeLog
PR tree-optimization/101895
* gcc.target/i386/pr101895.c: New test case.


Thanks in advance,
Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 14 March 2022 07:38
> To: GCC Patches 
> Cc: Roger Sayle ; Marc Glisse
> 
> Subject: Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help
> recognize FMA.
> 
> On Sun, Mar 13, 2022 at 12:39 AM Marc Glisse via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > On Fri, 11 Mar 2022, Roger Sayle wrote:
> >
> > +(match vec_same_elem_p
> > +  CONSTRUCTOR@0
> > +  (if (uniform_vector_p (TREE_CODE (@0) == SSA_NAME
> > +? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0))
> > +: @0
> >
> > Ah, I didn't remember we needed that, we don't seem to be very
> > consistent about it. Probably for this reason, the transformation
> > "Prefer vector1 << scalar to vector1 << vector2" does not match
> >
> > typedef int vec __attribute__((vector_size(16)));
> > vec f(vec a, int b){
> >   vec bb = { b, b, b, b };
> >   return a << bb;
> > }
> >
> > which is only optimized at vector lowering time.
> 
> Few more comments - since match.pd is matching in match.pd order the
> 
> (match vec_same_elem_p
>   @0
>   (...))
> 
> should come last.  Please use
> 
> +(match vec_same_elem_p
> +  CONSTRUCTOR@0
> (if (TREE_CODE (@0) == SSA_NAME
>  && uniform_vector_p (...
> 
> since otherwise we'll try uniform_vector_p twice on all CTORs (that are not
> uniform).
> 
> > +/* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
> > +(for plusminus (plus minus)
> > +  (simplify
> > +(plusminus (vec_perm (mult@0 @1 vec_same_elem_p@2) @0 @3) @4)
> > +(plusminus (mult (vec_perm @1 @1 @3) @2) @4)))
> >
> > Don't you want :s on mult and vec_perm?
> 
> Yes.  Also for plus you want :c on it, likewise you want :c on the mult.
> The :c on the plus will require splitting the plus and minus case :/
> 
> Otherwise looks reasonable.
> 
> Richard.
> 
> >
> > --
> > Marc Glisse
diff --git a/gcc/match.pd b/gcc/match.pd
index 97399e5..12c92f4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7689,16 +7689,33 @@ and,
 /* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element.  */
 
 (match vec_same_elem_p
+ (vec_duplicate @0))
+
+(match vec_same_elem_p
+ CONSTRUCTOR@0
+ (if (TREE_CODE (@0) == SSA_NAME
+  && uniform_vector_p (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0))))))
+
+(match vec_same_elem_p
  @0
  (if (uniform_vector_p (@0))))
 
-(match vec_same_elem_p
- (vec_duplicate @0))
 
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
  @0)
 
+/* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
+(simplify
+ (plus:c (vec_perm:s (mult:c@0 @1 vec_same_elem_p@2) @0 @3) @4)
+ (if (TREE_CODE (@0) == SSA_NAME && num_imm_uses (@0) == 2)
+  (plus (mult (vec_perm @1 @1 @3) @2) @4)))
+(simplify
+ (minus (vec_perm:s (mult:c@0 @1 vec_same_elem_p@2) @0 @3) @4)
+ (if (TREE_CODE (@0) == SSA_NAME && num_imm_uses (@0) == 2)
+  (minus (mult (vec_perm @1 @1 @3) @2) @4)))
+
+
 /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
constant which when multiplied by a power of 2 contains a unique value
diff --git a/gcc/testsuite/gcc.target/i386/pr101895.c b/gcc/testsuite/gcc.target/i386/pr101895.c
new file mode 100644
index 000..4d0f1cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101895.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=cascadelake" } */
+
+void foo(float * __restrict__ a, float b, float *c) {
+  a[0] = c[0]*b + a[0];
+  a[1] = c[2]*b + a[1];
+  a[2] = c[1]*b + a[2];
+  a[3] = c[3]*b + a[3];
+}
+
+/* { dg-final { scan-assembler "vfmadd" } } */
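The new reordering rule relies on the multiplier being uniform (vec_same_elem_p): permuting the lanes of x * {s,s,s,s} yields the same lanes as multiplying the permuted x by the same splat, and the second form exposes a mult feeding a plus, which is the shape FMA recognition needs. Below is a small C check of that identity using GCC-style vector extensions. The helper names are hypothetical, and the permutation is written lane by lane rather than as a VEC_PERM_EXPR. Inputs are small integers so both forms round identically whether or not the compiler contracts them into an FMA.

```c
#include <assert.h>

typedef float v4sf __attribute__ ((vector_size (16)));

/* Lane I of the result is lane SEL[I] of V (a one-input permute).  */
static v4sf
permute (v4sf v, const int sel[4])
{
  v4sf r = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    r[i] = v[sel[i] & 3];   /* & 3 keeps the index in range */
  return r;
}

/* Before: (plus (vec_perm (mult x splat) ...) acc).  */
static v4sf
perm_of_product (v4sf x, float s, const int sel[4], v4sf acc)
{
  v4sf splat = { s, s, s, s };
  return permute (x * splat, sel) + acc;
}

/* After: (plus (mult (vec_perm x ...) splat) acc), an FMA candidate.  */
static v4sf
product_of_perm (v4sf x, float s, const int sel[4], v4sf acc)
{
  v4sf splat = { s, s, s, s };
  return permute (x, sel) * splat + acc;
}

static int
same_lanes (v4sf a, v4sf b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}

static void
check_examples (void)
{
  /* Same lane order as pr101895.c: c[0], c[2], c[1], c[3].  */
  const int sel[4] = { 0, 2, 1, 3 };
  v4sf x = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf acc = { 10.0f, 20.0f, 30.0f, 40.0f };
  v4sf r = product_of_perm (x, 2.0f, sel, acc);
  assert (same_lanes (perm_of_product (x, 2.0f, sel, acc), r));
  assert (r[1] == 26.0f);   /* x[2] * 2 + acc[1] = 6 + 20 */
}
```

The identity would not hold for a non-uniform multiplier, which is why the pattern requires vec_same_elem_p@2.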


Re: [PATCH] Ignore (possible) signed zeros in operands of FP comparisons.

2022-03-15 Thread Richard Biener via Gcc-patches
On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle  wrote:
>
>
> I've been wondering about the possible performance/missed-optimization
> impact of my patch for PR middle-end/98420 and similar IEEE correctness
> fixes that disable constant folding optimizations when worrying about -0.0.
> In the common situation where the floating point result is used by a
> FP comparison, there's no distinction between +0.0 and -0.0, so some
> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
>
> Consider the following interesting example:
>
> int foo(int x, double y) {
> return (x * 0.0) < y;
> }
>
> Although we know that x (when converted to double) can't be NaN or Inf,
> we still worry that for negative values of x that (x * 0.0) may be -0.0
> and so perform the multiplication at run-time.  But in this case, the
> result of the comparison (-0.0 < y) will be exactly the same as (+0.0 < y)
> for any y, hence the above may be safely constant folded to "0.0 < y"
> avoiding the multiplication at run-time.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures, and allows GCC to continue to
> optimize cases that we optimized in GCC 11 (without regard to correctness).
> Ok for mainline?

Isn't that something that gimple-ssa-backprop.c is designed to handle?  I wonder
if you can see whether the signed zero speciality can be retrofitted there?
It currently tracks "sign does not matter", so possibly another state,
"sign of zero does not matter" could be introduced there.

Thanks,
Richard.

>
> 2022-03-14  Roger Sayle  
>
> gcc/ChangeLog
> * match.pd (X CMP (Y-Y) -> X CMP 0.0): New transformation.
> (X CMP (Y * 0.0) -> X CMP 0.0): Likewise.
> (X CMP X -> true): Test tree_expr_maybe_nan_p instead of HONOR_NANS.
> (X LTGT X -> false): Enable if X is not tree_expr_maybe_nan_p, as
> this can't trap/signal.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/fold-compare-9.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
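A quick C check of the argument in this patch, assuming default IEEE semantics (no -ffast-math or -ffinite-math-only): for every sampled int x and double y, comparing x * 0.0 against y gives the same answer as comparing +0.0 against y, including NaN and infinite y, even though the product is -0.0 for negative x. The helper names are made up for illustration, and sampling values only supports the claim; it does not prove the transformation.

```c
#include <assert.h>
#include <math.h>

/* Form in the source: the product is -0.0 whenever x is negative.  */
static int
cmp_product (int x, double y)
{
  return (x * 0.0) < y;
}

/* Form after the proposed folding: the multiplication is gone.  */
static int
cmp_folded (double y)
{
  return 0.0 < y;
}

static void
check_examples (void)
{
  const int xs[] = { -7, -1, 0, 1, 42 };
  const double ys[] = { -1.0, -0.0, 0.0, 1e-300, 1.0,
                        INFINITY, -INFINITY, NAN };
  for (unsigned i = 0; i < sizeof xs / sizeof xs[0]; i++)
    for (unsigned j = 0; j < sizeof ys / sizeof ys[0]; j++)
      assert (cmp_product (xs[i], ys[j]) == cmp_folded (ys[j]));

  /* The product really is a negative zero, yet it compares equal to +0.0.  */
  double p = -7 * 0.0;
  assert (p == 0.0 && signbit (p));
}
```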


Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA.

2022-03-15 Thread Richard Biener via Gcc-patches
On Tue, Mar 15, 2022 at 8:25 AM Roger Sayle  wrote:
>
>
> Hi Richard and Marc,
> Many thanks for both your feedback on my patch for PR 101895.
> Here's version 2 of this patch, incorporating all of the suggested improvements.
> The one minor complication is that the :s qualifier doesn't automatically
> recognize that a capture already has two (or N) uses in a pattern,
> so I have to manually confirm that there are no other uses of the mult
> using num_imm_uses.
>
> This revision has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

OK.

Thanks,
Richard.

> 2022-03-15  Roger Sayle  
> Marc Glisse  
> Richard Biener  
>
> gcc/ChangeLog
> PR tree-optimization/101895
> * match.pd (vec_same_elem_p): Handle CONSTRUCTOR_EXPR def.
> (plus (vec_perm (mult ...) ...) ...): New reordering simplification.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/101895
> * gcc.target/i386/pr101895.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: 14 March 2022 07:38
> > To: GCC Patches 
> > Cc: Roger Sayle ; Marc Glisse
> > 
> > Subject: Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help
> > recognize FMA.
> >
> > On Sun, Mar 13, 2022 at 12:39 AM Marc Glisse via Gcc-patches  > patc...@gcc.gnu.org> wrote:
> > >
> > > On Fri, 11 Mar 2022, Roger Sayle wrote:
> > >
> > > +(match vec_same_elem_p
> > > +  CONSTRUCTOR@0
> > > +  (if (uniform_vector_p (TREE_CODE (@0) == SSA_NAME
> > > +? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0))
> > > +: @0
> > >
> > > Ah, I didn't remember we needed that, we don't seem to be very
> > > consistent about it. Probably for this reason, the transformation
> > > "Prefer vector1 << scalar to vector1 << vector2" does not match
> > >
> > > typedef int vec __attribute__((vector_size(16)));
> > > vec f(vec a, int b){
> > >   vec bb = { b, b, b, b };
> > >   return a << bb;
> > > }
> > >
> > > which is only optimized at vector lowering time.
> >
> > Few more comments - since match.pd is matching in match.pd order the
> >
> > (match vec_same_elem_p
> >   @0
> >   (...))
> >
> > should come last.  Please use
> >
> > +(match vec_same_elem_p
> > +  CONSTRUCTOR@0
> > (if (TREE_CODE (@0) == SSA_NAME
> >  && uniform_vector_p (...
> >
> > since otherwise we'll try uniform_vector_p twice on all CTORs (that are not
> > uniform).
> >
> > > +/* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
> > > +(for plusminus (plus minus)
> > > +  (simplify
> > > +(plusminus (vec_perm (mult@0 @1 vec_same_elem_p@2) @0 @3) @4)
> > > +(plusminus (mult (vec_perm @1 @1 @3) @2) @4)))
> > >
> > > Don't you want :s on mult and vec_perm?
> >
> > Yes.  Also for plus you want :c on it, likewise you want :c on the mult.
> > The :c on the plus will require splitting the plus and minus case :/
> >
> > Otherwise looks reasonable.
> >
> > Richard.
> >
> > >
> > > --
> > > Marc Glisse


[PATCH] Performance/size improvement to single_use when matching GIMPLE.

2022-03-15 Thread Roger Sayle
 

This patch improves the implementation of single_use as used in code
generated from match.pd for patterns using :s.  The current implementation
contains the logic "has_zero_uses (t) || has_single_use (t)" which
performs a loop over the uses to first check if there are zero non-debug
uses [which is rare], then another loop over these uses to check if there
is exactly one non-debug use.  This can be better implemented using a
single loop.

This function is currently inlined over 800 times in gimple-match.cc,
whose .o on x86_64-pc-linux-gnu is now up to 30 Mbytes, so speeding up
and shrinking this function should help offset the growth in match.pd
for GCC 12.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?

2022-03-15  Roger Sayle  

gcc/ChangeLog
* gimple-match-head.cc (single_use): Implement inline using a
single loop.

Thanks in advance,
Roger
--

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 74d5818..fc537b9 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -1163,7 +1163,22 @@ types_match (tree t1, tree t2)
 static inline bool
 single_use (tree t)
 {
-  return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t);
+  if (TREE_CODE (t) != SSA_NAME)
+    return true;
+
+  /* Inline return has_zero_uses (t) || has_single_use (t);  */
+  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
+  const ssa_use_operand_t *ptr;
+  bool single = false;
+
+  for (ptr = head->next; ptr != head; ptr = ptr->next)
+    if (USE_STMT(ptr) && !is_gimple_debug (USE_STMT (ptr)))
+      {
+        if (single)
+          return false;
+        single = true;
+      }
+  return true;
 }
 
 /* Return true if math operations should be canonicalized,
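The single-pass idea can be sketched outside GCC with a minimal model of the immediate-use list: a circular doubly-linked list with a sentinel head, walked from head->next back to head as in the patch. The struct and function names below are hypothetical stand-ins for ssa_use_operand_t and is_gimple_debug, and the sketch omits the patch's null USE_STMT check. Like the patch, it returns as soon as a second non-debug use is seen instead of counting every use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ssa_use_operand_t: circular list node.  */
struct use_node
{
  struct use_node *prev, *next;
  bool is_debug;   /* models is_gimple_debug (USE_STMT (ptr)) */
};

/* True if the list holds zero or one non-debug use; one pass, early exit.  */
static bool
zero_or_single_use (const struct use_node *head)
{
  bool seen = false;
  for (const struct use_node *ptr = head->next; ptr != head; ptr = ptr->next)
    if (!ptr->is_debug)
      {
        if (seen)
          return false;   /* second non-debug use */
        seen = true;
      }
  return true;
}

/* Make HEAD an empty list (sentinel points at itself).  */
static void
list_init (struct use_node *head)
{
  head->prev = head->next = head;
  head->is_debug = false;
}

/* Append N before the sentinel, i.e. at the tail of the list.  */
static void
list_append (struct use_node *head, struct use_node *n, bool is_debug)
{
  n->is_debug = is_debug;
  n->prev = head->prev;
  n->next = head;
  head->prev->next = n;
  head->prev = n;
}

static void
check_examples (void)
{
  struct use_node head, dbg, a, b;
  list_init (&head);
  assert (zero_or_single_use (&head));    /* no uses */
  list_append (&head, &dbg, true);
  assert (zero_or_single_use (&head));    /* only a debug use */
  list_append (&head, &a, false);
  assert (zero_or_single_use (&head));    /* one real use */
  list_append (&head, &b, false);
  assert (!zero_or_single_use (&head));   /* two real uses */
}
```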


RE: [PATCH] Ignore (possible) signed zeros in operands of FP comparisons.

2022-03-15 Thread Roger Sayle


> -Original Message-
> From: Richard Biener 
> Sent: 15 March 2022 07:29
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
> comparisons.
> 
> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
>  wrote:
> >
> >
> > I've been wondering about the possible performance/missed-optimization
> > impact of my patch for PR middle-end/98420 and similar IEEE
> > correctness fixes that disable constant folding optimizations when
> > worrying about -0.0.
> > In the common situation where the floating point result is used by a
> > FP comparison, there's no distinction between +0.0 and -0.0, so some
> > HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
> >
> > Consider the following interesting example:
> >
> > int foo(int x, double y) {
> > return (x * 0.0) < y;
> > }
> >
> > Although we know that x (when converted to double) can't be NaN or
> > Inf, we still worry that for negative values of x that (x * 0.0) may
> > be -0.0 and so perform the multiplication at run-time.  But in this
> > case, the result of the comparison (-0.0 < y) will be exactly the same
> > as (+0.0 < y) for any y, hence the above may be safely constant folded to
> > "0.0 < y" avoiding the multiplication at run-time.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures, and allows GCC to continue to
> > optimize cases that we optimized in GCC 11 (without regard to correctness).
> > Ok for mainline?
> 
> Isn't that something that gimple-ssa-backprop.c is designed to handle?  I wonder
> if you can see whether the signed zero speciality can be retrofitted there?
> It currently tracks "sign does not matter", so possibly another state,
> "sign of zero does not matter" could be introduced there.

Two questions.  Would adding tracking of "sign of zero does not matter" to
gimple-ssa-backprop.c be suitable for stage4?  Secondly, even if
gimple-ssa-backprop.c performed this kind of optimization, would that be a
reason not to support these transformations in match.pd?  Perhaps someone
could open a missed-optimization PR for backprop in Bugzilla, but the above
patch still needs to be reviewed on its own merits.

Speaking of tree-ssa passes that could be improved, I was wondering whether
you could review my EVRP patch to fix regression PR/102950.  Pretty please?
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html

Thanks (as always),
Roger

> Thanks,
> Richard.
> 
> >
> > 2022-03-14  Roger Sayle  
> >
> > gcc/ChangeLog
> > * match.pd (X CMP (Y-Y) -> X CMP 0.0): New transformation.
> > (X CMP (Y * 0.0) -> X CMP 0.0): Likewise.
> > (X CMP X -> true): Test tree_expr_maybe_nan_p instead of HONOR_NANS.
> > (X LTGT X -> false): Enable if X is not tree_expr_maybe_nan_p, as
> > this can't trap/signal.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.dg/fold-compare-9.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >



[PATCH] configure: use OBJDUMP determined by libtool [PR95648]

2022-03-15 Thread David Seifert via Gcc-patches
$ac_cv_prog_OBJDUMP contains the --host OBJDUMP that
libtool has inferred. Current config/gcc-plugin.m4 does
not respect the user's choice for OBJDUMP.

config/

* gcc-plugin.m4: Use libtool's $ac_cv_prog_OBJDUMP.

gcc/

* configure: Regenerate.

libcc1/

* configure: Regenerate.
---
 config/gcc-plugin.m4 | 2 +-
 gcc/configure| 2 +-
 libcc1/configure | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/config/gcc-plugin.m4 b/config/gcc-plugin.m4
index 7ee342fe5fe..2ccb9ca7258 100644
--- a/config/gcc-plugin.m4
+++ b/config/gcc-plugin.m4
@@ -45,7 +45,7 @@ AC_DEFUN([GCC_ENABLE_PLUGINS],
  ;;
  *)
if test x$build = x$host; then
-export_sym_check="objdump${exeext} -T"
+export_sym_check="$ac_cv_prog_OBJDUMP -T"
elif test x$host = x$target; then
 export_sym_check="$gcc_cv_objdump -T"
else
diff --git a/gcc/configure b/gcc/configure
index 14b19c8fe0c..9cf18259461 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -32057,7 +32057,7 @@ fi
  ;;
  *)
if test x$build = x$host; then
-export_sym_check="objdump${exeext} -T"
+export_sym_check="$ac_cv_prog_OBJDUMP -T"
elif test x$host = x$target; then
 export_sym_check="$gcc_cv_objdump -T"
else
diff --git a/libcc1/configure b/libcc1/configure
index 01cfb2806da..6dd91a086e6 100755
--- a/libcc1/configure
+++ b/libcc1/configure
@@ -15034,7 +15034,7 @@ fi
  ;;
  *)
if test x$build = x$host; then
-export_sym_check="objdump${exeext} -T"
+export_sym_check="$ac_cv_prog_OBJDUMP -T"
elif test x$host = x$target; then
 export_sym_check="$gcc_cv_objdump -T"
else
-- 
2.35.1



Re: [PATCH, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-15 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Some minor comments are inlined.

on 2022/3/10 2:31 PM, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
>This patch adds V1TI mode into mode iterator used in vector comparison
> expands.  With the patch, both built-ins and direct comparison could generate
> P10 new V1TI comparison instructions.
> 
>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions. Is
> this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-03-09 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
>   V1TI instructions.
>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 5d34c1bcfc9..143effa89bf 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1994,6 +1994,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> +case RS6000_BIF_VCMPEQUT:
>  /* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
> folding produces worse code for 128-bit compares.  */

The comment above explains why RS6000_BIF_VCMPEQUT was previously omitted.  IIUC
that no longer holds with your patch, so could you remove it to avoid possible
confusion?  The same applies to some similar places ...

>fold_compare_helper (gsi, EQ_EXPR, stmt);
> @@ -2002,6 +2003,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> +case RS6000_BIF_VCMPNET:
>  /* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
> folding produces worse code for 128-bit compares.  */

here ...

>fold_compare_helper (gsi, NE_EXPR, stmt);
> @@ -2015,6 +2017,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>  /* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
> for now, because gimple folding produces worse code for 128-bit
> compares.  */

here...

> @@ -2029,6 +2033,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>  /* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
> for now, because gimple folding produces worse code for 128-bit
> compares.  */

here...

> @@ -2043,6 +2049,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>  /* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
> for now, because gimple folding produces worse code for 128-bit
> compares.  */

here...

> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index b87a742cca8..1afb8a6d786 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; Vector int modes for comparison
> +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI V1TI])
> +

Maybe we can make this define look like this:

(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")])

...

>  ;; 128-bit int modes
>  (define_mode_iterator VEC_TI [V1TI TI])
> 
> @@ -533,11 +536,12 @@ (define_expand "vcond_mask_"
> 
>  ;; For signed integer vectors comparison.
>  (define_expand "vec_cmp"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>   (match_operator 1 "signed_or_equality_comparison_operator"
> -   [(match_operand:VEC_I 2 "vint_operand")
> -(match_operand:VEC_I 3 "vint_operand")]))]
> -  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> +   [(match_operand:VEC_IC 2 "vint_operand")
> +(match_operand:VEC_IC 3 "vint_operand")]))]
> +  "(VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && mode!= V1TImode)
> +   || (mode == V1TImode && TARGET_POWER10)"

and then this condition can be kept as simple as VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)?


>  {
>enum rtx

[PATCH] RISC-V: Implement ZTSO extension.

2022-03-15 Thread shihua
From: LiaoShihua 

  ZTSO is the extension of tatol store order model.
  This extension adds no new instructions to the ISA, and you can use it 
with arch "ztso".
  If you use it, TSO flag will be generate in the ELF header.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Define new extension.
* config/riscv/riscv-opts.h (MASK_ZTSO): Ditto.
(TARGET_ZTSO): Ditto.
* config/riscv/riscv.opt: Ditto.

---
 gcc/common/config/riscv/riscv-common.cc | 4 +++-
 gcc/config/riscv/riscv-opts.h   | 3 +++
 gcc/config/riscv/riscv.opt  | 3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index a904893b9ed..f4730b991d7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -185,6 +185,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"ztso", ISA_SPEC_CLASS_NONE, 0, 1},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
@@ -1080,7 +1082,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zvl32768b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL32768B},
   {"zvl65536b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL65536B},
 
-
+  {"ztso", &gcc_options::x_riscv_ztso_subext, MASK_ZTSO},
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 929e4e3a7c5..9cb5f2a550a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -136,4 +136,7 @@ enum stack_protector_guard {
 #define TARGET_ZVL32768B ((riscv_zvl_flags & MASK_ZVL32768B) != 0)
 #define TARGET_ZVL65536B ((riscv_zvl_flags & MASK_ZVL65536B) != 0)
 
+#define MASK_ZTSO (1 << 0)
+#define TARGET_ZTSO ((riscv_ztso_subext & MASK_ZTSO) != 0)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 9fffc08220d..6128bfa31dc 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -209,6 +209,9 @@ int riscv_vector_eew_flags
 TargetVariable
 int riscv_zvl_flags
 
+TargetVariable
+int riscv_ztso_subext
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
-- 
2.31.1.windows.1
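The patch wires Ztso up through a (name, variable, mask) row in riscv_ext_flag_table plus a MASK_ZTSO/TARGET_ZTSO pair. The C sketch below mimics that pattern in a simplified, hypothetical form: a plain int pointer stands in for the gcc_options pointer-to-member, and a list of enabled extension names stands in for the parsed -march subset list. It is an illustration of the table-driven flag idea, not the riscv-common.cc code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MASK_ZTSO (1 << 0)

/* Hypothetical stand-in for the TargetVariable added to riscv.opt.  */
static int riscv_ztso_subext;

#define TARGET_ZTSO ((riscv_ztso_subext & MASK_ZTSO) != 0)

/* Simplified flag-table row: extension name, flag variable, mask.  */
struct ext_flag
{
  const char *name;
  int *var;
  int mask;
};

static const struct ext_flag ext_flag_table[] = {
  { "ztso", &riscv_ztso_subext, MASK_ZTSO },
  { NULL, NULL, 0 }   /* terminator, as in the real table */
};

/* OR in the mask of every table entry whose name is in ENABLED.  */
static void
apply_ext_flags (const char *const enabled[], size_t n)
{
  for (const struct ext_flag *e = ext_flag_table; e->name; e++)
    for (size_t i = 0; i < n; i++)
      if (strcmp (e->name, enabled[i]) == 0)
        *e->var |= e->mask;
}

static void
check_examples (void)
{
  const char *const exts[] = { "i", "m", "a", "ztso" };
  assert (!TARGET_ZTSO);
  apply_ext_flags (exts, sizeof exts / sizeof exts[0]);
  assert (TARGET_ZTSO);
}
```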



Re: [PATCH] Performance/size improvement to single_use when matching GIMPLE.

2022-03-15 Thread Richard Biener via Gcc-patches
On Tue, 15 Mar 2022, Roger Sayle wrote:

>
> This patch improves the implementation of single_use as used in code
> generated from match.pd for patterns using :s.  The current implementation
> contains the logic "has_zero_uses (t) || has_single_use (t)" which
> performs a loop over the uses to first check if there are zero non-debug
> uses [which is rare], then another loop over these uses to check if there
> is exactly one non-debug use.  This can be better implemented using a
> single loop.
>
> This function is currently inlined over 800 times in gimple-match.cc,
> whose .o on x86_64-pc-linux-gnu is now up to 30 Mbytes, so speeding up
> and shrinking this function should help offset the growth in match.pd
> for GCC 12.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

Note the intent of has_zero_uses () is even simpler - it's the case
for when there's no SSA operand info on the stmt (no update_stmt called
yet).  More precisely it wants to catch the case where the definition
of the SSA name is not in the IL.

I'm not sure if we want to twist the effective semantics at this
point (I guess we do not want that), so the patch looks like an
improvement.  But may I ask to move the function out of line for
even more savings?  Just put it in gimple-match-head.cc and have it
not declared inline.  I think we may want to go as far and
declare the function 'pure' using ATTRIBUTE_PURE.

>
> 2022-03-15  Roger Sayle  
>
> gcc/ChangeLog
> * gimple-match-head.cc (single_use): Implement inline using a
> single loop.
>
> Thanks in advance,
> Roger
> --
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


[PATCH] i386: Use no-mmx,no-sse for LIBGCC2_UNWIND_ATTRIBUTE [PR104890]

2022-03-15 Thread Jakub Jelinek via Gcc-patches
Hi!

Regardless of the outcome of the general-regs-only stuff in x86gprintrin.h,
apparently general-regs-only is much bigger hammer than no-sse, and e.g.
using 387 instructions in the unwinder isn't a big deal, it never needs
to realign the stack because of it.

So, the following patch uses no-sse (and adds no-mmx to it, even when not
strictly needed).

Bootstrapped/regtested on x86_64-linux and i686-linux, on the latter
both normally and with -msse2 -mfpmath=sse -mstackrealign in the options and
--enable-cet, ok for trunk?

2022-03-15  Jakub Jelinek  

PR target/104890
* config/i386/i386.h (LIBGCC2_UNWIND_ATTRIBUTE): Use no-mmx,no-sse
instead of general-regs-only.

--- gcc/config/i386/i386.h.jj   2022-03-09 15:25:28.355498493 +0100
+++ gcc/config/i386/i386.h  2022-03-14 15:27:33.831976579 +0100
@@ -2848,10 +2848,10 @@ extern enum attr_cpu ix86_schedule;
 #define NUM_X86_64_MS_CLOBBERED_REGS 12
 #endif
 
-/* __builtin_eh_return can't handle stack realignment, so restrict to
-   general regs in 32-bit libgcc functions that call it.  */
+/* __builtin_eh_return can't handle stack realignment, so disable MMX/SSE
+   in 32-bit libgcc functions that call it.  */
 #ifndef __x86_64__
-#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("general-regs-only")))
+#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-mmx,no-sse")))
 #endif
 
 /*

Jakub



Re: [PATCH] i386: Use no-mmx,no-sse for LIBGCC2_UNWIND_ATTRIBUTE [PR104890]

2022-03-15 Thread Richard Biener via Gcc-patches
On Tue, 15 Mar 2022, Jakub Jelinek wrote:

> Hi!
> 
> Regardless of the outcome of the general-regs-only stuff in x86gprintrin.h,
> apparently general-regs-only is much bigger hammer than no-sse, and e.g.
> using 387 instructions in the unwinder isn't a big deal, it never needs
> to realign the stack because of it.
> 
> So, the following patch uses no-sse (and adds no-mmx to it, even when not
> strictly needed).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, on the latter
> both normally and with -msse2 -mfpmath=sse -mstackrealign in the options and
> --enable-cet, ok for trunk?

OK.

> 2022-03-15  Jakub Jelinek  
> 
>   PR target/104890
>   * config/i386/i386.h (LIBGCC2_UNWIND_ATTRIBUTE): Use no-mmx,no-sse
>   instead of general-regs-only.
> 
> --- gcc/config/i386/i386.h.jj 2022-03-09 15:25:28.355498493 +0100
> +++ gcc/config/i386/i386.h2022-03-14 15:27:33.831976579 +0100
> @@ -2848,10 +2848,10 @@ extern enum attr_cpu ix86_schedule;
>  #define NUM_X86_64_MS_CLOBBERED_REGS 12
>  #endif
>  
> -/* __builtin_eh_return can't handle stack realignment, so restrict to
> -   general regs in 32-bit libgcc functions that call it.  */
> +/* __builtin_eh_return can't handle stack realignment, so disable MMX/SSE
> +   in 32-bit libgcc functions that call it.  */
>  #ifndef __x86_64__
> -#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("general-regs-only")))
> +#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-mmx,no-sse")))
>  #endif
>  
>  /*
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


RE: [PATCH] Performance/size improvement to single_use when matching GIMPLE.

2022-03-15 Thread Roger Sayle


Hi Richard,
Interestingly, I've already done a little analysis on the influence of
inlining in gimple-match-head.cc.  With the new improved/smaller
implementation of single_use there's actually no significant change in
code size from removing the inline.  Likewise for constant_for_folding
and do_valueize/3.

The biggest improvement is from removing inline from get_def, and the
second biggest from do_valueize/2, but removing inline from types_match
is actually a size regression.

The results, sorted on size of gimple-match.o during stage1 (and therefore
reflecting the inlining decisions of the host compiler), are:

 12215488   -types_match
 12215456   gimple-match.o  constant_for_folding/do_valueize/3/single_use
 12215080   -do_valueize/2
 12215016   -get_def

I can redo the analysis for stage3, but this was a little more inconvenient.
I do, however, have other ideas for improving the situation... stay tuned.

Cheers,
Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 15 March 2022 09:18
> To: Roger Sayle 
> Cc: 'GCC Patches' ; 'Marc Glisse'
> 
> Subject: Re: [PATCH] Performance/size improvement to single_use when
> matching GIMPLE.
> 
> On Tue, 15 Mar 2022, Roger Sayle wrote:
> 
> >
> > This patch improves the implementation of single_use as used in code
> > generated from match.pd for patterns using :s.  The current implementation
> > contains the logic "has_zero_uses (t) || has_single_use (t)" which
> > performs a loop over the uses to first check if there are zero non-debug
> > uses [which is rare], then another loop over these uses to check if there
> > is exactly one non-debug use.  This can be better implemented using a
> > single loop.
> >
> > This function is currently inlined over 800 times in gimple-match.cc,
> > whose .o on x86_64-pc-linux-gnu is now up to 30 Mbytes, so speeding up
> > and shrinking this function should help offset the growth in match.pd
> > for GCC 12.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures.  Ok for mainline?
> 
> Note the intent of has_zero_uses () is even simpler - it's the case for when
> there's no SSA operand info on the stmt (no update_stmt called yet).  More
> precisely it wants to catch the case where the definition of the SSA name is
> not in the IL.
> 
> I'm not sure if we want to twist the effective semantics at this point (I guess
> we do not want that), so the patch looks like an improvement.  But may I ask to
> move the function out of line for even more savings?  Just put it in
> gimple-match-head.cc and have it not declared inline.  I think we may want to go
> as far and declare the function 'pure' using ATTRIBUTE_PURE.
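A minimal C sketch of the single-loop idea, written out of line and marked pure as suggested above. The struct use list is a hypothetical simplification (GCC walks SSA immediate uses with its own iterators and debug-statement checks), and __attribute__ ((pure)) is used directly as an assumption about what ATTRIBUTE_PURE provides on a GCC host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the immediate-use list of an
   SSA name; GCC's real representation is different.  */
struct use
{
  bool is_debug;            /* use appears in a debug statement */
  const struct use *next;   /* next use in the list, or NULL */
};

/* Return true if the list starting at HEAD contains zero or exactly one
   non-debug use.  Unlike "has_zero_uses || has_single_use", the list is
   walked only once, and the walk stops as soon as a second non-debug use
   is found.  Not declared inline, and declared pure: no side effects,
   the result depends only on the argument and the memory it reads.  */
__attribute__ ((pure)) bool
zero_or_single_use (const struct use *head)
{
  bool seen = false;
  for (const struct use *u = head; u != NULL; u = u->next)
    {
      if (u->is_debug)
        continue;           /* debug uses never count */
      if (seen)
        return false;       /* second non-debug use found */
      seen = true;
    }
  return true;              /* zero or one non-debug use */
}
```

A list containing only debug uses counts as having zero uses, matching the "has_zero_uses (t) || has_single_use (t)" semantics the patch keeps.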
> 
> > 2022-03-15  Roger Sayle  
> >
> > gcc/ChangeLog
> > * gimple-match-head.cc (single_use): Implement inline using a
> > single loop.
> >
> > Thanks in advance,
> > Roger
> > --
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)



Re: [PATCH] Ignore (possible) signed zeros in operands of FP comparisons.

2022-03-15 Thread Richard Biener via Gcc-patches
On Tue, Mar 15, 2022 at 9:03 AM Roger Sayle  wrote:
>
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: 15 March 2022 07:29
> > To: Roger Sayle 
> > Cc: GCC Patches 
> > Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
> > comparisons.
> >
> > On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
> >  wrote:
> > >
> > >
> > > I've been wondering about the possible performance/missed-optimization
> > > impact of my patch for PR middle-end/98420 and similar IEEE
> > > correctness fixes that disable constant folding optimizations when
> > > worrying about -0.0.
> > > In the common situation where the floating point result is used by a
> > > FP comparison, there's no distinction between +0.0 and -0.0, so some
> > > HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
> > >
> > > Consider the following interesting example:
> > >
> > > int foo(int x, double y) {
> > >   return (x * 0.0) < y;
> > > }
> > >
> > > Although we know that x (when converted to double) can't be NaN or
> > > Inf, we still worry that for negative values of x that (x * 0.0) may
> > > be -0.0 and so perform the multiplication at run-time.  But in this
> > > case, the result of the comparison (-0.0 < y) will be exactly the same
> > > as (+0.0 < y) for any y, hence the above may be safely constant folded to
> > > "0.0 < y", avoiding the multiplication at run-time.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check with no new failures, and allows GCC to continue to
> > > optimize cases that we optimized in GCC 11 (without regard to 
> > > correctness).
> > > Ok for mainline?
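The signed-zero argument above can be checked with a small C sketch. foo is the example from the patch description; foo_folded is a hypothetical name for the proposed folded form, and same_result reports whether the two agree for a given (x, y), including when y is NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The example from the patch description.  For negative x, x * 0.0 is
   -0.0; for other x it is +0.0.  It is never NaN or Inf, because x is an
   int converted to double.  */
int
foo (int x, double y)
{
  return (x * 0.0) < y;
}

/* Hypothetical folded form: the multiplication is dropped, because a
   comparison cannot distinguish -0.0 from +0.0.  */
int
foo_folded (int x, double y)
{
  (void) x;
  return 0.0 < y;
}

/* True if D is a negative zero.  */
bool
is_negative_zero (double d)
{
  return d == 0.0 && signbit (d);
}

/* True if both forms give the same answer for (X, Y).  */
bool
same_result (int x, double y)
{
  return foo (x, y) == foo_folded (x, y);
}
```

The multiplication really does produce -0.0 for negative x, yet every comparison result is unchanged, which is exactly why the fold is safe only in a comparison context.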
> >
> > Isn't that something that gimple-ssa-backprop.c is designed to handle?  I wonder
> > if you can see whether the signed zero speciality can be retrofitted there?
> > It currently tracks "sign does not matter", so possibly another state, "sign of
> > zero does not matter" could be introduced there.
>
> Two questions. Would adding tracking of "sign of zero does not matter" to
> gimple-ssa-backprop.c be suitable for stage4?

Probably not.

>  Secondly, even if gimple-ssa-backprop.c
> performed this kind of optimization, would that be a reason not to support
> these transformations in match.pd?

The only reason would be to avoid growing match.pd with lots of special
patterns for cases that should rarely matter in practice.  For example the
pattern at hand wouldn't trigger for (x * 0.0) * z < y which is why I thought
of backprop.  Yes, we do have match.pd patterns with similar issues already.

Basically when the pattern doesn't simplify the outermost expression it
is prone to such issues.

> Perhaps someone could open a missed
> optimization PR for backprop in Bugzilla, but the above patch still needs to be
> reviewed on its own merits.

There are a few other pieces in the patch (didn't look at it before), changing
HONOR_NANS and ltgt, those are OK independently.

One comment, instead of matching both

 (cmp (mult ...) @2)

and

 (cmp @2 (mult ...))

you can use :c on the 'cmp' - it will do the "right" thing (swap the
comparison code) when matching the other way around.  That will reduce
repetition.
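The ":c" suggestion works because exchanging the operands of a comparison only requires swapping its code: LT with GT, LE with GE, while EQ and NE are symmetric. A minimal C sketch of that mapping, using a hypothetical cmp_code enum rather than GCC's tree codes (GCC's own helper for this is, as far as I know, swap_tree_comparison):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical comparison codes, not GCC's enum tree_code.  */
enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };

/* Return the code that gives the same answer with the operands
   exchanged: a < b is b > a, a <= b is b >= a.  */
enum cmp_code
swap_cmp (enum cmp_code code)
{
  switch (code)
    {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    default:     return code;   /* CMP_EQ and CMP_NE are symmetric */
    }
}

/* Evaluate CODE on A and B.  */
bool
eval_cmp (enum cmp_code code, double a, double b)
{
  switch (code)
    {
    case CMP_LT: return a < b;
    case CMP_LE: return a <= b;
    case CMP_GT: return a > b;
    case CMP_GE: return a >= b;
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    }
  return false;
}
```

For any a and b (NaNs included, where both sides are false), eval_cmp (c, a, b) equals eval_cmp (swap_cmp (c), b, a), so one pattern with :c can cover both operand orders.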

>
> Speaking of tree-ssa passes that could be improved, I was wondering whether
> you could review my EVRP patch to fix regression PR/102950.  Pretty please?
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html

I've left this to the ranger folks - you may want to ping Andrew here.

Richard.

> Thanks (as always),
> Roger
>
> > Thanks,
> > Richard.
> >
> > >
> > > 2022-03-14  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > * match.pd (X CMP (Y-Y) -> X CMP 0.0): New transformation.
> > > (X CMP (Y * 0.0) -> X CMP 0.0): Likewise.
> > > (X CMP X -> true): Test tree_expr_maybe_nan_p instead of
> > > HONOR_NANS.
> > > (X LTGT X -> false): Enable if X is not tree_expr_maybe_nan_p, as
> > > this can't trap/signal.
> > >
> > > gcc/testsuite/ChangeLog
> > > * gcc.dg/fold-compare-9.c: New test case.
> > >
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
>


[PATCH] riscv: Allow -Wno-psabi to turn off ABI warnings [PR91229]

2022-03-15 Thread Jakub Jelinek via Gcc-patches
Hi!

While checking if all targets honor -Wno-psabi for ABI related warnings
or messages, I found that almost all do, except for riscv.
In the testsuite when we want to ignore ABI related messages we
typically use -Wno-psabi -w, but it would be nice to get rid of those
-w uses eventually.

The following allows silencing those warnings with -Wno-psabi rather than
just -w even on riscv.

Ok for trunk?

2022-03-15  Jakub Jelinek  

PR target/91229
* config/riscv/riscv.cc (riscv_pass_aggregate_in_fpr_pair_p,
riscv_pass_aggregate_in_fpr_and_gpr_p): Pass OPT_Wpsabi instead of 0
to warning calls.

--- gcc/config/riscv/riscv.cc.jj2022-03-07 15:00:17.239592719 +0100
+++ gcc/config/riscv/riscv.cc   2022-03-15 11:20:37.823661044 +0100
@@ -2918,8 +2918,8 @@ riscv_pass_aggregate_in_fpr_pair_p (cons
 
   if ((n_old != n_new) && (warned == 0))
 {
-  warning (0, "ABI for flattened struct with zero-length bit-fields "
-  "changed in GCC 10");
+  warning (OPT_Wpsabi, "ABI for flattened struct with zero-length "
+  "bit-fields changed in GCC 10");
   warned = 1;
 }
 
@@ -2960,8 +2960,8 @@ riscv_pass_aggregate_in_fpr_and_gpr_p (c
   && (num_int_old != num_int_new || num_float_old != num_float_new)))
   && (warned == 0))
 {
-  warning (0, "ABI for flattened struct with zero-length bit-fields "
-  "changed in GCC 10");
+  warning (OPT_Wpsabi, "ABI for flattened struct with zero-length "
+  "bit-fields changed in GCC 10");
   warned = 1;
 }
 

Jakub



[PATCH] rs6000: Fix the check of bif argument number [PR104482]

2022-03-15 Thread Kewen.Lin via Gcc-patches
Hi,

PR104482 is a regression in how calls to a built-in function are handled
when their argument count differs from the prototype.  Without the patch,
the code only catches calls with fewer arguments than the prototype, and
misses calls with more arguments, as the PR shows.
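A simplified C model of the two checks may make the difference concrete. It assumes expected_args is the prototype's parameter count and models the old loop condition (stop at the end of the prototype or of the actual arguments, whichever comes first); old_check_rejects and new_check_rejects are hypothetical names, not functions in rs6000-c.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the old check: walk prototype parameters and actual
   arguments together, stopping when either runs out, then compare the
   number walked with the prototype's count.  */
bool
old_check_rejects (unsigned expected_args, unsigned nargs)
{
  unsigned n;
  for (n = 0; n < expected_args && n < nargs; n++)
    {
      /* Per-argument type handling elided.  */
    }
  /* If NARGS > EXPECTED_ARGS the loop stops at EXPECTED_ARGS, so the
     surplus arguments go unnoticed.  */
  return n != expected_args;
}

/* Model of the patched check: compare the actual argument count
   directly, before the loop.  */
bool
new_check_rejects (unsigned expected_args, unsigned nargs)
{
  return expected_args != nargs;
}
```

With two expected arguments, both versions reject a one-argument call, but only the new check rejects a three-argument call, which is the case the PR reports.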

Bootstrapped and regtested on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
--

PR target/104482

gcc/ChangeLog:

* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin): Fix
the equality check for argument number, and adjust this hunk's
location.

---
 gcc/config/rs6000/rs6000-c.cc | 60 +--
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index d2e480ad7df..2f344e07a40 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1747,6 +1747,36 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
   vec *arglist = static_cast *> (passed_arglist);
   unsigned int nargs = vec_safe_length (arglist);
 
+  /* If the number of arguments did not match the prototype, return NULL
+ and the generic code will issue the appropriate error message.  Skip
+ this test for functions where we don't fully describe all the possible
+ overload signatures in rs6000-overload.def (because they aren't relevant
+ to the expansion here).  If we don't, we get confusing error messages.  */
+  /* As an example, for vec_splats we have:
+
+; There are no actual builtins for vec_splats.  There is special handling for
+; this in altivec_resolve_overloaded_builtin in rs6000-c.cc, where the call
+; is replaced by a constructor.  The single overload here causes
+; __builtin_vec_splats to be registered with the front end so that can happen.
+[VEC_SPLATS, vec_splats, __builtin_vec_splats]
+  vsi __builtin_vec_splats (vsi);
+ABS_V4SI SPLATS_FAKERY
+
+So even though __builtin_vec_splats accepts all vector types, the
+infrastructure cheats and just records one prototype.  We end up getting
+an error message that refers to this specific prototype even when we
+are handling a different argument type.  That is completely confusing
+to the user, so it's best to let these cases be handled individually
+in the resolve_vec_splats, etc., helper functions.  */
+
+  if (expected_args != nargs
+  && !(fcode == RS6000_OVLD_VEC_PROMOTE
+  || fcode == RS6000_OVLD_VEC_SPLATS
+  || fcode == RS6000_OVLD_VEC_EXTRACT
+  || fcode == RS6000_OVLD_VEC_INSERT
+  || fcode == RS6000_OVLD_VEC_STEP))
+return NULL;
+
   for (n = 0;
!VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
fnargs = TREE_CHAIN (fnargs), n++)
@@ -1806,36 +1836,6 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
   types[n] = type;
 }
 
-  /* If the number of arguments did not match the prototype, return NULL
- and the generic code will issue the appropriate error message.  Skip
- this test for functions where we don't fully describe all the possible
- overload signatures in rs6000-overload.def (because they aren't relevant
- to the expansion here).  If we don't, we get confusing error messages.  */
-  /* As an example, for vec_splats we have:
-
-; There are no actual builtins for vec_splats.  There is special handling for
-; this in altivec_resolve_overloaded_builtin in rs6000-c.cc, where the call
-; is replaced by a constructor.  The single overload here causes
-; __builtin_vec_splats to be registered with the front end so that can happen.
-[VEC_SPLATS, vec_splats, __builtin_vec_splats]
-  vsi __builtin_vec_splats (vsi);
-ABS_V4SI SPLATS_FAKERY
-
-So even though __builtin_vec_splats accepts all vector types, the
-infrastructure cheats and just records one prototype.  We end up getting
-an error message that refers to this specific prototype even when we
-are handling a different argument type.  That is completely confusing
-to the user, so it's best to let these cases be handled individually
-in the resolve_vec_splats, etc., helper functions.  */
-
-  if (n != expected_args
-  && !(fcode == RS6000_OVLD_VEC_PROMOTE
-  || fcode == RS6000_OVLD_VEC_SPLATS
-  || fcode == RS6000_OVLD_VEC_EXTRACT
-  || fcode == RS6000_OVLD_VEC_INSERT
-  || fcode == RS6000_OVLD_VEC_STEP))
-return NULL;
-
   /* Some overloads require special handling.  */
   tree returned_expr = NULL;
   resolution res = unresolved;
-- 
2.27.0



PING^1 [PATCH] rs6000: Fix some issues related to Power10 fusion [PR104024]

2022-03-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590692.html

BR,
Kewen

on 2022/2/22 10:47 AM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As PR104024 shows, the option -mpower10-fusion currently isn't guarded
> by -mcpu=power10, so the compiler can optimize some patterns unexpectedly.
> As the option is undocumented, this patch simply unmasks it.
> Some define_insns in fusion.md that use constraint v lack the correct
> conditions, which can cause ICEs if the modes are not supported.
> Besides, it seems better to use BOOL_128 instead of VM since the
> patterns are vector logical operations.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
>   PR target/104024
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/fusion.md: Regenerate.
>   * config/rs6000/genfusion.pl: Add the check for define_insns
>   with constraint v, use BOOL_128 instead of VM.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr104024-1.c: New test.
>   * gcc.target/powerpc/pr104024-2.c: New test.




PING^1 [PATCH] rs6000/test: Adjust p9-vec-length-7 sensitive to unroll [PR103196]

2022-03-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590959.html

BR,
Kewen

on 2022/2/28 1:37 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As PR103196 shows, p9-vec-length-full-7.c needs to be adjusted because
> complete unrolling can happen on some of its loops.  This patch uses
> the pragma "GCC unroll 0" to disable all possible loop unrolling, which
> should make the test case less fragile.
> 
> There are some other p9-vec-length* cases; some of them use either
> larger or unknown loop iteration counts, and the "p9-vec-length-3*"
> cases already account for the effects of complete unrolling, so I have
> left them alone for now.
> 
> Tested on powerpc64-linux-gnu P8 and powerpc64le-linux-gnu P9 and P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
>   PR testsuite/103196
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/p9-vec-length-7.h: Add DO_PRAGMA macro.
>   * gcc.target/powerpc/p9-vec-length-epil-7.c: Use unroll pragma to
>   disable any unrollings.
>   * gcc.target/powerpc/p9-vec-length-full-7.c: Remove useless option.
>   * gcc.target/powerpc/p9-vec-length.h: Likewise.
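For readers unfamiliar with the idiom: #pragma cannot appear inside a #define, so a helper like DO_PRAGMA typically wraps the C99 _Pragma operator. The sketch below assumes that definition (the actual p9-vec-length-7.h is not shown here) and uses it in a hypothetical macro-generated loop; GCC documents that unroll factors of 0 and 1 block unrolling of the loop that follows.

```c
#include <assert.h>

/* Stringize the pragma text: DO_PRAGMA (GCC unroll 0) becomes
   _Pragma ("GCC unroll 0").  Assumed shape of the helper.  */
#define DO_PRAGMA(x) _Pragma (#x)

/* Hypothetical macro generating a fixed-trip-count loop.  The pragma
   must come immediately before the loop; a factor of 0 asks GCC not to
   unroll it, even though the trip count N is a small constant.  */
#define DEF_SUM(N)                      \
  int sum_##N (const int *a)            \
  {                                     \
    int s = 0;                          \
    DO_PRAGMA (GCC unroll 0)            \
    for (int i = 0; i < N; i++)         \
      s += a[i];                        \
    return s;                           \
  }

DEF_SUM (4)
DEF_SUM (8)
```

The pragma only affects code generation, not results, so sum_4 and sum_8 behave the same with or without it.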




PING^1 [PATCH] rs6000: Guard bifs {un, }pack_{longdouble, ibm128} under hard float [PR103623]

2022-03-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591147.html

BR,
Kewen

on 2022/3/3 10:11 AM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As PR103623 shows, this is a regression caused by the new built-in
> function framework.  Previously the __builtin_{un,}pack_{longdouble,
> ibm128} built-in functions were guarded under hard float, so they were
> unavailable with the given configuration.  With the new bif
> infrastructure they become available and hit an ICE due to incomplete
> support.
> 
> Segher and Peter pointed out that we should make them available with
> soft float, and I agree we can extend them to cover both soft and hard
> float.  But considering that it's stage 4 now, that this regression is
> classified as P1, and that the previous behavior requiring hard float
> aligns with what the documentation [1] says, I think it is a good idea
> to fix it with the attached small patch to be consistent with the
> previous behavior.  Then we can extend the functionality in the
> upcoming stage 1.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
> 
> Any thoughts?
> 
> [1] 
> https://gcc.gnu.org/onlinedocs/gcc/Basic-PowerPC-Built-in-Functions-Available-on-ISA-2_002e05.html#Basic-PowerPC-Built-in-Functions-Available-on-ISA-2_002e05
> 
> BR,
> Kewen
> --
>   PR target/103623
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-builtins.def (__builtin_pack_longdouble): Add
>   nosoft attribute.
>   (__builtin_unpack_longdouble): Likewise.
>   (__builtin_pack_ibm128): Likewise.
>   (__builtin_unpack_ibm128): Likewise.
> 



PING^1 [PATCH] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-03-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591150.html

BR,
Kewen

on 2022/3/3 4:38 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As PR103353 shows, we may want to continue to expand an MMA built-in
> function like a normal function, even if we have already emitted
> error messages about some missing required conditions.  As shown in
> that PR, without an explicit mov optab for OOmode, it would call
> emit_move_insn recursively.
> 
> So this patch allows the mov pattern to be generated when we are
> expanding to RTL and have seen errors, even without MMA support.  The
> generated pattern is not expected to cause further ICEs, as the
> compilation stops soon after expanding.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> --
> 
>   PR target/103353
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition
>   check to preparation statements and add handlings for !TARGET_MMA.
>   (define_expand movxo): Likewise.



Re: [PATCH] c++: Fix up constexpr evaluation of new with zero sized types [PR104568]

2022-03-15 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 11, 2022 at 11:28:09PM -0500, Jason Merrill wrote:
> > @@ -7264,9 +7265,66 @@ cxx_eval_constant_expression (const cons
> > DECL_NAME (var)
> >   = (DECL_NAME (var) == heap_uninit_identifier
> >  ? heap_identifier : heap_vec_identifier);
> > +   /* For zero sized elt_type, try to recover how many outer_nelts
> > +  it should have.  */
> > +   if ((cookie_size ? tree_int_cst_equal (var_size, cookie_size)
> > +: integer_zerop (var_size))
> > +   && !int_size_in_bytes (elt_type)
> > +   && TREE_CODE (oldop) == CALL_EXPR
> > +   && call_expr_nargs (oldop) >= 1)
> > + if (tree fun = get_function_named_in_call (oldop))
> > +   if (cxx_replaceable_global_alloc_fn (fun)
> > +   && IDENTIFIER_NEW_OP_P (DECL_NAME (fun)))
> > + {
> > +   tree arg0 = CALL_EXPR_ARG (oldop, 0);
> 
> How about setting var_size to arg0 at this point, and moving the
> decomposition of the size expression into build_new_constexpr_heap_type?

That would be more difficult, because for the cxx_eval_constant_expression
calls we need ctx, non_constant_p and overflow_p arguments, so
build_new_constexpr_heap_type would need to remove that one bool arg
added by this patch but instead pass around those 3 new ones.
As build_new_constexpr_heap_type is called only from 2 spots where the
other one passes NULL as full_size, the decomposition is only useful
for this caller and not the other one.

But if you strongly prefer it that way, I can do that.
Note, probably not 3 new args but 4, depends on whether we could turn
all those cases where the tree arg0 = CALL_EXPR_ARG (oldop, 0);
is done but var_size_adjusted is false into assertion failures.
I'm worried that with the zero size of element we could end up with
a variable number of elements which when multiplied by 0 gives constant 0,
though hopefully that would be rejected earlier during constant evaluation.
> 
> > +   STRIP_NOPS (arg0);
> > +   if (cookie_size)
> > + {
> > +   if (TREE_CODE (arg0) != PLUS_EXPR)
> > + arg0 = NULL_TREE;
> > +   else if (TREE_CODE (TREE_OPERAND (arg0, 0))
> > +== INTEGER_CST
> > +&& tree_int_cst_equal (cookie_size,
> > +   TREE_OPERAND (arg0,
> > + 0)))
> > + {
> > +   arg0 = TREE_OPERAND (arg0, 1);
> > +   STRIP_NOPS (arg0);
> > + }
> > +   else if (TREE_CODE (TREE_OPERAND (arg0, 1))
> > +== INTEGER_CST
> > +&& tree_int_cst_equal (cookie_size,
> > +   TREE_OPERAND (arg0,
> > + 1)))
> > + {
> > +   arg0 = TREE_OPERAND (arg0, 0);
> > +   STRIP_NOPS (arg0);
> > + }
> > +   else
> > + arg0 = NULL_TREE;
> > + }
> > +   if (arg0 && TREE_CODE (arg0) == MULT_EXPR)
> > + {
> > +   tree op0 = TREE_OPERAND (arg0, 0);
> > +   tree op1 = TREE_OPERAND (arg0, 1);
> > +   var_size_adjusted = true;
> > +   if (integer_zerop (op0))
> > + var_size
> > +   = cxx_eval_constant_expression (ctx, op1, false,
> > +   non_constant_p,
> > +   overflow_p);
> > +   else if (integer_zerop (op1))
> > + var_size
> > +   = cxx_eval_constant_expression (ctx, op0, false,
> > +   non_constant_p,
> > +   overflow_p);
> > +   else
> > + var_size_adjusted = false;
> > + }
> > + }
> > TREE_TYPE (var)
> >   = build_new_constexpr_heap_type (elt_type, cookie_size,
> > -  var_size);
> > +  var_size, var_size_adjusted);
> > TREE_TYPE (TREE_OPERAND (op, 0))
> >   = build_pointer_type (TREE_TYPE (var));
> >   }
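The size decomposition in the quoted hunk can be modelled in plain C. The sketch below assumes a hypothetical miniature expression tree (struct expr, recover_count) rather than GCC's tree API. It strips a constant cookie from either side of a PLUS, then takes whichever MULT operand is multiplied by the zero element size. The real code constant-evaluates that operand with cxx_eval_constant_expression; the sketch only accepts an already-constant count and returns -1 otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature expression tree standing in for GCC trees.  */
enum kind { CST, PLUS, MULT };

struct expr
{
  enum kind kind;
  long value;                 /* used when kind == CST */
  const struct expr *op0;     /* operands for PLUS and MULT */
  const struct expr *op1;
};

static int
is_cst (const struct expr *e, long v)
{
  return e->kind == CST && e->value == v;
}

/* ARG is the size passed to operator new for an array of zero-sized
   elements: either N * 0, or COOKIE_SIZE + N * 0 (COOKIE_SIZE == 0
   means no cookie).  Return N, or -1 if the shape is not recognized.  */
long
recover_count (const struct expr *arg, long cookie_size)
{
  if (cookie_size != 0)
    {
      /* Strip the constant cookie from whichever side it is on.  */
      if (arg->kind != PLUS)
        return -1;
      if (is_cst (arg->op0, cookie_size))
        arg = arg->op1;
      else if (is_cst (arg->op1, cookie_size))
        arg = arg->op0;
      else
        return -1;
    }
  if (arg->kind != MULT)
    return -1;
  /* The element count is the operand multiplied by the zero size.  */
  const struct expr *count;
  if (is_cst (arg->op0, 0))
    count = arg->op1;
  else if (is_cst (arg->op1, 0))
    count = arg->op0;
  else
    return -1;
  return count->kind == CST ? count->value : -1;
}
```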

Jakub



Re: RFA: crc builtin functions & optimizations

2022-03-15 Thread Martin Jambor
Hello,

just one question

On Tue, Mar 15 2022, Joern Rennecke wrote:
> Most microprocessors have efficient ways to perform CRC operations, be
> that with lookup tables, rotates, or even special instructions.
> However, because we lack a representation for CRC in the compiler, we
> can't do proper instruction selection.  With this patch I seek to
> rectify this.
> I've avoided using a mode name for the built-in functions because that
> would tie the semantics to the size of the addressable unit.  We
> generally use abbreviations like s/l/ll for type names, which is all
> right when the type can be widened without changing semantics.  For
> the data input, however, we also have to consider the shift count that
> is tied to it.  That is why I used a number to designate the width of
> the data input and shift.
>
> For machine support, I made a start with 8 and 16 bit little-endian
> CRC for RISCV using a lookup table.  I am sure once we have the basic
> infrastructure in the tree, we'll get more contributions of suitable
> named patterns for various ports.
>
> bootstrapped on x86_64-pc-linux-gnu .
> 2022-03-14  Jon Beniston  
>   Joern Rennecke  
>
>   * Makefile.in (OBJS): Add tree-crc.o .
>   * builtin-types.def (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Define.
>   (BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
>   (BT_FN_UINT16_UINT16_UINT32_CONST_SIZE): Likewise.
>   * builtins.cc (associated_internal_fn):
>   Handle BUILT_IN_CRC8S, BUILT_IN_CRC16S, BUILT_IN_CRC32S.
>   * builtins.def (BUILT_IN_CRC8S, BUILT_IN_CRC16S, BUILT_IN_CRC32S):
>   New builtin functions.
>   * cgraph.cc (cgraph_node::verify_node):
>   Allow const calls without a callgraph edge.
>   * common.opt (fcrc): New option.
>   * doc/invoke.texi (-fcrc): Document.
>   * gimple-match-head.cc: #include predict.h .
>   * internal-fn.cc (crc_direct): Define.
>   (expand_crc_optab_fn): New function.
>   (direct_crc_optab_supported_p): Define.
>   * internal-fn.def (CRC, CRC_BE): New internal optab functions.
>   * match.pd: Match a pair of crc operations.
>   * optabs.def (crc_optab, crc_be_optab): New optabs.
>   * passes.def (pass_crc): Add new pass.
>   * tree-crc.cc: New file.
>   * tree-pass.h (make_pass_crc): Declare.
>
> testsuite:
>   * gcc.c-torture/compile/crc.c: New test.
>   * gcc.dg/tree-ssa/crc.c: Likewise.
>   * gcc.dg/tree-ssa/crc-2.c: likewise.
>   * gcc.dg/tree-ssa/pr59597.c: Add flag -fno-crc .
>
> config/riscv:
>   * crc.md: New file.
>   * riscv-protos.h (expand_crc_lookup, print_crc_table): Declare.
>   * riscv.cc (compute_crc): New function.
>   (print_crc_table, expand_crc_lookup): Likewise.
>   * riscv.md: Include crc.md.
>   * riscv.opt (msmall-memory): New option.
>   * tree-crc-doc.txt: New file.
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 31ff95500c9..a901925511b 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1612,6 +1612,7 @@ OBJS = \
>   tree-cfgcleanup.o \
>   tree-chrec.o \
>   tree-complex.o \
> + tree-crc.o \
>   tree-data-ref.o \
>   tree-dfa.o \
>   tree-diagnostic.o \

[...]

> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index b923a59ab0c..9570f5121af 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -3793,7 +3793,8 @@ cgraph_node::verify_node (void)
>   }
> e->aux = (void *)1;
>   }
> -   else if (decl)
> +   else if (decl
> +&& !TREE_READONLY (decl) && !DECL_PURE_P (decl))
>   {
> error ("missing callgraph edge for call stmt:");
> cgraph_debug_gimple_stmt (this_cfun, stmt);

Why is this necessary?  It seems that all other built-ins just have a
cgraph_node and their calls a cgraph_edge.

Thanks,

Martin


Re: [PATCH] riscv: Allow -Wno-psabi to turn off ABI warnings [PR91229]

2022-03-15 Thread Kito Cheng via Gcc-patches
Hi Jakub:

LGTM, Thanks!

On Tue, Mar 15, 2022 at 6:57 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> While checking if all targets honor -Wno-psabi for ABI related warnings
> or messages, I found that almost all do, except for riscv.
> In the testsuite when we want to ignore ABI related messages we
> typically use -Wno-psabi -w, but it would be nice to get rid of those
> -w uses eventually.
>
> The following allows silencing those warnings with -Wno-psabi rather than
> just -w even on riscv.
>
> Ok for trunk?
>
> 2022-03-15  Jakub Jelinek  
>
> PR target/91229
> * config/riscv/riscv.cc (riscv_pass_aggregate_in_fpr_pair_p,
> riscv_pass_aggregate_in_fpr_and_gpr_p): Pass OPT_Wpsabi instead of 0
> to warning calls.
>
> --- gcc/config/riscv/riscv.cc.jj2022-03-07 15:00:17.239592719 +0100
> +++ gcc/config/riscv/riscv.cc   2022-03-15 11:20:37.823661044 +0100
> @@ -2918,8 +2918,8 @@ riscv_pass_aggregate_in_fpr_pair_p (cons
>
>if ((n_old != n_new) && (warned == 0))
>  {
> -  warning (0, "ABI for flattened struct with zero-length bit-fields "
> -  "changed in GCC 10");
> +  warning (OPT_Wpsabi, "ABI for flattened struct with zero-length "
> +  "bit-fields changed in GCC 10");
>warned = 1;
>  }
>
> @@ -2960,8 +2960,8 @@ riscv_pass_aggregate_in_fpr_and_gpr_p (c
>&& (num_int_old != num_int_new || num_float_old != num_float_new)))
>&& (warned == 0))
>  {
> -  warning (0, "ABI for flattened struct with zero-length bit-fields "
> -  "changed in GCC 10");
> +  warning (OPT_Wpsabi, "ABI for flattened struct with zero-length "
> +  "bit-fields changed in GCC 10");
>warned = 1;
>  }
>
>
> Jakub
>


Re: [PATCH] Fix PR 101515 (ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128)

2022-03-15 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 11, 2022 at 12:27:49PM -0500, Jason Merrill wrote:
> Yes, that's what the above code would correctly do if TYPE were the
> pointer-to-method type.  It's wrong for this case because TYPE is unrelated
> to TREE_TYPE (field).
> 
> I think the problem is just this line:
> 
> > if (tree ret = c_fold_indirect_ref_for_warn (loc, type, cop,
> >  off))
> >   return ret;
> > return cop;
>   ^^
> 
> The recursive call does the proper type checking, but then the "return cop"
> line returns the COMPONENT_REF even though the type check failed. The
> parallel code in cxx_fold_indirect_ref_1 doesn't have this line, and
> removing it fixes the testcase, so I see
> 
> warning: ‘*(ptrmemfunc*)&x.ptrmemfunc::ptr’ is used uninitialized

The intent of r11-6729 is that it prints something that helps user to figure
out what exactly is being accessed.
When we find a unique non-static data member that is being accessed, even
when we can't fold it nicely, IMNSHO it is better to print
  ((sometype *)&var)->field
or
  (*(sometype *)&var).field
instead of
  *(fieldtype *)((char *)&var + 56)
because the user doesn't know what is at offset 56, we shouldn't ask user
to decipher structure layout etc.
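To illustrate why the member form is friendlier, here is a small C sketch with a hypothetical struct sometype: both functions read the same object, but only the first says which member it is. The layout is an assumption; on an LP64 target the member happens to sit at byte offset 56, like the example above, but nothing ties it to the PR's actual type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure, not the type from the PR.  */
struct sometype
{
  long header[7];
  int field;
};

/* Reading the member by name: the form the diagnostic tries to print.  */
int
read_by_name (const struct sometype *var)
{
  return var->field;
}

/* Reading the same member by raw byte offset: the form the user would
   otherwise see, which requires knowing the structure layout.  */
int
read_by_offset (const struct sometype *var)
{
  return *(const int *) ((const char *) var
                         + offsetof (struct sometype, field));
}
```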

One question is if we could return something better for the TYPE_PTRMEMFUNC_FLAG
RECORD_TYPE members here (something that would print it more naturally/readably
in a C++ way), though the fact that the routine is in c-family makes it
harder.

Another one is whether we shouldn't punt for FIELD_DECLs that don't have
nicely printable name of its containing scope, something like:
  if (tree scope = get_containing_scope (field))
    if (TYPE_P (scope) && TYPE_NAME (scope) == NULL_TREE)
      break;
  return cop;
or so.
Note the returned cop is a COMPONENT_REF where the first argument has a
nicely printable type name (x with type sp), but sp's TYPE_MAIN_VARIANT
is the unnamed TYPE_PTRMEMFUNC_FLAG.  So another possibility would be if
we see such a problem for the FIELD_DECL's scope, check if TYPE_MAIN_VARIANT
of the first COMPONENT_REF's argument is equal to that scope and in that
case use TREE_TYPE of the first COMPONENT_REF's argument as the scope
instead.

Jakub



Fwd: RFA: crc builtin functions & optimizations

2022-03-15 Thread Joern Rennecke
Oops, that was meant to go to the list too.


On Tue, 15 Mar 2022 at 01:04, Andrew Pinski  wrote:
>
> On Mon, Mar 14, 2022 at 5:33 PM Joern Rennecke
>  wrote:
> >
> > Most microprocessors have efficient ways to perform CRC operations, be
> > that with lookup tables, rotates, or even special instructions.
> > However, because we lack a representation for CRC in the compiler, we
> > can't do proper instruction selection.  With this patch I seek to
> > rectify this.
> > I've avoided using a mode name for the built-in functions because that
> > would tie the semantics to the size of the addressable unit.  We
> > generally use abbreviations like s/l/ll for type names, which is all
> > right when the type can be widened without changing semantics.  For
> > the data input, however, we also have to consider the shift count that
> > is tied to it.  That is why I used a number to designate the width of
> > the data input and shift.
> >
> > For machine support, I made a start with 8 and 16 bit little-endian
> > CRC for RISCV using a lookup table.  I am sure once we have the basic
> > infrastructure in the tree, we'll get more contributions of suitable
> > named patterns for various ports.
>
>
> A few points.
> There are at least 9 different polynomials for the CRC-8 in common use today.
> For CRC-32 there are 5 different polynomials used.
> You don't have a patch to invoke.texi adding the descriptions of the builtins.

You are correct that the documentation could use some work, but that part
would go into extend.texi .

> How is your polynom 3rd argument described? Is it similar to how it is
> done on the wiki for the CRC?

It's a constant integer.
I haven't found a CRC in https://gcc.gnu.org/wiki .
If you mean wikipedia.org, they focus mainly on big endian CRC.  I've added
a function code IFN_CRC_BE for it because for completeness it should be
there, but haven't fleshed out anything further around that.  IFN_CRC and
its associated built-in functions are little-endian.  If you look at the start
of the config/riscv/crc.md patch, there is a comment with a simple C
implementation of crchihi4.
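For readers who have not seen the crc.md comment, here is a generic C sketch of a bitwise little-endian (bit-reflected) CRC-16 step. It is not the crchihi4 code from the patch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold one data byte into a little-endian (bit-reflected) CRC-16.
   POLY is the polynomial in reflected form, e.g. 0xA001 for
   x^16 + x^15 + x^2 + 1.  Each step shifts right and XORs in POLY
   whenever a 1 bit drops off the low end.  */
uint16_t
crc16_le_update (uint16_t crc, uint8_t data, uint16_t poly)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (uint16_t) ((crc >> 1) ^ poly) : (uint16_t) (crc >> 1);
  return crc;
}

/* Run crc16_le_update over LEN bytes starting at BUF.  */
uint16_t
crc16_le_buf (uint16_t crc, const void *buf, size_t len, uint16_t poly)
{
  const unsigned char *p = buf;
  for (size_t i = 0; i < len; i++)
    crc = crc16_le_update (crc, p[i], poly);
  return crc;
}
```

With a starting value of 0 and the reflected polynomial 0xA001 (normal form 0x8005) this is the variant usually catalogued as CRC-16/ARC, whose check value for "123456789" is 0xBB3D; starting from 0xFFFF gives CRC-16/MODBUS (0x4B37).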

> Does it make sense to have to list the most common polynomials in the
> documentation?

Maybe.  You could give advice on what makes cryptographic sense for
people who want to use CRCs in their code for integrity checks.
Or once some ports with special-purpose instructions are supported,
there could be comments on which polynoms will result in faster operation
because of the specialized expansion for the respective targets.

> Also I am sorry but micro-optimizing coremarks is just wrong.

The claim for that benchmark is that it tests a set of common
operations, including CRC calculations.  Without compiler support,
what we test instead is how well this particular implementation of CRC is
compiled for the target CPU, which can be very different from the actual
CRC computation performance.  So recognizing the CRC computation
helps the benchmark achieve the stated goal of gauging CRC computation
performance.

Moreover, since the benchmark is commonly used, this also makes
it a commonly used idiom, and the license allows copying the code
into your own programs to a large extent.

> Maybe it
> is better to pick the CRC32 that is inside zip instead for a testcase
> and benchmarking against?
> Or even the CRC32C for iSCSI/ext4.

I'm not sure what's inside there, but in principle, the more the merrier.
I had a look at the bzip2 CRC computation, but that's just a table
lookup.  We can recognize table lookups that compute a CRC if the
array is a constant, but there is no point if you haven't either a faster
implementation or want further optimization to be enabled.  Going
there was beyond the scope of my work at this time.
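A table lookup of the kind mentioned above is just the bitwise step precomputed for all 256 byte values. The C sketch below (hypothetical names, little-endian CRC-16 with a bitwise reference) builds the table from the bitwise step and shows that both forms agree; it rebuilds the table on each call for brevity, where real code would use a constant array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reference: one byte of a little-endian CRC-16 (reflected POLY).  */
static uint16_t
crc16_le_bitwise (uint16_t crc, uint8_t data, uint16_t poly)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (uint16_t) ((crc >> 1) ^ poly) : (uint16_t) (crc >> 1);
  return crc;
}

/* Entry I of the lookup table: the bitwise step applied to byte I with
   a zero starting CRC.  */
uint16_t
crc16_le_table_entry (uint16_t poly, uint8_t i)
{
  return crc16_le_bitwise (0, i, poly);
}

/* Bitwise CRC over a buffer, for comparison.  */
uint16_t
crc16_le_bitwise_buf (uint16_t crc, const void *buf, size_t len, uint16_t poly)
{
  const unsigned char *p = buf;
  for (size_t i = 0; i < len; i++)
    crc = crc16_le_bitwise (crc, p[i], poly);
  return crc;
}

/* Table-driven CRC: one lookup per byte instead of eight shift steps.  */
uint16_t
crc16_le_table_buf (uint16_t crc, const void *buf, size_t len, uint16_t poly)
{
  uint16_t table[256];
  for (int i = 0; i < 256; i++)
    table[i] = crc16_le_table_entry (poly, (uint8_t) i);
  const unsigned char *p = buf;
  for (size_t i = 0; i < len; i++)
    crc = (uint16_t) ((crc >> 8) ^ table[(crc ^ p[i]) & 0xFF]);
  return crc;
}
```

For the reflected polynomial 0xA001, entry 1 of the table is 0xC0C1, the familiar second entry of the classic CRC-16 table.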

In principle, it would be interesting to do reduction / vectorization of
block CRC computations.  But you have to start with having a
representation for the CRC computations first.

> I see you also don't optimize the case where you have three other
> variants of polynomials that are reversed, reciprocal and reversed
> reciprocal.

Do you want to contribute that?
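The variants named in the quoted message are closely related encodings, so a matcher could in principle canonicalize them before looking up an expansion. The sketch below uses hypothetical helper names and assumes the "normal" form of a width-n polynomial omits the implicit x^n term, with bit k holding the coefficient of x^k. The reversed form is the same polynomial bit-reversed for LSB-first use. The reciprocal form encodes x^n p(1/x), a different polynomial. The reversed reciprocal (Koopman) form is the same polynomial with x^n kept and the +1 term dropped.

```c
#include <stdint.h>

/* Reverse the low N bits of V (1 <= n <= 32).  */
static uint32_t
bit_reverse (uint32_t v, unsigned n)
{
  uint32_t r = 0;
  for (unsigned i = 0; i < n; i++)
    if (v & ((uint32_t) 1 << i))
      r |= (uint32_t) 1 << (n - 1 - i);
  return r;
}

static uint32_t
low_mask (unsigned n)
{
  return n == 32 ? 0xFFFFFFFFu : ((uint32_t) 1 << n) - 1;
}

/* Reversed: the normal form with its bit order flipped.  */
static uint32_t
crc_poly_reversed (uint32_t normal, unsigned n)
{
  return bit_reverse (normal, n);
}

/* Reciprocal x^n * p(1/x), written in normal form (x^n implicit).  */
static uint32_t
crc_poly_reciprocal (uint32_t normal, unsigned n)
{
  return ((bit_reverse (normal, n) << 1) | 1) & low_mask (n);
}

/* Reversed reciprocal (Koopman): keep x^n, drop the implicit +1.  */
static uint32_t
crc_poly_reversed_reciprocal (uint32_t normal, unsigned n)
{
  return (normal >> 1) | ((uint32_t) 1 << (n - 1));
}
```

For CRC-16 0x8005 this gives 0xA001, 0x4003 and 0xC002; for CRC-32 0x04C11DB7 it gives 0xEDB88320, 0xDB710641 and 0x82608EDB.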

> Also a huge problem, you don't check to make sure the third argument
> to the crc builtin function is constant in the riscv backend.

Why is that a huge problem?  I see it as a further refinement not yet
added.  Strictly speaking, there is a check, but it's an assert.  OTOH,
it shouldn't be triggered with the infrastructure as it is now, because the
optimizer only looks for a computation with a constant polynomial, and
the third argument of the builtin crc functions is BT_CONST_SIZE for
now.  Variable polynomials are interesting, but before we introduce them,
we must make sure that constants remain inside the builtin function,
lest we get severe performance degradation if table lookups and
special instructions are not available.

> Plus
> since you expose the crc builtins as a non target specific builtin, I
> assume there should be a libcall right and therefore eit

Re: RFA: crc builtin functions & optimizations

2022-03-15 Thread Joern Rennecke
On Tue, 15 Mar 2022 at 02:17, Oleg Endo  wrote:
> > In my own CRC library I've got ~30 'commonly used' CRC types, based on
> the following generic definition:
> > This being a library makes it relatively easy to tune and customize for
> various systems.

...

> How would that work together with your proposal?

With optabs, you can put whatever you like into the
machine-specific expansion.

Or, if we could put your library-using code into a default expansion that is
used if there's no optab expansion for the modes given, then the target can
override this with machine-specific methods using the optabs, and otherwise
use your library method in the default expansion.


rs6000 patch ping: [PATCH 8/8] rs6000: Fix some missing built-in attributes [PR104004]

2022-03-15 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 28, 2022 at 11:50:26AM -0600, Bill Schmidt via Gcc-patches wrote:
> PR104004 caught some misses on my part in converting to the new built-in
> function infrastructure.  In particular, I forgot to mark all of the "nosoft"
> built-ins, and one of those should also have been marked "no32bit".
> 
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this okay for trunk?
> 
> Thanks,
> Bill
> 
> 
> 2022-01-27  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def (MFFSL): Mark nosoft.
>   (MTFSB0): Likewise.
>   (MTFSB1): Likewise.
>   (SET_FPSCR_RN): Likewise.
>   (SET_FPSCR_DRN): Mark nosoft and no32bit.

This patch fixes a P1 regression and from my (limited) understanding
doesn't depend on any other patch in the series.

Is this ok for trunk (I agree some testcase coverage would be nice)?

> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index c8f0cf332eb..98619a649e3 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -215,7 +215,7 @@
>  ; processors, this builtin automatically falls back to mffs on older
>  ; platforms.  Thus it appears here in the [always] stanza.
>double __builtin_mffsl ();
> -MFFSL rs6000_mffsl {}
> +MFFSL rs6000_mffsl {nosoft}
>  
>  ; This is redundant with __builtin_pack_ibm128, as it requires long
>  ; double to be __ibm128.  Should probably be deprecated.
> @@ -226,10 +226,10 @@
>  MFTB rs6000_mftb_di {32bit}
>  
>void __builtin_mtfsb0 (const int<0,31>);
> -MTFSB0 rs6000_mtfsb0 {}
> +MTFSB0 rs6000_mtfsb0 {nosoft}
>  
>void __builtin_mtfsb1 (const int<0,31>);
> -MTFSB1 rs6000_mtfsb1 {}
> +MTFSB1 rs6000_mtfsb1 {nosoft}
>  
>void __builtin_mtfsf (const int<0,255>, double);
>  MTFSF rs6000_mtfsf {}
> @@ -238,7 +238,7 @@
>  PACK_IF packif {}
>  
>void __builtin_set_fpscr_rn (const int[0,3]);
> -SET_FPSCR_RN rs6000_set_fpscr_rn {}
> +SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
>  
>const double __builtin_unpack_ibm128 (__ibm128, const int<0,1>);
>  UNPACK_IF unpackif {}
> @@ -2969,7 +2969,7 @@
>  PACK_TD packtd {}
>  
>void __builtin_set_fpscr_drn (const int[0,7]);
> -SET_FPSCR_DRN rs6000_set_fpscr_drn {}
> +SET_FPSCR_DRN rs6000_set_fpscr_drn {nosoft,no32bit}
>  
>const unsigned long long __builtin_unpack_dec128 (_Decimal128, \
>  const int<0,1>);
> -- 
> 2.27.0

Jakub



semi-finished patch: dead zero/sign extension elimination

2022-03-15 Thread Joern Rennecke
This misses some documentation and testing, but it appears to work
well with 64 bit RISC-V.

-fext-dce is best used with aggressive unrolling and/or inlining.  It deletes
zero/sign extensions where the part of the register that the zero/sign
extension pertains to is dead.

This is not about multi-word registers (although there might be some overlap
on targets with somewhat narrow words), but mainly about parts of a register
within a word.  So, using BITS_LITTLE_ENDIAN nomenclature, we consider liveness
of the lowest 8 bits, i.e. 0..7, the next more significant 8 bits, i.e. bits
8..15, then bits 16..31, and
finally bits 32..BITS_PER_WORD-1 .
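To make the four groups concrete, here is a small model assuming a 64-bit word (as on 64-bit RISC-V). The helper names live_groups and extension_is_dead are hypothetical and not taken from ext-dce.cc. The first maps a mask of live bits to the groups it touches; the second reports whether every group that a zero or sign extension from a narrower width actually changes is dead, so that only the low part, a plain copy of the source, is ever used.

```c
#include <stdint.h>

/* Liveness groups, assuming BITS_PER_WORD == 64:
   bit 0: bits 0..7, bit 1: bits 8..15, bit 2: bits 16..31,
   bit 3: bits 32..63.  */
static unsigned
live_groups (uint64_t live_bits)
{
  unsigned groups = 0;
  if (live_bits & UINT64_C (0x00000000000000FF))
    groups |= 1u << 0;
  if (live_bits & UINT64_C (0x000000000000FF00))
    groups |= 1u << 1;
  if (live_bits & UINT64_C (0x00000000FFFF0000))
    groups |= 1u << 2;
  if (live_bits & UINT64_C (0xFFFFFFFF00000000))
    groups |= 1u << 3;
  return groups;
}

/* A zero or sign extension from SRC_BITS (8, 16 or 32) to the full word
   only changes the groups above SRC_BITS; the groups below just copy the
   source.  Return nonzero if all of the changed groups are dead.  */
static int
extension_is_dead (unsigned src_bits, unsigned groups)
{
  unsigned changed;
  switch (src_bits)
    {
    case 8:
      changed = 0xEu;   /* groups 1, 2 and 3 */
      break;
    case 16:
      changed = 0xCu;   /* groups 2 and 3 */
      break;
    case 32:
      changed = 0x8u;   /* group 3 */
      break;
    default:
      return 0;         /* other widths are not modelled here */
    }
  return (groups & changed) == 0;
}
```

For example, if only the low byte of a zero-extended QImode value is used later, live_groups returns 0x1 and the extension is reported dead.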

-fext-dce-pre works better for less aggressive optimization, like a plain -O3.
It inserts extensions for return values on edges leading to predecessors of the
exit block where a highpart might be live, before performing the same dead
extension elimination as -fext-dce.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 31ff95500c9..6e7ad5ff966 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1374,6 +1374,7 @@ OBJS = \
explow.o \
expmed.o \
expr.o \
+   ext-dce.o \
fibonacci_heap.o \
file-prefix-map.o \
final.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 8b6513de47c..80833bea285 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3607,4 +3607,12 @@ fipa-ra
 Common Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fext-dce
+Common Var(flag_ext_dce, 1) Optimization Init(0)
+Perform dead code elimination on zero and sign extensions with special dataflow analysis.
+
+fext-dce-pre
+Common Var(flag_ext_dce, 2)
+Perform dead code elimination on zero and sign extensions with special dataflow analysis.  Insert extensions on edges for partial redundancy elimination.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/df-scan.cc b/gcc/df-scan.cc
index 9b2375d561b..59b0a82dcc9 100644
--- a/gcc/df-scan.cc
+++ b/gcc/df-scan.cc
@@ -78,7 +78,6 @@ static void df_get_eh_block_artificial_uses (bitmap);
 
 static void df_record_entry_block_defs (bitmap);
 static void df_record_exit_block_uses (bitmap);
-static void df_get_exit_block_use_set (bitmap);
 static void df_get_entry_block_def_set (bitmap);
 static void df_grow_ref_info (struct df_ref_info *, unsigned int);
 static void df_ref_chain_delete_du_chain (df_ref);
@@ -3638,7 +3637,7 @@ df_epilogue_uses_p (unsigned int regno)
 
 /* Set the bit for regs that are considered being used at the exit. */
 
-static void
+void
 df_get_exit_block_use_set (bitmap exit_block_uses)
 {
   unsigned int i;
diff --git a/gcc/df.h b/gcc/df.h
index bd329205d08..9807a3e87f9 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -1090,6 +1090,7 @@ extern bool df_epilogue_uses_p (unsigned int);
 extern void df_set_regs_ever_live (unsigned int, bool);
 extern void df_compute_regs_ever_live (bool);
 extern void df_scan_verify (void);
+extern void df_get_exit_block_use_set (bitmap);
 
 
 /*
diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
new file mode 100644
index 000..9d264972c7f
--- /dev/null
+++ b/gcc/ext-dce.cc
@@ -0,0 +1,545 @@
+/* RTL dead zero/sign extension (code) elimination.
+   Copyright (C) 2000-2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "insn-config.h"
+#include "emit-rtl.h"
+#include "recog.h"
+#include "cfganal.h"
+#include "tree-pass.h"
+#include "cfgrtl.h"
+#include "rtl-iter.h"
+#include "df.h"
+
+/* We consider four bit groups for liveness:
+   bit 0..7   (least significant byte)
+   bit 8..15  (second least significant byte)
+   bit 16..31
+   bit 32..BITS_PER_WORD-1  */
+
+bitmap
+ext_dce_process_bb (basic_block bb, bitmap livenow, bool modify)
+{
+  rtx_insn *insn;
+
+  FOR_BB_INSNS_REVERSE (bb, insn)
+{
+  subrtx_iterator::array_type array;
+
+  if (!INSN_P (insn))
+   continue;
+
+  bitmap live_tmp = BITMAP_ALLOC (NULL);
+  int seen_fusage = 0;
+
+  /* First, process the sets.  */
+  for (rtx pat = PATTERN (insn);;)
+   {
+ FOR_EACH_SUBRTX (iter, array, pat, NONCONST)
+   {
+ const_rtx x = *iter;

Re: [PATCH] rs6000: Fix invalid address passed to __builtin_mma_disassemble_acc [PR104923]

2022-03-15 Thread Peter Bergner via Gcc-patches
On 3/14/22 10:06 PM, Peter Bergner wrote:
> On 3/14/22 8:24 PM, Segher Boessenkool wrote:
>> You might want to name that common expression, "rtx addr = XEXP (op, 0);"
>> or something.  Dunno what is best
> 
> Will do.
> 
> 
>> Please put that new MEM_P code first, followed by a blank line, and only
>> then do the SUBREG thing.  As written it will allow subregs of mem.  And
>> the blank line is important of course ;-)
> 
> Will do.
> 
> 
>> Okay for trunk with those changes.  Also okay for 10 and 11 after an
>> appropriate soak period.  Thanks!

Testing of the updates came back clean so I pushed the fix.
I'll wait a few days before pushing the backports.  Thanks!

Peter



Re: [PATCH] c++: fold calls to std::move/forward [PR96780]

2022-03-15 Thread Patrick Palka via Gcc-patches
On Mon, 14 Mar 2022, Jason Merrill wrote:

> On 3/14/22 13:13, Patrick Palka wrote:
> > On Fri, 11 Mar 2022, Jason Merrill wrote:
> > 
> > > On 3/10/22 11:27, Patrick Palka wrote:
> > > > On Wed, 9 Mar 2022, Jason Merrill wrote:
> > > > 
> > > > > On 3/1/22 18:08, Patrick Palka wrote:
> > > > > > A well-formed call to std::move/forward is equivalent to a cast, but the
> > > > > > former being a function call means it comes with bloated debug info, which
> > > > > > persists even after the call has been inlined away, for an operation that
> > > > > > is never interesting to debug.
> > > > > > 
> > > > > > This patch addresses this problem in a relatively ad-hoc way by folding
> > > > > > calls to std::move/forward into casts as part of the frontend's general
> > > > > > expression folding routine.  After this patch with -O2 and a non-checking
> > > > > > compiler, debug info size for some testcases decreases by about ~10% and
> > > > > > overall compile time and memory usage decreases by ~2%.
> > > > > 
> > > > > Impressive.  Which testcases?
> > > > 
> > > > I saw the largest percent reductions in debug file object size in
> > > > various tests from cmcstl2 and range-v3, e.g.
> > > > test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
> > > > (which are among their biggest tests).
> > > > 
> > > > Significant reductions in debug object file size can be observed in
> > > > some libstdc++ testcases too, such as a 5.5% reduction in
> > > > std/ranges/adaptor/join.cc
> > > > 
> > > > > 
> > > > > Do you also want to handle addressof and as_const in this patch, as Jonathan
> > > > > suggested?
> > > > 
> > > > Yes, good idea.  Since each of their argument and return types are
> > > > indirect types, I think we can use the same NOP_EXPR-based folding for
> > > > them.
> > > > 
> > > > > 
> > > > > I think we can do this now, and think about generalizing more in stage 1.
> > > > > 
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
> > > > > > want to consider for GCC 12?
> > > > > > 
> > > > > > PR c++/96780
> > > > > > 
> > > > > > gcc/cp/ChangeLog:
> > > > > > 
> > > > > > * cp-gimplify.cc (cp_fold) : When optimizing,
> > > > > > fold calls to std::move/forward into simple casts.
> > > > > > * cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
> > > > > > * typeck.cc (is_std_move_p, is_std_forward_p): Export.
> > > > > > 
> > > > > > gcc/testsuite/ChangeLog:
> > > > > > 
> > > > > > * g++.dg/opt/pr96780.C: New test.
> > > > > > ---
> > > > > > gcc/cp/cp-gimplify.cc  | 18 ++
> > > > > > gcc/cp/cp-tree.h   |  2 ++
> > > > > > gcc/cp/typeck.cc   |  6 ++
> > > > > > gcc/testsuite/g++.dg/opt/pr96780.C | 24 
> > > > > > 4 files changed, 46 insertions(+), 4 deletions(-)
> > > > > > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
> > > > > > 
> > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > index d7323fb5c09..0b009b631c7 100644
> > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > @@ -2756,6 +2756,24 @@ cp_fold (tree x)
> > > > > >   case CALL_EXPR:
> > > > > >   {
> > > > > > +   if (optimize
> > > > > 
> > > > > I think this should check flag_no_inline rather than optimize.
> > > > 
> > > > Sounds good.
> > > > 
> > > > Here's a patch that extends the folding to as_const and addressof (as
> > > > well as __addressof, which I'm kind of unsure about since it's
> > > > non-standard).  I suppose it also doesn't hurt to verify that the return
> > > > and argument type of the function are sane before we commit to folding.
> > > > 
> > > > -- >8 --
> > > > 
> > > > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
> > > > 
> > > > A well-formed call to std::move/forward is equivalent to a cast, but the
> > > > former being a function call means the compiler generates debug info for
> > > > it, which persists even after the call has been inlined away, for an
> > > > operation that's never interesting to debug.
> > > > 
> > > > This patch addresses this problem in a relatively ad-hoc way by folding
> > > > calls to std::move/forward and other cast-like functions into simple
> > > > casts as part of the frontend's general expression folding routine.
> > > > After this patch with -O2 and a non-checking compiler, debug info size
> > > > for some testcases decreases by about ~10% and overall compile time and
> > > > memory usage decreases by ~2%.
> > > > 
> > > > PR c++/96780
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * cp-gimplify.cc (cp_fold) : When optimizing,
> > > > fold calls to std::move/forward and other cast-like functions
> > > > into simple casts.
> > > > 
> > > > gcc/test

Re: [PATCH] libstdc++: Ensure that std::from_chars is declared when supported

2022-03-15 Thread Patrick Palka via Gcc-patches
On Mon, 14 Mar 2022, Jonathan Wakely wrote:

> On Mon, 14 Mar 2022 at 14:17, Patrick Palka via Libstdc++
>  wrote:
> >
> > On Fri, 11 Mar 2022, Jonathan Wakely wrote:
> >
> > > Patrick, I think this is right, but please take a look to double check.
> > >
> > > I think we should fix the feature-test macro conditions for gcc-11 too,
> > > although it's a bit more complicated there. It should depend on IEEE
> > > float and double *and* uselocale. We don't need the other changes on the
> > > branch.
> >
> > Don't we still depend on uselocale in GCC 12 for long double from_chars,
> > at least on targets where long double != binary64?
> 
> Not after this patch:
> 
> from_chars(const char* first, const char* last, long double& value,
>   chars_format fmt) noexcept
> {
> -#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
> -  && ! USE_STRTOD_FOR_FROM_CHARS
> +#if ! USE_STRTOD_FOR_FROM_CHARS
> +  // Either long double is the same as double, or we can't use strtold.
> +  // In the latter case, this might give an incorrect result (e.g. values
> +  // out of range of double give an error, even if they fit in long double).
> 
> If uselocale isn't available, this defines the long double overload in
> terms of the double one, even if that doesn't always give the right
> answers. That greatly simplifies the preprocessor conditions for when
> it's supported. If the float and double forms are present, so is the
> long double one.

Ah sorry, I had overlooked that part of the patch.  Makes sense and LGTM!



Re: [PATCH] rs6000: Improve .machine

2022-03-15 Thread Sebastian Huber

Hello Segher,

On 10/03/2022 11:11, Segher Boessenkool wrote:

On Thu, Mar 10, 2022 at 09:25:21AM +0100, Sebastian Huber wrote:

On 04/03/2022 17:51, Segher Boessenkool wrote:

This adds more correct .machine for most older CPUs.  It should be
conservative in the sense that everything we handled before we handle at
least as well now.  This does not yet revamp the server CPU handling, it
is too risky at this point in time.

Tested on powerpc64-linux {-m32,-m64}.  Also manually tested with all
-mcpu=, and the output of that passed through the GNU assembler.

I plan to commit this later today.

Could this be backported to GCC 10 and 11? It would fix the following
issue for -mcpu=405:

Error: unrecognized opcode: `dlmzb.'

Good to hear!

Unfortunately there is PR104829 about this commit.  I don't see how the
commit can break anything (that wasn't already broken); it's not clear
how it happens at all, and neither I nor my colleagues could reproduce it
so far.

So I won't backport it yet, but will first wait to see what happens here.


Now that PR104829 is fixed, could I backport

Segher Boessenkool (2):
  rs6000: Improve .machine
  rs6000: Do not use rs6000_cpu for .machine ppc and ppc64 (PR104829)

to GCC 10 and 11?

--
embedded brains GmbH
Mr. Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Register court: Amtsgericht München (Munich Local Court)
Registration number: HRB 157899
Authorized managing directors: Peter Rasmussen, Thomas Dörfler
You can find our privacy policy here:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH v8 00/12] Add LoongArch support.

2022-03-15 Thread Xi Ruoyao via Gcc-patches
On Fri, 2022-03-04 at 15:17 +0800, xucheng...@loongson.cn wrote:

> v7 -> v8
> 1. Add new addressing type ADDRESS_REG_REG support.
> 2. Modify documentation.
> 3. Eliminate compile-time warnings.

Hi,

The v8 series does not build LoongArch Linux kernel tree
(https://github.com/loongson/linux, loongarch-next branch) successfully.
This is a regression: the v7 series built the kernel fine.

A testcase reduced from the __get_data_asm macro in uaccess.h:

$ cat t1.c
char *ptr;
int offset;
struct m
{
  char a[2];
};

char
x (void)
{
  char t;
  asm volatile("ld.b %0, %1" : "=r"(t) : "o"(*(struct m *)(ptr + offset)));
  return t;
}

$ ./gcc/cc1 t1.c -nostdinc -O
t1.c: In function ‘x’:
t1.c:12:3: error: impossible constraint in ‘asm’
   12 |   asm volatile("ld.b %0, %1" : "=r"(t) : "o"(*(struct m *)(ptr + offset)));
  |   ^~~

It seems changing the constraint "o" to "m" can work around this issue.
I'm not sure if this is a compiler bug or a kernel bug.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] x86: Also check _SOFT_FLOAT in

2022-03-15 Thread H.J. Lu via Gcc-patches
On Mon, Mar 14, 2022 at 7:31 AM H.J. Lu  wrote:
>
> Push target("general-regs-only") in  if x87 is enabled.
>
> gcc/
>
> PR target/104890
> * config/i386/x86gprintrin.h: Also check _SOFT_FLOAT before
> pushing target("general-regs-only").
>
> gcc/testsuite/
>
> PR target/104890
> * gcc.target/i386/pr104890.c: New test.
> ---
>  gcc/config/i386/x86gprintrin.h   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c
>
> diff --git a/gcc/config/i386/x86gprintrin.h b/gcc/config/i386/x86gprintrin.h
> index 017ec299793..e0be01d5e78 100644
> --- a/gcc/config/i386/x86gprintrin.h
> +++ b/gcc/config/i386/x86gprintrin.h
> @@ -24,7 +24,7 @@
>  #ifndef _X86GPRINTRIN_H_INCLUDED
>  #define _X86GPRINTRIN_H_INCLUDED
>
> -#if defined __MMX__ || defined __SSE__
> +#if !defined _SOFT_FLOAT || defined __MMX__ || defined __SSE__
>  #pragma GCC push_options
>  #pragma GCC target("general-regs-only")
>  #define __DISABLE_GENERAL_REGS_ONLY__
> diff --git a/gcc/testsuite/gcc.target/i386/pr104890.c b/gcc/testsuite/gcc.target/i386/pr104890.c
> new file mode 100644
> index 000..cb430eef688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr104890.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-O2 -mshstk -march=i686" } */
> +
> +#include 
> +
> +__attribute__((target ("general-regs-only")))
> +int
> +foo ()
> +{
> +  return _get_ssp ();
> +}
> --
> 2.35.1
>

It also fixed:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99744#c18

Any comments on this patch?

-- 
H.J.


Re: RFA: crc builtin functions & optimizations

2022-03-15 Thread Richard Biener via Gcc-patches
On Tue, Mar 15, 2022 at 1:32 AM Joern Rennecke
 wrote:
>
> Most microprocessors have efficient ways to perform CRC operations, be
> that with lookup tables, rotates, or even special instructions.
> However, because we lack a representation for CRC in the compiler, we
> can't do proper instruction selection.  With this patch I seek to
> rectify this.
> I've avoided using a mode name for the built-in functions because that
> would tie the semantics to the size of the addressable unit.  We
> generally use abbreviations like s/l/ll for type names, which is all
> right when the type can be widened without changing semantics.  For
> the data input, however, we also have to consider the shift count that
> is tied to it.  That is why I used a number to designate the width of
> the data input and shift.
>
> For machine support, I made a start with 8 and 16 bit little-endian
> CRC for RISCV using a lookup table.  I am sure once we have the basic
> infrastructure in the tree, we'll get more contributions of suitable
> named patterns for various ports.

Why's this a new pass?  Every walk over all insns costs time.  The pass
lacks any comments as to what CFG / stmt structure is matched.  From
a quick look it seems like it first(?) statically matches a stmt sequence
without considering intermediate stmts, so matching should be quite
fragile.  Why not match (sub-)expressions with the help of match.pd?

Any reason why you match CRC before early inlining and thus even when
not optimizing?  Matching at least after early FRE/DCE/DSE would help
to get rid of abstraction and/or memory temporary uses.

> bootstrapped on x86_64-pc-linux-gnu .


[x86 PATCH] PR target/94680: Clear upper bits of V2DF using movq (like V2DI).

2022-03-15 Thread Roger Sayle

This simple i386 patch unblocks a more significant change.  The testcase
gcc.target/i386/sse2-pr94680.c isn't quite testing what's intended, and
alas the fix for PR target/94680 doesn't (yet) handle V2DF mode.

For the first test from sse2-pr94680.c, below

v2df foo_v2df (v2df x) {
  return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
}

GCC on x86_64-pc-linux-gnu with -O2 currently generates:

movhpd  .LC0(%rip), %xmm0
ret
.LC0:
.long   0
.long   0

which passes the test as it contains a mov insn and no xor.
Alas reading a zero from the constant pool isn't quite the
desired implementation.  With this patch we now generate:

movq%xmm0, %xmm0
ret

This is the same code as we generate for V2DI.  The patch also adds a stricter
test case.  My first attempt tried using VI8F_128 to generalize
the existing sse2_movq128 define_insn to both V2DI and V2DF.
Alas, CODE_FOR_sse2_movq128 is exposed as a builtin in
i386-builtin.def, requiring some internal name changes, that
ultimately the testsuite was unhappy with.  The simpler solution
(that works) is to clone/specialize a new V2DF *sse2_movq128_2.
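For reference, GCC documents that in the two-operand form of __builtin_shuffle the mask elements are taken modulo 2*N, with indices below N selecting from the first operand and the rest from the second. The scalar C model below (the name shuffle2_v2df is hypothetical) shows why the mask {0, 2} in foo_v2df yields {x[0], 0.0}: lane 0 of x is kept and lane 1 becomes zero, which is the effect of a register-to-register movq (copy the low 64 bits, zero the upper 64 bits).

```c
#include <stddef.h>

/* Scalar model of __builtin_shuffle (a, b, mask) for 2-element vectors:
   each mask element is taken modulo 2*N (N = 2); values below N select
   from A, the others from B.  */
static void
shuffle2_v2df (const double a[2], const double b[2],
               const unsigned long long mask[2], double out[2])
{
  for (size_t i = 0; i < 2; i++)
    {
      unsigned long long idx = mask[i] % 4;
      out[i] = idx < 2 ? a[idx] : b[idx - 2];
    }
}
```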

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-03-15  Roger Sayle  

gcc/ChangeLog
PR target/94680
* config/i386/sse.md (*sse2_movq128_2): A version of sse2_movq128
for V2DF mode.

gcc/testsuite/ChangeLog
PR target/94680
* gcc.target/i386/sse2-pr94680-2.c: New stricter V2DF test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e9292e6..d017fb8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1599,6 +1599,19 @@
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "TI")])
 
+(define_insn "*sse2_movq128_2"
+  [(set (match_operand:V2DF 0 "register_operand" "=v")
+   (vec_concat:V2DF
+ (vec_select:DF
+   (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+   (parallel [(const_int 0)]))
+ (match_operand:DF 2 "const0_operand")))]
+  "TARGET_SSE2"
+  "%vmovq\t{%1, %0|%0, %q1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "maybe_vex")
+   (set_attr "mode" "TI")])
+
 ;; Move a DI from a 32-bit register pair (e.g. %edx:%eax) to an xmm.
 ;; We'd rather avoid this entirely; if the 32-bit reg pair was loaded
 ;; from memory, we'd prefer to load the memory directly into the %xmm
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr94680-2.c b/gcc/testsuite/gcc.target/i386/sse2-pr94680-2.c
new file mode 100644
index 000..abd260a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-pr94680-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef long long v2di __attribute__((vector_size(16)));
+
+v2df foo_v2df (v2df x)
+{
+  return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
+}
+
+/* { dg-final { scan-assembler "movq" } } */
+/* { dg-final { scan-assembler-not "pxor" } } */
+


Re: RFA: crc builtin functions & optimizations

2022-03-15 Thread Joern Rennecke
On 15/03/2022, Richard Biener  wrote:

> Why's this a new pass?  Every walk over all insns costs time.

It should typically scan considerably fewer than all the insns.

>  The pass
> lacks any comments as to what CFG / stmt structure is matched.

I've put a file in:
config/riscv/tree-crc-doc.txt

Would this text be suitable to put in a comment block in tree-crc.cc?

>  From
> a quick look it seems like it first(?) statically matches a stmt sequence
> without considering intermediate stmts, so matching should be quite
> fragile.

It might be fragile inasmuch as it won't match when things change, but
the matching has remained effective for seven years and across two
architecture families with varying word sizes.
And with regards to matching only what it's supposed to match, I believe
I have checked all the data dependencies and phis so that it's definitely
calculating a CRC.

>  Why not match (sub-)expressions with the help of match.pd?

Can you match a loop with match.pd ?

> Any reason why you match CRC before early inlining and thus even when
> not optimizing?  Matching at least after early FRE/DCE/DSE would help
> to get rid of abstraction and/or memory temporary uses.

I didn't originally place it there, but I believe the benefits include:
- Getting rid of the loop without having to actively delete it in the crc
  pass.  This also might be safer, as we just have to make sure we are
  computing the CRC; DCE will determine if there is any ancillary result
  left, and only delete the loop if it's really dead.
- The optimized function is available for inlining.


Re: [PATCH] c++: fold calls to std::move/forward [PR96780]

2022-03-15 Thread Jason Merrill via Gcc-patches

On 3/15/22 10:03, Patrick Palka wrote:

On Mon, 14 Mar 2022, Jason Merrill wrote:


On 3/14/22 13:13, Patrick Palka wrote:

On Fri, 11 Mar 2022, Jason Merrill wrote:


On 3/10/22 11:27, Patrick Palka wrote:

On Wed, 9 Mar 2022, Jason Merrill wrote:


On 3/1/22 18:08, Patrick Palka wrote:

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means it comes with bloated debug info, which
persists even after the call has been inlined away, for an operation that
is never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by folding
calls to std::move/forward into casts as part of the frontend's general
expression folding routine.  After this patch with -O2 and a non-checking
compiler, debug info size for some testcases decreases by about ~10% and
overall compile time and memory usage decreases by ~2%.


Impressive.  Which testcases?


I saw the largest percent reductions in debug file object size in
various tests from cmcstl2 and range-v3, e.g.
test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
(which are among their biggest tests).

Significant reductions in debug object file size can be observed in
some libstdc++ testcases too, such as a 5.5% reduction in
std/ranges/adaptor/join.cc



Do you also want to handle addressof and as_const in this patch, as Jonathan
suggested?


Yes, good idea.  Since each of their argument and return types are
indirect types, I think we can use the same NOP_EXPR-based folding for
them.



I think we can do this now, and think about generalizing more in stage 1.


Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
want to consider for GCC 12?

PR c++/96780

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : When optimizing,
fold calls to std::move/forward into simple casts.
* cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
* typeck.cc (is_std_move_p, is_std_forward_p): Export.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr96780.C: New test.
---
 gcc/cp/cp-gimplify.cc  | 18 ++
 gcc/cp/cp-tree.h   |  2 ++
 gcc/cp/typeck.cc   |  6 ++
 gcc/testsuite/g++.dg/opt/pr96780.C | 24 
 4 files changed, 46 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..0b009b631c7 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,6 +2756,24 @@ cp_fold (tree x)
   case CALL_EXPR:
   {
+   if (optimize


I think this should check flag_no_inline rather than optimize.


Sounds good.

Here's a patch that extends the folding to as_const and addressof (as
well as __addressof, which I'm kind of unsure about since it's
non-standard).  I suppose it also doesn't hurt to verify that the return
and argument type of the function are sane before we commit to folding.

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by folding
calls to std::move/forward and other cast-like functions into simple
casts as part of the frontend's general expression folding routine.
After this patch with -O2 and a non-checking compiler, debug info size
for some testcases decreases by about ~10% and overall compile time and
memory usage decreases by ~2%.

PR c++/96780

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : When optimizing,
fold calls to std::move/forward and other cast-like functions
into simple casts.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr96780.C: New test.
---
gcc/cp/cp-gimplify.cc  | 36 +++-
gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++
2 files changed, 73 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..efc4c8f0eb9 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,9 +2756,43 @@ cp_fold (tree x)
  case CALL_EXPR:
  {
-   int sv = optimize, nw = sv;
tree callee = get_callee_fndecl (x);
+   /* "Inline" calls to std::move/forward and other cast-like functions
+  by simply folding them into the corresponding cast determined by
+  their return type.  This is cheaper than relying on the middle-end
+  to do so, and also means we avoid generating useless debug info for
+  them at all.
+
+  At this point the argument has already been converted into a

Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-15 Thread Martin Sebor via Gcc-patches

On 3/14/22 23:31, Siddhesh Poyarekar wrote:

The size argument in strncmp only describes the maximum length to which
to compare two strings and is not an indication of sizes of the two
source strings.  Do not warn if it is larger than the two input strings
because it is entirely likely that the size argument is a conservative
maximum to accommodate inputs of different lengths and only a subset is
reachable through the current code path or that it is some other
application-specific property completely unrelated to the sizes of the
input strings.


The strncmp function takes arrays as arguments (not necessarily
strings).  The main purpose of the -Wstringop-overread warning
for calls to it is to detect calls where one of the arrays is
not a nul-terminated string and the bound is larger than the size
of the array.  For example:

  char a[4], b[4];

  int f (void)
  {
    return strncmp (a, b, 8);   // -Wstringop-overread
  }

Such a call is suspect: if one of the arrays isn't nul-terminated
the call is undefined.  Otherwise, if both are nul-terminated there
is no point in calling strncmp with a bound greater than their sizes.

With no evidence that this warning is ever harmful I'd consider
suppressing it a regression.  Since the warning is a deliberate
feature in a released compiler and GCC is now in a regression
fixing stage, this patch is out of scope even if a case where
the warning wasn't helpful did turn up (none has been reported
so far).



gcc/ChangeLog:

middle-end/104854
* gimple-ssa-warn-access.cc
(pass_waccess::warn_zero_sized_strncmp_inputs): New function.
(pass_waccess::check_strncmp): Use it.

gcc/testsuite/ChangeLog:

middle-end/104854
* gcc.dg/Wstringop-overread.c (test_strncmp_array): Don't expect
failures for non-zero sizes.

Signed-off-by: Siddhesh Poyarekar 
---

Changes from v1:

A little better approach, ensuring that it tries to warn on zero length
inputs if the size of at least one of the two sources is known.

Also cc'ing Martin so that we can discuss the approach on the list instead
of on the bug.  To summarize the discussion so far, Martin suggests that
the warning be split into levels but I'm contesting the utility of the
heuristics as a compiler warning given the looseness of the relationship
between the size argument and the inputs in the case of these functions.


Thanks for CC'ing me.  The motivating example in pr104854 that we have
been discussing there involves strndup with a string literal.  That's
an entirely different case than the one your patch changes, and I don't
understand in what way you think they are related.

Martin




  gcc/gimple-ssa-warn-access.cc | 69 +--
  gcc/testsuite/gcc.dg/Wstringop-overread.c |  2 +-
  2 files changed, 28 insertions(+), 43 deletions(-)

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 75297ed7c9e..15299770e29 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -2137,6 +2137,9 @@ private:
/* Return true if use follows an invalidating statement.  */
bool use_after_inval_p (gimple *, gimple *, bool = false);
  
+  /* Emit an overread warning for zero sized inputs to strncmp.  */
+  void warn_zero_sized_strncmp_inputs (gimple *, tree *, access_data *);
+
/* A pointer_query object to store information about pointers and
   their targets in.  */
pointer_query m_ptr_qry;
@@ -2619,8 +2622,20 @@ pass_waccess::check_stxncpy (gcall *stmt)
data.mode, &data, m_ptr_qry.rvals);
  }
  
-/* Check a call STMT to stpncpy() or strncpy() for overflow and warn
-   if it does.  */
+/* Warn for strncmp on a zero sized source or when an argument isn't
+   nul-terminated.  */
+void
+pass_waccess::warn_zero_sized_strncmp_inputs (gimple *stmt, tree *bndrng,
+ access_data *pad)
+{
+  tree func = get_callee_fndecl (stmt);
+  location_t loc = gimple_location (stmt);
+  maybe_warn_for_bound (OPT_Wstringop_overread, loc, stmt, func, bndrng,
+   size_zero_node, pad);
+}
+
+/* Check a call STMT to strncmp () for overflow and warn if it does.  This is
+   limited to checking for NUL terminated arrays for now.  */
  
  void
  pass_waccess::check_strncmp (gcall *stmt)
@@ -2678,46 +2693,16 @@ pass_waccess::check_strncmp (gcall *stmt)
if (!bndrng[0] || integer_zerop (bndrng[0]))
  return;
  
-  if (len1 && tree_int_cst_lt (len1, bndrng[0]))
-bndrng[0] = len1;
-  if (len2 && tree_int_cst_lt (len2, bndrng[0]))
-bndrng[0] = len2;
-
-  /* compute_objsize almost never fails (and ultimately should never
- fail).  Don't bother to handle the rare case when it does.  */
-  if (!compute_objsize (arg1, stmt, 1, &adata1.src, &m_ptr_qry)
-  || !compute_objsize (arg2, stmt, 1, &adata2.src, &m_ptr_qry))
-return;
-
-  /* Compute the size of the remaining space in each array after
- subtracting any offset into it.  

Re: [PATCH] Fix PR 101515 (ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128)

2022-03-15 Thread Jason Merrill via Gcc-patches

On 3/15/22 08:32, Jakub Jelinek wrote:

On Fri, Feb 11, 2022 at 12:27:49PM -0500, Jason Merrill wrote:

Yes, that's what the above code would correctly do if TYPE were the
pointer-to-method type.  It's wrong for this case because TYPE is unrelated
to TREE_TYPE (field).

I think the problem is just this line:


 if (tree ret = c_fold_indirect_ref_for_warn (loc, type, cop,
  off))
   return ret;
 return cop;

   ^^

The recursive call does the proper type checking, but then the "return cop"
line returns the COMPONENT_REF even though the type check failed. The
parallel code in cxx_fold_indirect_ref_1 doesn't have this line, and
removing it fixes the testcase, so I see

warning: ‘*(ptrmemfunc*)&x.ptrmemfunc::ptr’ is used uninitialized


The intent of r11-6729 is that it prints something that helps user to figure
out what exactly is being accessed.
When we find a unique non-static data member that is being accessed, even
when we can't fold it nicely, IMNSHO it is better to print
   ((sometype *)&var)->field
or
   (*(sometype *)&var).field
instead of
   *(fieldtype *)((char *)&var + 56)
because the user doesn't know what is at offset 56, we shouldn't ask user
to decipher structure layout etc.


The problem is that the reference is *not* to any non-static data 
member, it's to the PMF as a whole.  But c_fold_indirect_ref_for_warn 
wrongly turns it into a reference to the first non-static data member.


We asked c_fold_indirect_ref_for_warn to fold a MEM_REF with RECORD_TYPE, 
and it gave us back a COMPONENT_REF with POINTER_TYPE.  That seems 
clearly wrong.



One question is if we could return something better for the TYPE_PTRMEMFUNC_FLAG
RECORD_TYPE members here (something that would print it more naturally/readably
in a C++ way), though the fact that the routine is in c-family makes it
harder.

Another one is whether we shouldn't punt for FIELD_DECLs that don't have
nicely printable name of its containing scope, something like:
if (tree scope = get_containing_scope (field))
  if (TYPE_P (scope) && TYPE_NAME (scope) == NULL_TREE)
    break;
return cop;
or so.
Note the returned cop is a COMPONENT_REF where the first argument has a
nicely printable type name (x with type sp), but sp's TYPE_MAIN_VARIANT
is the unnamed TYPE_PTRMEMFUNC_FLAG.  So another possibility would be if
we see such a problem for the FIELD_DECL's scope, check if TYPE_MAIN_VARIANT
of the first COMPONENT_REF's argument is equal to that scope and in that
case use TREE_TYPE of the first COMPONENT_REF's argument as the scope
instead.

Jakub





Re: [PATCH] Fix PR 101515 (ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128)

2022-03-15 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 15, 2022 at 11:57:22AM -0400, Jason Merrill wrote:
> > The intent of r11-6729 is that it prints something that helps user to figure
> > out what exactly is being accessed.
> > When we find a unique non-static data member that is being accessed, even
> > when we can't fold it nicely, IMNSHO it is better to print
> >    ((sometype *)&var)->field
> > or
> >    (*(sometype *)&var).field
> > instead of
> >    *(fieldtype *)((char *)&var + 56)
> > because the user doesn't know what is at offset 56, we shouldn't ask user
> > to decipher structure layout etc.
> 
> The problem is that the reference is *not* to any non-static data member,
> it's to the PMF as a whole.  But c_fold_indirect_ref_for_warn wrongly turns
> it into a reference to the first non-static data member.
> 
> We asked c_fold_indirect_ref_for_warn to fold a MEM_REF with RECORD_TYPE, and it
> gave us back a COMPONENT_REF with POINTER_TYPE.  That seems clearly wrong.

That is not what I see on the testcase.
I see the outer c_fold_indirect_ref_for_warn call with type ptrmemfunc
which is a 64-bit RECORD_TYPE containing a single ptr member which has
pointer to function type, and op which is the x VAR_DECL with sp type which
is 128-bit RECORD_TYPE containing 64-bit __pfn member and 64-bit __delta
member.
As all the bits of the ptrmemfunc RECORD_TYPE fit within the __pfn member
(they are equal size), it wants to print (cast)(something.__pfn).

Jakub



Re: [PATCH] wwwdocs: fedora-devel-list archives changes

2022-03-15 Thread Jonathan Wakely via Gcc-patches

On 12/03/22 22:55 +0100, Gerald Pfeifer wrote:

I have *NOT* pushed this yet, looking for feedback:

It appears redhat.com has lost Fedora mailing list archives, which are
now at lists.fedoraproject.org using completely different tooling.

Jakub, is there a better way than the patch below?


This looks right to me, I don't think there's a better way to link to
those archives.



Gerald

diff --git a/htdocs/gcc-4.3/porting_to.html b/htdocs/gcc-4.3/porting_to.html
index 630290ce..5301729f 100644
--- a/htdocs/gcc-4.3/porting_to.html
+++ b/htdocs/gcc-4.3/porting_to.html
@@ -527,7 +527,7 @@ svn diff -r529854:529855 http://svn.apache.org/repos/asf/ant/core/trunk/src/main


Jakub Jelinek,
-https://listman.redhat.com/archives/fedora-devel-list/2008-January/msg00128.html";>
+https://lists.fedoraproject.org/archives/list/de...@lists.fedoraproject.org/thread/WV3KUDEP2JNOWGWES42RQZFYFNLFLAMJ/";>
Mass rebuild status with gcc-4.3.0-0.4 of rawhide-20071220






Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-15 Thread Siddhesh Poyarekar

On 15/03/2022 21:09, Martin Sebor wrote:

The strncmp function takes arrays as arguments (not necessarily
strings).  The main purpose of the -Wstringop-overread warning
for calls to it is to detect calls where one of the arrays is
not a nul-terminated string and the bound is larger than the size
of the array.  For example:

   char a[4], b[4];

   int f (void)
   {
     return strncmp (a, b, 8);   // -Wstringop-overread
   }

Such a call is suspect: if one of the arrays isn't nul-terminated
the call is undefined.  Otherwise, if both are nul-terminated there


Isn't "suspect" too harsh a description though?  The bound does not 
specify the size of a or b, it specifies the maximum extent to which to 
compare a and b, the extent being any application-specific limit.  In 
fact the limit could be the size of some arbitrary third buffer that the 
contents of a or b must be copied to, truncating to the bound.


I agree the call is undefined if one of the arrays is not nul-terminated 
and that's the thing; nothing about the bound is undefined in this 
context, it's the NUL termination that is key.



is no point in calling strncmp with a bound greater than their sizes.


There is, when the bound describes something else, e.g. the size of a 
third destination buffer into which one of the input buffers may get 
copied.  Or when the bound describes the maximum length of a set of 
strings where only a subset of the strings are reachable in the current 
function and ranger sees it, allowing us to reduce our input string size 
estimate.  The bound being the maximum of the lengths of the two input 
strings is just one of many possibilities.
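A minimal sketch of the usage pattern described above, where the bound is an application-wide limit (here a hypothetical `kNameMax`) rather than the size of either argument. Because both arrays are nul-terminated, the comparison stops at the first NUL or the first difference and never reads past the arrays. No claim is made here about whether a given GCC version warns on this code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical application-wide limit, e.g. the size of some separate
// destination buffer; it is not the size of either input array.
constexpr std::size_t kNameMax = 16;

// Well-defined as long as both arguments are nul-terminated: strncmp stops
// at the first NUL or the first differing character, so a bound larger
// than the arrays is never reached.
int compare_names(const char *x, const char *y) {
  return std::strncmp(x, y, kNameMax);
}

char short_a[4] = "abc";
char short_b[4] = "abd";
```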



With no evidence that this warning is ever harmful I'd consider


There is: the false positives were seen in Fedora/RHEL builds.


suppressing it a regression.  Since the warning is a deliberate
feature in a released compiler and GCC is now in a regression
fixing stage, this patch is out of scope even if a case where
the warning wasn't helpful did turn up (none has been reported
so far).


Wait, I just reported an issue and it's across multiple packages in 
Fedora/RHEL :)


I think this is a regression since gcc 11 due to misunderstanding the 
specification and assuming too strong a relationship between the size 
argument of strncmp (and indeed strnlen and strndup) and the size of 
objects being passed to it.  Compliant code relies on the compiler to do 
the right thing here, i.e. optimize the strncmp call to strcmp and not 
panic about the size argument being larger than the input buffer size. 
If at all such a diagnostic needs to stay, it ought to go into the 
analyzer, where such looser heuristic suggestions are more acceptable 
and sometimes even appreciated.


FWIW, I'm open to splitting the warning levels as you suggested if 
that's the consensus since it at least provides a way to make these 
warnings saner. However I still haven't found the rationale presented so 
far compelling enough to justify these false positives; I just don't see 
a proportional enough reward.  Hopefully more people can chime in with 
their perspective on this.


Thanks,
Siddhesh


[PATCH] OpenMP, Fortran: Bugfix for omp_set_num_teams.

2022-03-15 Thread Marcel Vollweiler

Hi,

This patch fixes omp_set_num_teams_8_ in fortran.c, which called omp_set_max_active_levels instead of omp_set_num_teams.

Marcel
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP, Fortran: Bugfix for omp_set_num_teams.

This patch fixes omp_set_num_teams_8_, the INTEGER(8) Fortran entry point, which wrongly called omp_set_max_active_levels instead of omp_set_num_teams.

libgomp/ChangeLog:

* fortran.c (omp_set_num_teams_8_): Call omp_set_num_teams instead of
omp_set_max_active_levels.

diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 8c1cfd1..d984ce5 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -491,7 +491,7 @@ omp_set_num_teams_ (const int32_t *num_teams)
 void
 omp_set_num_teams_8_ (const int64_t *num_teams)
 {
-  omp_set_max_active_levels (TO_INT (*num_teams));
+  omp_set_num_teams (TO_INT (*num_teams));
 }
 
 int32_t
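A standalone C++ sketch of what the fixed wrapper does, under stated assumptions: the setter functions and globals below are hypothetical stand-ins for libgomp internals, and the saturating `to_int` is assumed to mirror the intent of the `TO_INT` macro rather than copy it. The point is only that the `INTEGER(8)` entry point must forward to the num-teams setter, not the max-active-levels one.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical stand-ins for the ICVs and their setters.
static int nteams_var = 0;
static int max_active_levels_var = 0;

static void set_num_teams(int n) { nteams_var = n; }
[[maybe_unused]] static void set_max_active_levels(int n) {
  max_active_levels_var = n;
}

// Assumed saturating narrowing from INTEGER(8) to int.
static int to_int(std::int64_t x) {
  if (x > INT_MAX) return INT_MAX;
  if (x < INT_MIN) return INT_MIN;
  return static_cast<int>(x);
}

// Fortran passes arguments by reference, hence the pointer parameter.
// The bug was calling the max-active-levels setter here instead.
void set_num_teams_8_sketch(const std::int64_t *num_teams) {
  set_num_teams(to_int(*num_teams));
}
```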


Re: [PATCH] c++: fold calls to std::move/forward [PR96780]

2022-03-15 Thread Patrick Palka via Gcc-patches
On Tue, 15 Mar 2022, Jason Merrill wrote:

> On 3/15/22 10:03, Patrick Palka wrote:
> > On Mon, 14 Mar 2022, Jason Merrill wrote:
> > 
> > > On 3/14/22 13:13, Patrick Palka wrote:
> > > > On Fri, 11 Mar 2022, Jason Merrill wrote:
> > > > 
> > > > > On 3/10/22 11:27, Patrick Palka wrote:
> > > > > > On Wed, 9 Mar 2022, Jason Merrill wrote:
> > > > > > 
> > > > > > > On 3/1/22 18:08, Patrick Palka wrote:
> > > > > > > > A well-formed call to std::move/forward is equivalent to a cast,
> > > > > > > > but the former being a function call means it comes with bloated
> > > > > > > > debug info, which persists even after the call has been inlined
> > > > > > > > away, for an operation that is never interesting to debug.
> > > > > > > > 
> > > > > > > > This patch addresses this problem in a relatively ad-hoc way by
> > > > > > > > folding calls to std::move/forward into casts as part of the
> > > > > > > > frontend's general expression folding routine.  After this patch
> > > > > > > > with -O2 and a non-checking compiler, debug info size for some
> > > > > > > > testcases decreases by about ~10% and overall compile time and
> > > > > > > > memory usage decreases by ~2%.
> > > > > > > 
> > > > > > > Impressive.  Which testcases?
> > > > > > 
> > > > > > I saw the largest percent reductions in debug file object size in
> > > > > > various tests from cmcstl2 and range-v3, e.g.
> > > > > > test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
> > > > > > (which are among their biggest tests).
> > > > > > 
> > > > > > Significant reductions in debug object file size can be observed in
> > > > > > some libstdc++ testcases too, such as a 5.5% reduction in
> > > > > > std/ranges/adaptor/join.cc
> > > > > > 
> > > > > > > 
> > > > > > > Do you also want to handle addressof and as_const in this patch,
> > > > > > > as Jonathan suggested?
> > > > > > 
> > > > > > Yes, good idea.  Since each of their argument and return types are
> > > > > > indirect types, I think we can use the same NOP_EXPR-based folding
> > > > > > for them.
> > > > > > 
> > > > > > > 
> > > > > > > I think we can do this now, and think about generalizing more in
> > > > > > > stage 1.
> > > > > > > 
> > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, is this
> > > > > > > > something we want to consider for GCC 12?
> > > > > > > > 
> > > > > > > > PR c++/96780
> > > > > > > > 
> > > > > > > > gcc/cp/ChangeLog:
> > > > > > > > 
> > > > > > > > * cp-gimplify.cc (cp_fold) : When optimizing,
> > > > > > > > fold calls to std::move/forward into simple casts.
> > > > > > > > * cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
> > > > > > > > * typeck.cc (is_std_move_p, is_std_forward_p): Export.
> > > > > > > > 
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > > 
> > > > > > > > * g++.dg/opt/pr96780.C: New test.
> > > > > > > > ---
> > > > > > > >  gcc/cp/cp-gimplify.cc  | 18 ++
> > > > > > > >  gcc/cp/cp-tree.h   |  2 ++
> > > > > > > >  gcc/cp/typeck.cc   |  6 ++
> > > > > > > >  gcc/testsuite/g++.dg/opt/pr96780.C | 24
> > > > > > > >  4 files changed, 46 insertions(+), 4 deletions(-)
> > > > > > > >  create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
> > > > > > > > 
> > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > > > index d7323fb5c09..0b009b631c7 100644
> > > > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > > > @@ -2756,6 +2756,24 @@ cp_fold (tree x)
> > > > > > > >case CALL_EXPR:
> > > > > > > >{
> > > > > > > > +   if (optimize
> > > > > > > 
> > > > > > > I think this should check flag_no_inline rather than optimize.
> > > > > > 
> > > > > > Sounds good.
> > > > > > 
> > > > > > Here's a patch that extends the folding to as_const and addressof
> > > > > > (as well as __addressof, which I'm kind of unsure about since it's
> > > > > > non-standard).  I suppose it also doesn't hurt to verify that the
> > > > > > return and argument type of the function are sane before we commit
> > > > > > to folding.
> > > > > > 
> > > > > > -- >8 --
> > > > > > 
> > > > > > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
> > > > > > 
> > > > > > A well-formed call to std::move/forward is equivalent to a cast, but
> > > > > > the former being a function call means the compiler generates debug
> > > > > > info for it, which persists even after the call has been inlined
> > > > > > away, for an operation that's never interes

[wwwdocs] cxx-dr-status: Update from C++ Core Language Issue TOC, Revision 108

2022-03-15 Thread Marek Polacek via Gcc-patches
It was high time I updated our C++ DR table.

Pushed.

---
 htdocs/projects/cxx-dr-status.html | 153 +++--
 1 file changed, 122 insertions(+), 31 deletions(-)

diff --git a/htdocs/projects/cxx-dr-status.html b/htdocs/projects/cxx-dr-status.html
index b49a97f2..63ec6d51 100644
--- a/htdocs/projects/cxx-dr-status.html
+++ b/htdocs/projects/cxx-dr-status.html
@@ -8770,7 +8770,7 @@
 
 
   https://wg21.link/cwg1249";>1249
-  DR
+  DRWP
   Cv-qualification of nested lambda capture
   ?
   
@@ -12099,7 +12099,7 @@
 
 
   https://wg21.link/cwg1724";>1724
-  DR
+  DRWP
   Unclear rules for deduction failure
   ?
   
@@ -12111,11 +12111,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg1726";>1726
-  ready
+  DR
   Declarator operators and conversion function
-  -
+  No
   https://gcc.gnu.org/PR79318";>PR79318
 
 
@@ -12162,7 +12162,7 @@
 
 
   https://wg21.link/cwg1733";>1733
-  DR
+  DRWP
   Return type and value for operator= with ref-qualifier
   ?
   
@@ -17217,7 +17217,7 @@
 
 
   https://wg21.link/cwg2455";>2455
-  accepted
+  WP
   Concatenation of string literals vs translation phases 5 and 6
   ?
   
@@ -17406,7 +17406,7 @@
 
 
   https://wg21.link/cwg2482";>2482
-  accepted
+  WP
   bit_cast and indeterminate values
   ?
   
@@ -17420,7 +17420,7 @@
 
 
   https://wg21.link/cwg2484";>2484
-  DR
+  DRWP
   char8_t and char16_t in integral promotions
   ?
   
@@ -17434,7 +17434,7 @@
 
 
   https://wg21.link/cwg2486";>2486
-  DR
+  DRWP
   Call to noexcept function via noexcept(false) pointer/lvalue
   ?
   
@@ -17462,14 +17462,14 @@
 
 
   https://wg21.link/cwg2490";>2490
-  DR
+  DRWP
   Restrictions on destruction in constant expressions
   ?
   
 
 
   https://wg21.link/cwg2491";>2491
-  DR
+  DRWP
   Export of typedef after its first declaration
   ?
   
@@ -17488,11 +17488,11 @@
   -
   Dup of issue 1670
 
-
+
   https://wg21.link/cwg2494";>2494
-  ready
+  DR
   Multiple definitions of non-odr-used entities
-  -
+  ?
   
 
 
@@ -17504,7 +17504,7 @@
 
 
   https://wg21.link/cwg2496";>2496
-  DR
+  DRWP
   ref-qualifiers and virtual overriding
   ?
   
@@ -17525,9 +17525,9 @@
 
 
   https://wg21.link/cwg2499";>2499
-  ready
+  DR
   Inconsistency in definition of pointer-interconvertibility
-  -
+  ?
   
 
 
@@ -17544,11 +17544,11 @@
   -
   
 
-
+
   https://wg21.link/cwg2502";>2502
-  ready
+  accepted
   Unintended declaration conflicts in nested statement scopes
-  -
+  ?
   
 
 
@@ -17572,11 +17572,11 @@
   -
   
 
-
+
   https://wg21.link/cwg2506";>2506
-  ready
+  DR
   Structured bindings and array cv-qualifiers
-  -
+  ?
   
 
 
@@ -17593,11 +17593,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg2509";>2509
-  ready
+  DR
   decl-specifier-seq in lambda-specifiers
-  -
+  ?
   
 
 
@@ -17607,11 +17607,11 @@
   -
   
 
-
+
   https://wg21.link/cwg2511";>2511
-  ready
+  DR
   cv-qualified bit-fields
-  -
+  ?
   
 
 
@@ -17656,11 +17656,102 @@
   -
   
 
+
+  https://wg21.link/cwg2518";>2518
+  open
+  Conformance requirements and #error/#warning
+  -
+  
+
+
+  https://wg21.link/cwg2519";>2519
+  open
+  Object representation of a bit-field
+  -
+  
+
+
+  https://wg21.link/cwg2520";>2520
+  open
+  Template signature and default template arguments
+  -
+  
+
+
+  https://wg21.link/cwg2521";>2521
+  open
+  User-defined literals and reserved identifiers
+  -
+  
+
+
+  https://wg21.link/cwg2522";>2522
+  open
+  Removing placemarker tokens and retention of whitespace
+  -
+  
+
+
+  https://wg21.link/cwg2523";>2523
+  open
+  Undefined behavior via omitted destructor call in constant expressions
+  -
+  
+
+
+  https://wg21.link/cwg2524";>2524
+  open
+  Distinguishing user-defined conversion sequences by ref-qualifier
+  -
+  
+
+
+  https://wg21.link/cwg2525";>2525
+  open
+  Incorrect definition of implicit conversion sequence
+  -
+  
+
+
+  https://wg21.link/cwg2526";>2526
+  open
+  Relational comparison of void* pointers
+  -
+  
+
+
+  https://wg21.link/cwg2527";>2527
+  open
+  Non-class potentially-over

[PATCH] c++: further lookup_member simplification

2022-03-15 Thread Patrick Palka via Gcc-patches
As a followup to r12-7656-gffe9c0a0d3564a, this minor patch condenses
the handling of ambiguity and access w.r.t. the value of 'protect' so
that it more clearly matches the function comment.
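A simplified model of the control flow after this patch, assuming `protect` is only ever 0, 1 or 2 as the function comment states. The enum and function names are hypothetical; the real code returns `NULL_TREE`, `error_mark_node` or the ambiguous candidate list, and `complain` stands in for `complain & tf_error`.

```cpp
// Hypothetical stand-ins for the three ways lookup_member can finish early,
// plus "keep going" when the lookup is not ambiguous.
enum class Outcome { NullTree, ErrorMark, AmbiguousList, Continue };

// Models the single `if (lfi.ambiguous)` block introduced by the patch.
Outcome on_lookup_result(int protect, bool ambiguous, bool complain,
                         bool &diagnosed) {
  diagnosed = false;
  if (ambiguous) {
    switch (protect) {
    case 0:
      return Outcome::NullTree;       // fail silently
    case 1:
      diagnosed = complain;           // error + print_candidates if allowed
      return Outcome::ErrorMark;
    case 2:
      return Outcome::AmbiguousList;  // hand the candidates back
    }
  }
  return Outcome::Continue;
}

// Access checking later in the function now keys on protect == 1 only,
// instead of relying on protect having been reset from 2 to 0.
bool checks_access(int protect) { return protect == 1; }
```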

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* search.cc (lookup_member): Simplify by handling all values
of protect at once in case of ambiguous lookup.  Don't modify
protect.
---
 gcc/cp/search.cc | 32 +---
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
index 85e3e7cb487..b86b3a24080 100644
--- a/gcc/cp/search.cc
+++ b/gcc/cp/search.cc
@@ -1168,27 +1168,21 @@ lookup_member (tree xbasetype, tree name, int protect, bool want_type,
   if (rval_binfo)
 type = BINFO_TYPE (rval_binfo);
 
-  /* If we are not interested in ambiguities, don't report them;
- just return NULL_TREE.  */
-  if (!protect && lfi.ambiguous)
-return NULL_TREE;
-
-  if (protect == 2)
-{
-  if (lfi.ambiguous)
-   return lfi.ambiguous;
-  else
-   protect = 0;
-}
-
-  if (protect == 1 && lfi.ambiguous)
+  if (lfi.ambiguous)
 {
-  if (complain & tf_error)
+  if (protect == 0)
+   return NULL_TREE;
+  else if (protect == 1)
{
- error ("request for member %qD is ambiguous", name);
- print_candidates (lfi.ambiguous);
+ if (complain & tf_error)
+   {
+ error ("request for member %qD is ambiguous", name);
+ print_candidates (lfi.ambiguous);
+   }
+ return error_mark_node;
}
-  return error_mark_node;
+  else if (protect == 2)
+   return lfi.ambiguous;
 }
 
   if (!rval)
@@ -1213,7 +1207,7 @@ lookup_member (tree xbasetype, tree name, int protect, bool want_type,
 
 only the first call to "f" is valid.  However, if the function is
 static, we can check.  */
-  if (protect && !really_overloaded_fn (rval))
+  if (protect == 1 && !really_overloaded_fn (rval))
 {
   tree decl = is_overloaded_fn (rval) ? get_first_fn (rval) : rval;
   decl = strip_using_decl (decl);
-- 
2.35.1.500.gb896f729e2



Re: [PATCH] OpenMP, Fortran: Bugfix for omp_set_num_teams.

2022-03-15 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 15, 2022 at 06:05:48PM +0100, Marcel Vollweiler wrote:
> Hi,
> 
> This patch fixes a small bug for omp_set_num_teams in fortran.c.
> 
> Marcel

> OpenMP, Fortran: Bugfix for omp_set_num_teams.
> 
> This patch fixes a small bug in the omp_set_num_teams implementation.
> 
> libgomp/ChangeLog:
> 
>   * fortran.c (omp_set_num_teams_8_): Fix bug.

Thanks for spotting this, but would be nice to cover it with
a testcase.

! { dg-do run }
! { dg-additional-options "-fdefault-integer-8" }

program set_num_teams_8
  use omp_lib
  call omp_set_num_teams (42)
  if (omp_get_max_teams () .ne. 42) stop 1
end program

or so would IMHO do it, please test that it FAILs without your fortran.c
fix and succeeds with it.

Ok for trunk with that change.

> diff --git a/libgomp/fortran.c b/libgomp/fortran.c
> index 8c1cfd1..d984ce5 100644
> --- a/libgomp/fortran.c
> +++ b/libgomp/fortran.c
> @@ -491,7 +491,7 @@ omp_set_num_teams_ (const int32_t *num_teams)
>  void
>  omp_set_num_teams_8_ (const int64_t *num_teams)
>  {
> -  omp_set_max_active_levels (TO_INT (*num_teams));
> +  omp_set_num_teams (TO_INT (*num_teams));
>  }
>  
>  int32_t


Jakub



[PATCH] aarch64: Fix up RTL sharing bug in aarch64_load_symref_appropriately [PR104910]

2022-03-15 Thread Jakub Jelinek via Gcc-patches
Hi!

We unshare all RTL created during expansion, but when
aarch64_load_symref_appropriately is called after expansion, as in the
following testcase, we use imm in both the HIGH and LO_SUM operands.
If imm is RTL that must not be shared, such as a non-sharable CONST,
we get a checking ICE (at least with --enable-checking=rtl); otherwise we
might just silently get wrong code.

The following patch fixes that by copying it if it can't be shared.
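A toy model of why the patch copies `imm`: if one node hangs off two parents and a later pass modifies it in place through one parent, the other parent silently changes too. `Expr`, `deep_copy` and `build_pair` are hypothetical and are not GCC's rtx API; `deep_copy` only loosely corresponds to `copy_rtx`, which returns shareable codes unchanged.

```cpp
#include <memory>
#include <utility>

// Toy expression node standing in for an rtx; later passes may modify
// such nodes in place, so one node must not hang off two parents.
struct Expr {
  int value;
  std::shared_ptr<Expr> op;  // a single operand is enough here
};

using ExprPtr = std::shared_ptr<Expr>;

// Deep copy, loosely analogous to copy_rtx on an unshareable code.
ExprPtr deep_copy(const ExprPtr &e) {
  if (!e) return nullptr;
  return std::make_shared<Expr>(Expr{e->value, deep_copy(e->op)});
}

// Build stand-ins for HIGH (imm) and LO_SUM (tmp, imm).  With
// copy_first = true the first use gets its own copy, as in the patch.
std::pair<ExprPtr, ExprPtr> build_pair(const ExprPtr &imm, bool copy_first) {
  ExprPtr high =
      std::make_shared<Expr>(Expr{0, copy_first ? deep_copy(imm) : imm});
  ExprPtr lo_sum = std::make_shared<Expr>(Expr{1, imm});
  return {high, lo_sum};
}
```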

Bootstrapped/regtested on aarch64-linux, ok for trunk?

2022-03-15  Jakub Jelinek  

PR target/104910
* config/aarch64/aarch64.cc (aarch64_load_symref_appropriately): Copy
imm rtx.

* gcc.dg/pr104910.c: New test.

--- gcc/config/aarch64/aarch64.cc.jj	2022-02-22 10:38:02.404689359 +0100
+++ gcc/config/aarch64/aarch64.cc   2022-03-14 12:42:00.218975192 +0100
@@ -3971,7 +3971,7 @@ aarch64_load_symref_appropriately (rtx d
if (can_create_pseudo_p ())
  tmp_reg = gen_reg_rtx (mode);
 
-   emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
+   emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
emit_insn (gen_add_losym (dest, tmp_reg, imm));
return;
   }
--- gcc/testsuite/gcc.dg/pr104910.c.jj  2022-03-14 12:46:20.983327114 +0100
+++ gcc/testsuite/gcc.dg/pr104910.c 2022-03-14 12:46:06.064535794 +0100
@@ -0,0 +1,14 @@
+/* PR target/104910 */
+/* { dg-do compile } */
+/* { dg-options "-Os -fno-forward-propagate" } */
+/* { dg-additional-options "-fstack-protector-all" { target fstack_protector } } */
+
+void
+bar (void);
+
+void
+foo (int x)
+{
+  if (x)
+    bar ();
+}

Jakub



Re: [PATCH] rs6000: Allow using -mlong-double-64 after -mabi={ibm,ieee}longdouble [PR104208, PR87496]

2022-03-15 Thread Peter Bergner via Gcc-patches
On 3/4/22 8:14 PM, Peter Bergner wrote:
> On 3/4/22 11:33 AM, Peter Bergner wrote:
>>> Ok, pushed to trunk.  I haven't yet determined whether we need this on
>>> GCC 11.
>>> I'll check on that and report back.  Thanks!
>>
>> I've confirmed that GCC 11 fails the same way and that the backported patch
>> fixes the issue there too.  Ok for GCC 11 assuming my full regression testing
>> is clean?
>>
>> GCC 10 has the same checking code, so it looks to need the backport as well.
>> I'll go ahead and backport and regression test it there too.
> 
> The backports to GCC 11 and GCC 10 bootstrapped and regtested with no 
> regressions.
> Ok for the GCC 11 and GCC 10 release branches after a day or two of baking on
> trunk?

Ping.

The trunk patch has been confirmed to fix the glibc build errors and no issues
with the patch have surfaced, so is it ok for the GCC 11 and GCC 10 release branches?

Peter





Re: RFA: crc builtin functions & optimizations

2022-03-15 Thread Joern Rennecke
On 15/03/2022, Richard Biener  wrote:

> Why's this a new pass?  Every walk over all insns costs time.  The pass
> lacks any comments as to what CFG / stmt structure is matched.  From
> a quick look it seems like it first(?) statically matches a stmt sequence
> without considering intermediate stmts, so matching should be quite
> fragile.  Why not match (sub-)expressions with the help of match.pd?

Thinking about this a bit more, I suppose I could change the match.pd
framework to allow to set a bit or add a list element for a basic block where
an expression match is found.  That wouldn't make it any simpler - on the
contrary, much more complicated, since there would need to be another check
for the same expression that makes sure all the inputs and outputs line up
with the other basic blocks constituting the loop - but it could avoid scanning
functions that don't have anything that looks like a match in a separate pass.

The proper check and actual transformation would still have to be in its own
pass, but that could return immediately if no expression match for a starting
block was found.
It'd have to be early enough, though, to happen before all inlining and
unrolling, since both operations would hinder recognition, and we also want
them applied to outer loops / inlining functions after the transformation of
the crc computing loop into a built-in function.
I suppose if no gimple pass is early enough, we could resort to using a
generic match.
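For reference, here is a bit-at-a-time CRC-32 loop nest of the general kind such a recognizer would have to match before inlining and unrolling obscure it. The choice of idiom (reflected polynomial 0xEDB88320) is an assumption; this is not code taken from the proposed pass.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and final
// XOR of 0xFFFFFFFF): an outer loop over bytes, an inner loop over bits.
std::uint32_t crc32_bitwise(const unsigned char *data, std::size_t len) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      // Shift right; XOR in the polynomial when the bit shifted out was 1.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}
```

The standard check value for this CRC-32 variant over "123456789" is 0xCBF43926, which makes a convenient test for a builtin replacement.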


Re: [PATCH] c++: Fix up constexpr evaluation of new with zero sized types [PR104568]

2022-03-15 Thread Jason Merrill via Gcc-patches

On 3/15/22 07:44, Jakub Jelinek wrote:

On Fri, Mar 11, 2022 at 11:28:09PM -0500, Jason Merrill wrote:

@@ -7264,9 +7265,66 @@ cxx_eval_constant_expression (const cons
DECL_NAME (var)
  = (DECL_NAME (var) == heap_uninit_identifier
 ? heap_identifier : heap_vec_identifier);
+   /* For zero sized elt_type, try to recover how many outer_nelts
+  it should have.  */
+   if ((cookie_size ? tree_int_cst_equal (var_size, cookie_size)
+: integer_zerop (var_size))
+   && !int_size_in_bytes (elt_type)
+   && TREE_CODE (oldop) == CALL_EXPR
+   && call_expr_nargs (oldop) >= 1)
+ if (tree fun = get_function_named_in_call (oldop))
+   if (cxx_replaceable_global_alloc_fn (fun)
+   && IDENTIFIER_NEW_OP_P (DECL_NAME (fun)))
+ {
+   tree arg0 = CALL_EXPR_ARG (oldop, 0);


How about setting var_size to arg0 at this point, and moving the
decomposition of the size expression into build_new_constexpr_heap_type?


That would be more difficult, because for the cxx_eval_constant_expression
calls we need ctx, non_constant_p and overflow_p arguments, so
build_new_constexpr_heap_type would need to remove that one bool arg
added by this patch but instead pass around those 3 new ones.
As build_new_constexpr_heap_type is called only from 2 spots where the
other one passes NULL as full_size, the decomposition is only useful
for this caller and not the other one.

But if you strongly prefer it that way, I can do that.
Note, probably not 3 new args but 4, depends on whether we could turn
all those cases where the tree arg0 = CALL_EXPR_ARG (oldop, 0);
is done but var_size_adjusted is false into assertion failures.
I'm worried that with the zero size of element we could end up with
a variable number of elements which when multiplied by 0 gives constant 0,
though hopefully that would be rejected earlier during constant evaluation.


Or we could move all the adjustment into a separate function and only 
ever pass the number of elements to build_new_constexpr_heap_type?



+   STRIP_NOPS (arg0);
+   if (cookie_size)
+ {
+   if (TREE_CODE (arg0) != PLUS_EXPR)
+ arg0 = NULL_TREE;
+   else if (TREE_CODE (TREE_OPERAND (arg0, 0))
+== INTEGER_CST
+&& tree_int_cst_equal (cookie_size,
+   TREE_OPERAND (arg0,
+ 0)))
+ {
+   arg0 = TREE_OPERAND (arg0, 1);
+   STRIP_NOPS (arg0);
+ }
+   else if (TREE_CODE (TREE_OPERAND (arg0, 1))
+== INTEGER_CST
+&& tree_int_cst_equal (cookie_size,
+   TREE_OPERAND (arg0,
+ 1)))
+ {
+   arg0 = TREE_OPERAND (arg0, 0);
+   STRIP_NOPS (arg0);
+ }
+   else
+ arg0 = NULL_TREE;
+ }
+   if (arg0 && TREE_CODE (arg0) == MULT_EXPR)
+ {
+   tree op0 = TREE_OPERAND (arg0, 0);
+   tree op1 = TREE_OPERAND (arg0, 1);
+   var_size_adjusted = true;
+   if (integer_zerop (op0))
+ var_size
+   = cxx_eval_constant_expression (ctx, op1, false,
+   non_constant_p,
+   overflow_p);
+   else if (integer_zerop (op1))
+ var_size
+   = cxx_eval_constant_expression (ctx, op0, false,
+   non_constant_p,
+   overflow_p);
+   else
+ var_size_adjusted = false;
+ }
+ }
TREE_TYPE (var)
  = build_new_constexpr_heap_type (elt_type, cookie_size,
-  var_size);
+  var_size, var_size_adjusted);
TREE_TYPE (TREE_OPERAND (op, 0))
  = build_pointer_type (TREE_TYPE (var));
  }


Jakub
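To make the size decomposition in the hunk above easier to follow, here is a heavily simplified, hypothetical model in C++ (it is not GCC's tree API, and it ignores the conversions that STRIP_NOPS removes). The size passed to the replaceable operator new is modelled as `cookie + nelts * elt_size`; when `elt_size` is zero, the element-count operand is recovered from the multiplication, mirroring the PLUS_EXPR and MULT_EXPR checks. `Expr`, `is_const` and `recover_nelts` are invented names.

```cpp
#include <cstdint>

// Hypothetical stand-in for a folded size expression: a constant, or
// PLUS / MULT of two sub-expressions.  Not GCC's tree representation.
struct Expr
{
  enum Kind { CONST, PLUS, MULT } kind;
  std::int64_t value;   // Meaningful only for CONST.
  const Expr *op0;      // Meaningful only for PLUS / MULT.
  const Expr *op1;
};

static bool is_const (const Expr *e, std::int64_t v)
{
  return e->kind == Expr::CONST && e->value == v;
}

// Given the size argument of the allocation call, return the expression
// for the number of elements when the element size is zero, or nullptr
// if SIZE does not have the expected shape.  COOKIE is the array cookie
// size in bytes, or 0 when there is no cookie.
const Expr *recover_nelts (const Expr *size, std::int64_t cookie)
{
  if (cookie != 0)
    {
      // Strip "cookie + X" or "X + cookie".
      if (size->kind != Expr::PLUS)
        return nullptr;
      if (is_const (size->op0, cookie))
        size = size->op1;
      else if (is_const (size->op1, cookie))
        size = size->op0;
      else
        return nullptr;
    }
  // With a zero-sized element the remaining term is "nelts * 0" or
  // "0 * nelts"; the other operand is the element count.
  if (size->kind != Expr::MULT)
    return nullptr;
  if (is_const (size->op0, 0))
    return size->op1;
  if (is_const (size->op1, 0))
    return size->op0;
  return nullptr;
}
```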





Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-15 Thread Martin Sebor via Gcc-patches

On 3/15/22 10:40, Siddhesh Poyarekar wrote:

On 15/03/2022 21:09, Martin Sebor wrote:

The strncmp function takes arrays as arguments (not necessarily
strings).  The main purpose of the -Wstringop-overread warning
for calls to it is to detect calls where one of the arrays is
not a nul-terminated string and the bound is larger than the size
of the array.  For example:

   #include <string.h>

   char a[4], b[4];

   int f (void)
   {
     return strncmp (a, b, 8);   // -Wstringop-overread
   }

Such a call is suspect: if one of the arrays isn't nul-terminated
the call is undefined.  Otherwise, if both are nul-terminated there


Isn't "suspect" too harsh a description though?  The bound does not 
specify the size of a or b, it specifies the maximum extent to which to 
compare a and b, the extent being any application-specific limit.  In 
fact the limit could be the size of some arbitrary third buffer that the 
contents of a or b must be copied to, truncating to the bound.


The intended use of the strncmp bound is to limit the comparison to
at most the size of the arrays or (in a subset of cases) the length
of an initial substring. Providing an arbitrary bound that's not
related to the sizes as you describe sounds very much like a misuse.

As a historical note, strncmp was first introduced in UNIX v7 where
its purpose, alongside strncpy, was to manipulate (potentially)
unterminated character arrays like file names stored in fixed size
arrays (typically 14 bytes).  Strncpy would fill the buffers with
ASCII data up to their size and pad the rest with nuls only if there
was room.
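A small C++ sketch of that idiom, assuming a 14-byte name field (the width comes from the text; `NAME_MAX_V7`, `entry`, `set_name` and `name_equals` are hypothetical names). `strncpy` fills the field and pads with NULs only when there is room, so a 14-character name is stored without a terminator. `strncmp` bounded by the field size still compares it safely.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-width name field, as in early UNIX directory
// entries and archive headers.
constexpr std::size_t NAME_MAX_V7 = 14;

struct entry
{
  char name[NAME_MAX_V7];   // Not necessarily NUL-terminated.
};

// Store NAME in the field.  strncpy pads with NULs when NAME is shorter
// than the field; a name of 14 or more characters is stored without a
// terminator (and truncated if longer).
void set_name (entry &e, const char *name)
{
  std::strncpy (e.name, name, sizeof e.name);
}

// Compare the field with NAME.  The bound is the size of the field, so
// strncmp never reads past the end of e.name, even when it is
// unterminated.
bool name_equals (const entry &e, const char *name)
{
  return std::strncmp (e.name, name, sizeof e.name) == 0;
}
```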

Strncmp was then used to compare these potentially unterminated
character arrays (e.g., archive headers in ld and ranlib).  The bound
was the size of the fixed size array.  Its other use case was to compare
leading portions of strings (e.g, when looking for an environment
variable or when stripping "./" from path names).
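The second use case, comparing only a leading portion of a string, can be sketched as follows. `env_value` and `strip_dot_slash` are hypothetical helpers, not code from any particular UNIX release: the bound is the length of the prefix, not the size of either array.

```cpp
#include <cstring>

// Hypothetical helper: if ENTRY has the form "NAME=value", return a
// pointer to the value, else nullptr.  strncmp with the length of NAME
// compares only the leading portion of ENTRY; if ENTRY is shorter, the
// comparison stops at its NUL and entry[len] is never read.
const char *env_value (const char *entry, const char *name)
{
  std::size_t len = std::strlen (name);
  if (std::strncmp (entry, name, len) == 0 && entry[len] == '=')
    return entry + len + 1;
  return nullptr;
}

// Hypothetical helper: skip any leading "./" components of PATH.
const char *strip_dot_slash (const char *path)
{
  while (std::strncmp (path, "./", 2) == 0)
    path += 2;
  return path;
}
```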

Since the early UNIX days, both strncpy and to a lesser extent strncmp
have been widely misused and, along with many other functions in
<string.h>, a frequent source of bugs due to common misunderstanding
of their intended purpose.  The aim of these warnings is to detect
the common (and sometimes less common) misuses and bugs.

I agree the call is undefined if one of the arrays is not nul-terminated 
and that's the thing; nothing about the bound is undefined in this 
context, it's the NUL termination that is key.



is no point in calling strncmp with a bound greater than their sizes.


There is, when the bound describes something else, e.g. the size of a 
third destination buffer into which one of the input buffers may get 
copied.  Or when the bound describes the maximum length of a set of 
strings where only a subset of the strings are reachable in the current 
function and ranger sees it, allowing us to reduce our input string size 
estimate.  The bound being the maximum of the lengths of two input 
strings is just one of many possibilities.
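The pattern described here can be sketched as follows (hypothetical names; nothing is claimed about which GCC releases warn on this exact code). When both arrays hold NUL-terminated strings, strncmp stops at the first difference or the first NUL, so a bound larger than the arrays (here the size of an unrelated destination buffer) does not cause a read past their ends.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical inputs: small arrays holding NUL-terminated strings.
char a[4] = "abc";
char b[4] = "abd";

// Hypothetical destination size that an application might use as the
// comparison limit, e.g. because A or B is later copied into such a buffer.
constexpr std::size_t DEST_SIZE = 16;

// Well-defined as long as A and B are NUL-terminated: strncmp stops at
// the first difference or the first NUL, whichever comes first.
int compare_for_dest ()
{
  return std::strncmp (a, b, DEST_SIZE);
}
```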



With no evidence that this warning is ever harmful I'd consider


There is: the false positives were seen in Fedora/RHEL builds.


I haven't seen these so I can't very well comment on them.  But I can
assure you that warning for the code above is intentional.  Whether
or not the arrays are nul-terminated, the expected way to call
the function is with a bound no greater than their size (some coding
guidelines are explicit about this; see for example the CERT C Secure
Coding standard rule ARR38-C).

(Granted, the manual makes it sound like -Wstringop-overread only
detects provable past-the-end reads.  That's a mistake in
the documentation that should be fixed.  The warning was never quite
so limited, nor was it intended to be.)

Martin




suppressing it a regression.  Since the warning is a deliberate
feature in a released compiler and GCC is now in a regression
fixing stage, this patch is out of scope even if a case where
the warning wasn't helpful did turn up (none has been reported
so far).


Wait, I just reported an issue and it's across multiple packages in 
Fedora/RHEL :)


I think this is a regression since gcc 11 due to misunderstanding the 
specification and assuming too strong a relationship between the size 
argument of strncmp (and indeed strnlen and strndup) and the size of 
objects being passed to it.  Compliant code relies on the compiler to do 
the right thing here, i.e. optimize the strncmp call to strcmp and not 
panic about the size argument being larger than the input buffer size. 
If at all such a diagnostic needs to stay, it ought to go into the 
analyzer, where such looser heuristic suggestions are more acceptable 
and sometimes even appreciated.


FWIW, I'm open to splitting the warning levels as you suggested if 
that's the consensus since it at least provides a way to make these 
warnings saner. However I still haven't found the rationale presented so 
far compelling enough to justify these false positives; I just don't see 
a proportional enough reward.  Hopefully more peop

Re: [PATCH] c++: fold calls to std::move/forward [PR96780]

2022-03-15 Thread Jason Merrill via Gcc-patches

On 3/15/22 13:09, Patrick Palka wrote:

On Tue, 15 Mar 2022, Jason Merrill wrote:


On 3/15/22 10:03, Patrick Palka wrote:

On Mon, 14 Mar 2022, Jason Merrill wrote:


On 3/14/22 13:13, Patrick Palka wrote:

On Fri, 11 Mar 2022, Jason Merrill wrote:


On 3/10/22 11:27, Patrick Palka wrote:

On Wed, 9 Mar 2022, Jason Merrill wrote:


On 3/1/22 18:08, Patrick Palka wrote:

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means it comes with bloated debug info, which
persists even after the call has been inlined away, for an operation that
is never interesting to debug.
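As a rough illustration (not from the patch; `forward_as_cast` is a hypothetical name), the equivalence can be checked at compile time: std::move and std::forward return exactly what the corresponding static_cast would, which is why replacing the call with a cast leaves the type and value category of the expression unchanged.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// std::move (x) yields exactly what static_cast<T &&> (x) yields:
// an xvalue of the same type.
static_assert (std::is_same<decltype (std::move (std::declval<std::string &> ())),
                            decltype (static_cast<std::string &&> (std::declval<std::string &> ()))>::value,
               "std::move is a cast to T&&");

// Hypothetical cast-only equivalent of std::forward<T> (x).
template <typename T>
constexpr T &&forward_as_cast (typename std::remove_reference<T>::type &x) noexcept
{
  return static_cast<T &&> (x);
}

// Reference collapsing gives the same result type as std::forward for an
// lvalue-reference T (lvalue result) and a non-reference T (xvalue result).
static_assert (std::is_same<decltype (forward_as_cast<std::string &> (std::declval<std::string &> ())),
                            decltype (std::forward<std::string &> (std::declval<std::string &> ()))>::value,
               "lvalue case");
static_assert (std::is_same<decltype (forward_as_cast<std::string> (std::declval<std::string &> ())),
                            decltype (std::forward<std::string> (std::declval<std::string &> ()))>::value,
               "rvalue case");
```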

This patch addresses this problem in a relatively ad-hoc way by folding
calls to std::move/forward into casts as part of the frontend's general
expression folding routine.  After this patch with -O2 and a non-checking
compiler, debug info size for some testcases decreases by about 10% and
overall compile time and memory usage decrease by about 2%.


Impressive.  Which testcases?


I saw the largest percent reductions in debug file object size in
various tests from cmcstl2 and range-v3, e.g.
test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
(which are among their biggest tests).

Significant reductions in debug object file size can be observed in
some libstdc++ testcases too, such as a 5.5% reduction in
std/ranges/adaptor/join.cc



Do you also want to handle addressof and as_const in this patch, as
Jonathan suggested?


Yes, good idea.  Since their argument and return types are all indirect
types, I think we can use the same NOP_EXPR-based folding for them.
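For the other two functions, a small C++17 check shows the properties that make a cast-based fold plausible (the `tricky` type and `real_address` helper are hypothetical). std::as_const only adds const to a reference, and std::addressof returns the object's real address even when operator& is overloaded, so both take and return references or pointers.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// std::as_const (x) only adds const to the referenced type, like
// static_cast<const T &> (x).
static_assert (std::is_same<decltype (std::as_const (std::declval<int &> ())),
                            const int &>::value,
               "std::as_const yields const T&");

// Hypothetical type whose overloaded unary operator& does not return
// the object's address.
struct tricky
{
  int value = 0;
  tricky *operator& () { return nullptr; }
};

// std::addressof ignores the overloaded operator& and returns the real
// address of T, i.e. it behaves like a reference-to-pointer conversion.
inline tricky *real_address (tricky &t)
{
  return std::addressof (t);
}
```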



I think we can do this now, and think about generalizing more in stage 1.


Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
want to consider for GCC 12?

PR c++/96780

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : When optimizing,
fold calls to std::move/forward into simple casts.
* cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
* typeck.cc (is_std_move_p, is_std_forward_p): Export.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr96780.C: New test.
---
  gcc/cp/cp-gimplify.cc  | 18 ++
  gcc/cp/cp-tree.h   |  2 ++
  gcc/cp/typeck.cc   |  6 ++
  gcc/testsuite/g++.dg/opt/pr96780.C | 24

  4 files changed, 46 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..0b009b631c7 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,6 +2756,24 @@ cp_fold (tree x)
case CALL_EXPR:
{
+   if (optimize


I think this should check flag_no_inline rather than optimize.


Sounds good.

Here's a patch that extends the folding to as_const and addressof (as
well as __addressof, which I'm kind of unsure about since it's
non-standard).  I suppose it also doesn't hurt to verify that the return
and argument type of the function are sane before we commit to folding.

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by folding
calls to std::move/forward and other cast-like functions into simple
casts as part of the frontend's general expression folding routine.
After this patch with -O2 and a non-checking compiler, debug info size
for some testcases decreases by about 10% and overall compile time and
memory usage decrease by about 2%.

PR c++/96780

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : When optimizing,
fold calls to std::move/forward and other cast-like functions
into simple casts.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr96780.C: New test.
---
 gcc/cp/cp-gimplify.cc  | 36
+++-
 gcc/testsuite/g++.dg/opt/pr96780.C | 38
++
 2 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..efc4c8f0eb9 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,9 +2756,43 @@ cp_fold (tree x)
   case CALL_EXPR:
   {
-   int sv = optimize, nw = sv;
tree callee = get_callee_fndecl (x);
 +  /* "Inline" calls to std::move/forward and other cast-like functions
+  by simply folding them into the corresponding cast determined by
+  their return type.  This is cheaper than relying on the middle-end
+  to do so, and also means we avoid generating useless debug info for

Re: [PATCH] c++: further lookup_member simplification

2022-03-15 Thread Jason Merrill via Gcc-patches

On 3/15/22 13:18, Patrick Palka wrote:

As a followup to r12-7656-gffe9c0a0d3564a, this minor patch condenses
the handling of ambiguity and access w.r.t. the value of 'protect' so
that it more clearly matches the function comment.
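As a reading aid, here is a hypothetical, simplified model of the ambiguity handling after this patch (it is not GCC code: `outcome` stands in for the trees the real function returns and `diagnostics` for the error/print_candidates output). Per the second hunk, access checking then also applies only when `protect == 1`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for NULL_TREE, error_mark_node and the
// lfi.ambiguous candidate list returned by the real lookup_member.
enum class outcome { not_found, error, ambiguous_list };

struct lookup_model
{
  std::vector<std::string> diagnostics;

  // Behaviour on an ambiguous lookup, keyed on PROTECT:
  //   0: ambiguity is not interesting, return "nothing";
  //   1: diagnose (only if COMPLAIN) and return an error;
  //   2: hand the list of ambiguous candidates back to the caller.
  outcome on_ambiguous (int protect, bool complain, const std::string &name)
  {
    if (protect == 0)
      return outcome::not_found;
    if (protect == 1)
      {
        if (complain)
          diagnostics.push_back ("request for member '" + name
                                 + "' is ambiguous");
        return outcome::error;
      }
    return outcome::ambiguous_list;
  }
};
```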

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


gcc/cp/ChangeLog:

* search.cc (lookup_member): Simplify by handling all values
of protect at once in case of ambiguous lookup.  Don't modify
protect.
---
  gcc/cp/search.cc | 32 +---
  1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
index 85e3e7cb487..b86b3a24080 100644
--- a/gcc/cp/search.cc
+++ b/gcc/cp/search.cc
@@ -1168,27 +1168,21 @@ lookup_member (tree xbasetype, tree name, int protect, bool want_type,
if (rval_binfo)
  type = BINFO_TYPE (rval_binfo);
  
-  /* If we are not interested in ambiguities, don't report them;
- just return NULL_TREE.  */
-  if (!protect && lfi.ambiguous)
-return NULL_TREE;
-
-  if (protect == 2)
-{
-  if (lfi.ambiguous)
-   return lfi.ambiguous;
-  else
-   protect = 0;
-}
-
-  if (protect == 1 && lfi.ambiguous)
+  if (lfi.ambiguous)
  {
-  if (complain & tf_error)
+  if (protect == 0)
+   return NULL_TREE;
+  else if (protect == 1)
{
- error ("request for member %qD is ambiguous", name);
- print_candidates (lfi.ambiguous);
+ if (complain & tf_error)
+   {
+ error ("request for member %qD is ambiguous", name);
+ print_candidates (lfi.ambiguous);
+   }
+ return error_mark_node;
}
-  return error_mark_node;
+  else if (protect == 2)
+   return lfi.ambiguous;
  }
  
if (!rval)

@@ -1213,7 +1207,7 @@ lookup_member (tree xbasetype, tree name, int protect, bool want_type,
  
  only the first call to "f" is valid.  However, if the function is
  static, we can check.  */
-  if (protect && !really_overloaded_fn (rval))
+  if (protect == 1 && !really_overloaded_fn (rval))
  {
tree decl = is_overloaded_fn (rval) ? get_first_fn (rval) : rval;
decl = strip_using_decl (decl);




[PATCH] Pass PKG_CONFIG_PATH down from top-level Makefile

2022-03-15 Thread Simon Marchi via Gcc-patches
From: Simon Marchi 

[Sending to binutils, gdb-patches and gcc-patches, since it touches the
top-level Makefile/configure]

I have my debuginfod library installed in a non-standard location
(/opt/debuginfod), which requires me to set
PKG_CONFIG_PATH=/opt/debuginfod/lib/pkg-config.  If I just set it during
configure:

$ PKG_CONFIG_PATH=/opt/debuginfod/lib/pkg-config ./configure --with-debuginfod
$ make

or

$ ./configure --with-debuginfod PKG_CONFIG_PATH=/opt/debuginfod/lib/pkg-config
$ make

Then PKG_CONFIG_PATH is only present (and ignored) during the top-level
configure.  When running make (which runs gdb's and binutils'
configure), PKG_CONFIG_PATH is not set, which results in their configure
script not finding the library:

checking for libdebuginfod >= 0.179... no
configure: error: "--with-debuginfod was given, but libdebuginfod is missing or unusable."

Change the top-level configure/Makefile system to capture the value
passed when configuring the top-level and pass it down to
subdirectories (similar to CFLAGS, LDFLAGS, etc).

I don't know much about the top-level build system, so I really don't
know if I did this correctly.  The changes are:

 - Use AC_SUBST(PKG_CONFIG_PATH) in configure.ac, so that
   @PKG_CONFIG_PATH@ gets replaced with the actual PKG_CONFIG_PATH value
   in config files (i.e. Makefile)
 - Add a PKG_CONFIG_PATH Makefile variable in Makefile.tpl, initialized
   to @PKG_CONFIG_PATH@
 - Add PKG_CONFIG_PATH to HOST_EXPORTS in Makefile.tpl, which are the
   variables set when running the sub-configures

I initially added PKG_CONFIG_PATH to flags_to_pass, in Makefile.def, but
I don't think it's needed.  AFAIU, this defines the flags to pass down
when calling "make" in subdirectories.  We only need PKG_CONFIG_PATH to
be passed down during configure.  After that, it's captured in
gdb/config.status, so even if a "make" causes a re-configure later
(because gdb/configure has changed, for example), the PKG_CONFIG_PATH
value will be remembered.

ChangeLog:

* configure.ac: Add AC_SUBST(PKG_CONFIG_PATH).
* configure: Re-generate.
* Makefile.tpl (HOST_EXPORTS): Pass PKG_CONFIG_PATH.
(PKG_CONFIG_PATH): New.
* Makefile.in: Re-generate.

Change-Id: I91138dfca41c43b05e53e445f62e4b27882536bf
---
 Makefile.in  | 3 +++
 Makefile.tpl | 3 +++
 configure| 2 ++
 configure.ac | 1 +
 4 files changed, 9 insertions(+)

diff --git a/Makefile.in b/Makefile.in
index 3aacd2daac9c..cb39e4790d69 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -218,6 +218,7 @@ HOST_EXPORTS = \
OBJCOPY="$(OBJCOPY)"; export OBJCOPY; \
OBJDUMP="$(OBJDUMP)"; export OBJDUMP; \
OTOOL="$(OTOOL)"; export OTOOL; \
+   PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)"; export PKG_CONFIG_PATH; \
READELF="$(READELF)"; export READELF; \
AR_FOR_TARGET="$(AR_FOR_TARGET)"; export AR_FOR_TARGET; \
AS_FOR_TARGET="$(AS_FOR_TARGET)"; export AS_FOR_TARGET; \
@@ -444,6 +445,8 @@ LIBCXXFLAGS = $(CXXFLAGS) -fno-implicit-templates
 GOCFLAGS = $(CFLAGS)
 GDCFLAGS = $(CFLAGS)
 
+PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
+
 # Pass additional PGO and LTO compiler options to the PGO build.
 BUILD_CFLAGS = $(PGO_BUILD_CFLAGS) $(PGO_BUILD_LTO_CFLAGS)
 override CFLAGS += $(BUILD_CFLAGS)
diff --git a/Makefile.tpl b/Makefile.tpl
index 9df77788345a..88db8f44d53f 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -221,6 +221,7 @@ HOST_EXPORTS = \
OBJCOPY="$(OBJCOPY)"; export OBJCOPY; \
OBJDUMP="$(OBJDUMP)"; export OBJDUMP; \
OTOOL="$(OTOOL)"; export OTOOL; \
+   PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)"; export PKG_CONFIG_PATH; \
READELF="$(READELF)"; export READELF; \
AR_FOR_TARGET="$(AR_FOR_TARGET)"; export AR_FOR_TARGET; \
AS_FOR_TARGET="$(AS_FOR_TARGET)"; export AS_FOR_TARGET; \
@@ -447,6 +448,8 @@ LIBCXXFLAGS = $(CXXFLAGS) -fno-implicit-templates
 GOCFLAGS = $(CFLAGS)
 GDCFLAGS = $(CFLAGS)
 
+PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
+
 # Pass additional PGO and LTO compiler options to the PGO build.
 BUILD_CFLAGS = $(PGO_BUILD_CFLAGS) $(PGO_BUILD_LTO_CFLAGS)
 override CFLAGS += $(BUILD_CFLAGS)
diff --git a/configure b/configure
index 26935ebda249..1badcb314f8f 100755
--- a/configure
+++ b/configure
@@ -618,6 +618,7 @@ CXX_FOR_TARGET
 CC_FOR_TARGET
 RANLIB_PLUGIN_OPTION
 AR_PLUGIN_OPTION
+PKG_CONFIG_PATH
 READELF
 OBJDUMP
 OBJCOPY
@@ -10310,6 +10311,7 @@ fi
 
 
 
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for -plugin option" >&5
 $as_echo_n "checking for -plugin option... " >&6; }
 
diff --git a/configure.ac b/configure.ac
index da4e41d72479..5b6e20485143 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3465,6 +3465,7 @@ AC_SUBST(CC)
 AC_SUBST(CXX)
 AC_SUBST(CFLAGS)
 AC_SUBST(CXXFLAGS)
+AC_SUBST(PKG_CONFIG_PATH)
 
 GCC_PLUGIN_OPTION(PLUGIN_OPTION)
 AR_PLUGIN_OPTION=

base-commit: 6aa03e9c1769c8d925f4d23d72af93483bfd31f3
-- 
2.35.1



[PATCH] libgompd: add OMPD support, libgompd initialization and global ICVs functions

2022-03-15 Thread Mohamed Atef via Gcc-patches
This patch adds OMPD support to libgomp, along with the API version
functions and the global ICV functions.
I would appreciate a review as soon as possible so that I can fix any problems.
I tried as much as I could to follow the GNU coding standards.
We have a seminar at the college next week, so we need this to be reviewed.
Thanks
Thanks


libgomp/ChangeLog

2022-03-15  Mohamed Atef  

* config/darwin/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
* config/hpux/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
* config/posix/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
* configure: Regenerate.
* Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la.
(libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES, libgompd_la_LINK,
libgompd_la_SOURCES, libgompd_version_dep, libgompd_version_script,
libgompd.ver-sun, libgompd.ver, libgompd_version_info): New.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* env.c: Include ompd-support.h.
(initialize_env): Call gompd_load.
* team.c: Include ompd-support.h.
(gomp_team_start): Call ompd_bp_parallel_begin.
(gomp_team_end): Call ompd_bp_parallel_end.
* libgomp.map: Add OMP_5.0.3 symbol versions.
* libgompd.map: New.
* omp-tools.h.in: New.
* ompd-types.h.in: New.
* ompd-support.h: New.
* ompd-support.c: New.
* ompd-helper.h: New.
* ompd-helper.c: New.
* ompd-init.c: New.
* ompd-icv.c: New.
* configure.ac (AC_CONFIG_FILES): Add omp-tools.h and ompd-types.h.


Re: [PATCH] libgompd: add OMPD support, libgompd initialization and global ICVs functions

2022-03-15 Thread Mohamed Atef via Gcc-patches
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index f8b2a06d63e..f530a730ea8 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -20,7 +20,7 @@ AM_CPPFLAGS = $(addprefix -I, $(search_path))
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)

-toolexeclib_LTLIBRARIES = libgomp.la
+toolexeclib_LTLIBRARIES = libgomp.la libgompd.la
 nodist_toolexeclib_HEADERS = libgomp.spec

 if LIBGOMP_BUILD_VERSIONED_SHLIB
@@ -32,13 +32,21 @@ libgomp.ver: $(top_srcdir)/libgomp.map
$(EGREP) -v '#(#| |$$)' $< | \
  $(PREPROCESS) -P -include config.h - > $@ || (rm -f $@ ; exit 1)

+libgompd.ver: $(top_srcdir)/libgompd.map
+   $(EGREP) -v '#(#| |$$)' $< | \
+   $(PREPROCESS) -P -include config.h - > $@ || (rm -f $@ ; exit 1)
+
 if LIBGOMP_BUILD_VERSIONED_SHLIB_GNU
 libgomp_version_script = -Wl,--version-script,libgomp.ver
+libgompd_version_script = -Wl,--version-script,libgompd.ver
 libgomp_version_dep = libgomp.ver
+libgompd_version_dep = libgompd.ver
 endif
 if LIBGOMP_BUILD_VERSIONED_SHLIB_SUN
 libgomp_version_script = -Wl,-M,libgomp.ver-sun
+libgompd_version_script = -Wl,-M,libgompd.ver-sun
 libgomp_version_dep = libgomp.ver-sun
+libgompd_version_dep = libgompd.ver-sun
 libgomp.ver-sun : libgomp.ver \
$(top_srcdir)/../contrib/make_sunver.pl \
$(libgomp_la_OBJECTS) $(libgomp_la_LIBADD)
@@ -48,16 +56,34 @@ libgomp.ver-sun : libgomp.ver \
 `echo $(libgomp_la_LIBADD) | \
sed 's,/\([^/.]*\)\.la,/.libs/\1.a,g'` \
 > $@ || (rm -f $@ ; exit 1)
+
+libgompd.ver-sun : libgompd.ver \
+   $(top_srcdir)/../contrib/make_sunver.pl \
+   $(libgompd_la_OBJECTS) $(libgompd_la_LIBADD)
+   perl $(top_srcdir)/../contrib/make_sunver.pl \
+   libgompd.ver \
+   $(libgompd_la_OBJECTS:%.lo=.libs/%.o) \
+   `echo $(libgompd_la_LIBADD) | \
+   sed 's,/\([^/.]*\)\.la,/.libs/\1.a,g'` \
+   > $@ || (rm -f $@ ; exit 1)
+
 endif
 else
 libgomp_version_script =
+libgompd_version_script =
 libgomp_version_dep =
+libgompd_version_dep =
 endif
 libgomp_version_info = -version-info $(libtool_VERSION)
+libgompd_version_info = -version-info $(libtool_VERSION)
 libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script) \
 $(lt_host_flags)
+libgompd_la_LDFLAGS = $(libgompd_version_info) $(libgompd_version_script) \
+   $(lt_host_flags)
 libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
+libgompd_la_DEPENDENCIES = $(libgompd_version_dep)
 libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
+libgompd_la_LINK = $(LINK) $(libgompd_la_LDFLAGS)

 libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
icv.c icv-device.c iter.c iter_ull.c loop.c loop_ull.c ordered.c \
@@ -66,8 +92,9 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-   oacc-target.c
+   oacc-target.c ompd-support.c

+libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c
 include $(top_srcdir)/plugin/Makefrag.am

 if USE_FORTRAN
@@ -75,7 +102,7 @@ libgomp_la_SOURCES += openacc.f90
 endif

 nodist_noinst_HEADERS = libgomp_f.h
-nodist_libsubinclude_HEADERS = omp.h openacc.h acc_prof.h
+nodist_libsubinclude_

[committed] analyzer: presize m_cluster_map in store copy ctor

2022-03-15 Thread David Malcolm via Gcc-patches
Testing cc1 on pr93032-mztools-unsigned-char.c

Benchmark #1: (without patch)
  Time (mean ± σ):     338.8 ms ±  13.6 ms    [User: 323.2 ms, System: 14.2 ms]
  Range (min … max):   326.7 ms … 363.1 ms    10 runs

Benchmark #2: (with patch)
  Time (mean ± σ):     332.3 ms ±  12.8 ms    [User: 316.6 ms, System: 14.3 ms]
  Range (min … max):   322.5 ms … 357.4 ms    10 runs

Summary
  ./cc1.new ran 1.02 ± 0.06 times faster than ./cc1.old

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7658-ga58e342d8869c5.

gcc/analyzer/ChangeLog:
* store.cc (store::store): Presize m_cluster_map.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/store.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index 58df7aab8a2..ec11433dffc 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -2032,7 +2032,8 @@ store::store ()
 /* store's copy ctor.  */
 
 store::store (const store &other)
-: m_called_unknown_fn (other.m_called_unknown_fn)
+: m_cluster_map (other.m_cluster_map.elements ()),
+  m_called_unknown_fn (other.m_called_unknown_fn)
 {
   for (cluster_map_t::iterator iter = other.m_cluster_map.begin ();
iter != other.m_cluster_map.end ();
-- 
2.26.3
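The idea behind the patch, sizing the destination table up front when copying, can be sketched with the standard library. This is a hypothetical analogue (`store_model`, `bind` and `lookup` are invented names): the patch itself passes the source map's element count to the hash_map constructor, whereas this sketch calls `std::unordered_map::reserve`. In the analyzer the copy loop also clones each cluster; here values are copied as-is.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical analogue of the analyzer's store.  The copy constructor
// reserves room for all of OTHER's entries before inserting them, so the
// table is not rehashed repeatedly while it grows.
class store_model
{
public:
  store_model () = default;

  store_model (const store_model &other)
    : m_called_unknown_fn (other.m_called_unknown_fn)
  {
    m_cluster_map.reserve (other.m_cluster_map.size ());
    for (const auto &entry : other.m_cluster_map)
      m_cluster_map.emplace (entry.first, entry.second);
  }

  void bind (const std::string &key, int value) { m_cluster_map[key] = value; }
  std::size_t size () const { return m_cluster_map.size (); }
  int lookup (const std::string &key) const { return m_cluster_map.at (key); }

private:
  std::unordered_map<std::string, int> m_cluster_map;
  bool m_called_unknown_fn = false;
};
```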



[committed] analyzer: add test coverage for PR 95000

2022-03-15 Thread David Malcolm via Gcc-patches
PR analyzer/95000 isn't fixed yet; add test coverage with XFAILs.

Successfully regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7659-gd1d95846e3c901.

gcc/testsuite/ChangeLog:
PR analyzer/95000
* gcc.dg/analyzer/pr95000-1.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/testsuite/gcc.dg/analyzer/pr95000-1.c | 38 +++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr95000-1.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr95000-1.c b/gcc/testsuite/gcc.dg/analyzer/pr95000-1.c
new file mode 100644
index 000..bb23ab7488e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr95000-1.c
@@ -0,0 +1,38 @@
+#include "analyzer-decls.h"
+
+void test_1 (char* x)
+{
+  char* y=0;
+  switch (*x) {
+  case 'a': /* { dg-message "to here" } */
+y="foo";
+  case 'b':
+if (*x=='a') *y='b'; /* { dg-bogus "dereference of NULL 'y'" "deref of null (PR analyzer/95000)" { xfail *-*-* } } */
+/* { dg-warning "write to string literal" "write to string literal" { target *-*-* } .-1 } */
+  }
+}
+
+void test_switch_char(char x) {
+  switch (x) {
+  case 'b':
+__analyzer_eval (x == 'b'); /* { dg-warning "TRUE" "expected" { xfail *-*-* } } */
+/* { dg-bogus "UNKNOWN" "status quo (PR analyzer/95000)" { xfail *-*-* } .-1 } */
+  }
+}
+
+void test_switch_int(int x) {
+  switch (x) {
+  case 97:
+__analyzer_eval (x == 97); /* { dg-warning "TRUE" } */
+  }
+}
+
+void test_if_char(char x) {
+  if (x == 'b')
+__analyzer_eval (x == 'b'); /* { dg-warning "TRUE" } */
+}
+
+void test_if_int(int x) {
+  if (x == 97)
+__analyzer_eval (x == 97); /* { dg-warning "TRUE" } */
+}
-- 
2.26.3



Re: [PATCH] rs6000: Improve .machine

2022-03-15 Thread Segher Boessenkool
Hi!

On Tue, Mar 15, 2022 at 03:29:23PM +0100, Sebastian Huber wrote:
> now that the PR104829 is fixed could I back port
> 
> Segher Boessenkool (2):
>   rs6000: Improve .machine
>   rs6000: Do not use rs6000_cpu for .machine ppc and ppc64 (PR104829)
> 
> to GCC 10 and 11?

I will do it, in a few days though.

Thanks for your enthusiasm :-),


Segher


Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-15 Thread Siddhesh Poyarekar

On 16/03/2022 02:06, Martin Sebor wrote:

The intended use of the strncmp bound is to limit the comparison to
at most the size of the arrays or (in a subset of cases) the length
of an initial substring. Providing an arbitrary bound that's not
related to the sizes as you describe sounds very much like a misuse.


Nothing in the standard says that the bound is related to the sizes of 
input buffers.  I don't think deducing that intent makes sense either, 
nor concluding that any other use case is misuse.



As a historical note, strncmp was first introduced in UNIX v7 where
its purpose, alongside strncpy, was to manipulate (potentially)
unterminated character arrays like file names stored in fixed size
arrays (typically 14 bytes).  Strncpy would fill the buffers with
ASCII data up to their size and pad the rest with nuls only if there
was room.

Strncmp was then used to compare these potentially unterminated
character arrays (e.g., archive headers in ld and ranlib).  The bound
was the size of the fixed size array.  Its other use case was to compare
leading portions of strings (e.g, when looking for an environment
variable or when stripping "./" from path names).


Thanks for sharing the historical perspective.


Since the early UNIX days, both strncpy and to a lesser extent strncmp
have been widely misused and, along with many other functions in
<string.h>, a frequent source of bugs due to common misunderstanding
of their intended purpose.  The aim of these warnings is to detect
the common (and sometimes less common) misuses and bugs.


They're all valid uses, however, since they do not violate the standard. 
If we find at compile time that the strings don't terminate at the 
bounds, emitting the warning is OK, but the more pessimistic check seems 
like overkill.



I haven't seen these so I can't very well comment on them.  But I can
assure you that warning for the code above is intentional.  Whether
or not the arrays are nul-terminated, the expected way to call
the function is with a bound no greater than their size (some coding
guidelines are explicit about this; see for example the CERT C Secure
Coding standard rule ARR38-C).

(Granted, the manual makes it sound like -Wstringop-overread only
detects provable past-the-end reads.  That's a mistake in
the documentation that should be fixed.  The warning was never quite
so limited, nor was it intended to be.)
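To make the disagreement concrete, here is a minimal sketch of the two call shapes being debated. The `entry` record and both function names are hypothetical. The first form is valid per the C standard whenever `e->name` is nul-terminated, because strncmp stops at the first nul. GCC may still diagnose it with -Wstringop-overread, since the bound exceeds the array size. The second form follows the rule of thumb cited above (CERT C ARR38-C): the bound never exceeds the size of the array.

```c
#include <string.h>

/* Hypothetical record with a fixed-size, possibly unterminated name.  */
struct entry
{
  char name[14];
};

/* Bound (32) larger than sizeof e->name.  Fine at run time if e->name
   is nul-terminated; if it is not and S matches all 14 bytes, strncmp
   reads past the end of the array.  */
int
name_matches_loose (const struct entry *e, const char *s)
{
  return strncmp (e->name, s, 32) == 0;
}

/* Bound limited to the array size, so no read can go past e->name.  */
int
name_matches_bounded (const struct entry *e, const char *s)
{
  return strncmp (e->name, s, sizeof e->name) == 0;
}
```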


The contention is not that it's not provable; it's more that it
doesn't even pass the "based on available information this is definitely
buggy" assertion, making it more a strong suggestion than a warning that
something is definitely amiss.  Which is why IMO it is more suitable as
an analyzer check than a warning.


Thanks,
Siddhesh


Re: [PATCH v2] x86: Also check _SOFT_FLOAT in

2022-03-15 Thread Hongtao Liu via Gcc-patches
On Tue, Mar 15, 2022 at 10:40 PM H.J. Lu  wrote:
>
> On Mon, Mar 14, 2022 at 7:31 AM H.J. Lu  wrote:
> >
> > Push target("general-regs-only") in  if x87 is enabled.
> >
> > gcc/
> >
> > PR target/104890
> > * config/i386/x86gprintrin.h: Also check _SOFT_FLOAT before
> > pushing target("general-regs-only").
> >
> > gcc/testsuite/
> >
> > PR target/104890
> > * gcc.target/i386/pr104890.c: New test.
> > ---
> >  gcc/config/i386/x86gprintrin.h   |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c
> >
> > diff --git a/gcc/config/i386/x86gprintrin.h b/gcc/config/i386/x86gprintrin.h
> > index 017ec299793..e0be01d5e78 100644
> > --- a/gcc/config/i386/x86gprintrin.h
> > +++ b/gcc/config/i386/x86gprintrin.h
> > @@ -24,7 +24,7 @@
> >  #ifndef _X86GPRINTRIN_H_INCLUDED
> >  #define _X86GPRINTRIN_H_INCLUDED
> >
> > -#if defined __MMX__ || defined __SSE__
> > +#if !defined _SOFT_FLOAT || defined __MMX__ || defined __SSE__
The patch LGTM.
> >  #pragma GCC push_options
> >  #pragma GCC target("general-regs-only")
> >  #define __DISABLE_GENERAL_REGS_ONLY__
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104890.c 
> > b/gcc/testsuite/gcc.target/i386/pr104890.c
> > new file mode 100644
> > index 000..cb430eef688
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104890.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile { target ia32 } } */
> > +/* { dg-options "-O2 -mshstk -march=i686" } */
> > +
> > +#include 
> > +
> > +__attribute__((target ("general-regs-only")))
> > +int
> > +foo ()
> > +{
> > +  return _get_ssp ();
> > +}
> > --
> > 2.35.1
> >
>
> It also fixed:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99744#c18
>
> Any comments on this patch?
>
> --
> H.J.



-- 
BR,
Hongtao
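The effect of the one-line change can be probed outside the header with a small sketch that evaluates the old and new preprocessor conditions from x86gprintrin.h under whatever flags the file is compiled with. The helper names are hypothetical. The sketch relies, as the patch does, on _SOFT_FLOAT being predefined only when x87 is disabled. For the test case's -m32 -march=i686 (no MMX or SSE, x87 enabled), old_condition () would return 0 while new_condition () returns 1.

```c
/* Old condition: push target("general-regs-only") only when MMX or
   SSE is enabled.  */
static int
old_condition (void)
{
#if defined __MMX__ || defined __SSE__
  return 1;
#else
  return 0;
#endif
}

/* New condition from the patch: also push whenever _SOFT_FLOAT is not
   defined, i.e. when x87 is enabled.  */
static int
new_condition (void)
{
#if !defined _SOFT_FLOAT || defined __MMX__ || defined __SSE__
  return 1;
#else
  return 0;
#endif
}
```

The new condition is a strict superset of the old one, so it can only add pushes, never remove them.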


[PATCH] [i386] Add extra cost for unsigned_load which may have stall forward issue.

2022-03-15 Thread liuhongt via Gcc-patches
This patch only handles pure SLP for parameters passed by value, which
has nothing to do with IPA but only with the psABI.  For parameters
passed by reference, IPA is required.

The patch is aggressive in determining STLF failure: any
unaligned_load of a parm_decl passed on the stack is assumed to have
an STLF stall issue.  It could lose some performance where there is no
such issue (1 vector_load vs. n scalar_loads + CTOR).

According to the microbenchmark in the PR, the cost of an STLF failure
is generally between 8 and 16 scalar_loads on most recent Intel/AMD
processors.
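The trade-off described above can be sketched as a toy cost comparison. This is a simplified model under stated assumptions and not GCC's actual cost code: `struct load_costs`, its fields and the three functions are hypothetical. The numbers in the usage below are illustrative only, with the stall penalty set to 12 scalar loads to fall inside the 8 to 16 range reported above.

```c
#include <stdbool.h>

/* Hypothetical per-CPU costs, in units of one scalar load.  */
struct load_costs
{
  int scalar_load;    /* one scalar load */
  int unaligned_load; /* one unaligned vector load */
  int vec_construct;  /* assembling N scalars into a vector (CTOR) */
  int stlf_penalty;   /* extra cost assumed for a store-forwarding stall */
};

/* Vector path: one unaligned load.  Like the patch, assume that any
   such load of a parameter passed on the stack stalls.  */
static int
vector_path_cost (const struct load_costs *c, bool parm_on_stack)
{
  return c->unaligned_load + (parm_on_stack ? c->stlf_penalty : 0);
}

/* Scalar path: N scalar loads followed by a vector constructor.  */
static int
scalar_path_cost (const struct load_costs *c, int nunits)
{
  return nunits * c->scalar_load + c->vec_construct;
}

static bool
prefer_vector_load (const struct load_costs *c, int nunits,
                    bool parm_on_stack)
{
  return vector_path_cost (c, parm_on_stack)
         <= scalar_path_cost (c, nunits);
}
```

With costs { 1, 1, 2, 12 }, a 4-element vector built from a stack-passed parameter costs 13 on the vector path versus 6 on the scalar path, so the scalar path wins; a 16-element vector (18 versus 13) still prefers the vector load. This also shows why the aggressive assumption can lose performance when no stall actually occurs.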

gcc/ChangeLog:

PR target/101908
* config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
(ix86_vector_costs::add_stmt_cost): Add extra cost for
unaligned_load which may have a store forwarding stall issue.
* config/i386/i386.h (processor_costs): Add new member
stfs.
* config/i386/x86-tune-costs.h (i386_size_cost): Initialize
stfs.
(i386_cost, i486_cost, pentium_cost, lakemont_cost,
pentiumpro_cost, geode_cost, k6_cost, athlon_cost, k8_cost,
amdfam10_cost, bdver_cost, znver1_cost, znver2_cost,
znver3_cost, skylake_cost, icelake_cost, alderlake_cost,
btver1_cost, btver2_cost, pentium4_cost, nocona_cost,
atom_cost, slm_cost, tremont_cost, intel_cost, generic_cost,
core_cost): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr101908-1.c: New test.
* gcc.target/i386/pr101908-2.c: New test.
* gcc.target/i386/pr101908-3.c: New test.
* gcc.target/i386/pr101908-v16hi.c: New test.
* gcc.target/i386/pr101908-v16qi.c: New test.
* gcc.target/i386/pr101908-v16sf.c: New test.
* gcc.target/i386/pr101908-v16si.c: New test.
* gcc.target/i386/pr101908-v2df.c: New test.
* gcc.target/i386/pr101908-v2di.c: New test.
* gcc.target/i386/pr101908-v2hi.c: New test.
* gcc.target/i386/pr101908-v2qi.c: New test.
* gcc.target/i386/pr101908-v2sf.c: New test.
* gcc.target/i386/pr101908-v2si.c: New test.
* gcc.target/i386/pr101908-v4df.c: New test.
* gcc.target/i386/pr101908-v4di.c: New test.
* gcc.target/i386/pr101908-v4hi.c: New test.
* gcc.target/i386/pr101908-v4qi.c: New test.
* gcc.target/i386/pr101908-v4sf.c: New test.
* gcc.target/i386/pr101908-v4si.c: New test.
* gcc.target/i386/pr101908-v8df-adl.c: New test.
* gcc.target/i386/pr101908-v8df.c: New test.
* gcc.target/i386/pr101908-v8di-adl.c: New test.
* gcc.target/i386/pr101908-v8di.c: New test.
* gcc.target/i386/pr101908-v8hi-adl.c: New test.
* gcc.target/i386/pr101908-v8hi.c: New test.
* gcc.target/i386/pr101908-v8qi-adl.c: New test.
* gcc.target/i386/pr101908-v8qi.c: New test.
* gcc.target/i386/pr101908-v8sf-adl.c: New test.
* gcc.target/i386/pr101908-v8sf.c: New test.
* gcc.target/i386/pr101908-v8si-adl.c: New test.
* gcc.target/i386/pr101908-v8si.c: New test.
---
 gcc/config/i386/i386.cc   | 51 +++
 gcc/config/i386/i386.h|  1 +
 gcc/config/i386/x86-tune-costs.h  | 28 ++
 gcc/testsuite/gcc.target/i386/pr101908-1.c| 12 +++
 gcc/testsuite/gcc.target/i386/pr101908-2.c| 12 +++
 gcc/testsuite/gcc.target/i386/pr101908-3.c| 90 +++
 .../gcc.target/i386/pr101908-v16hi.c  |  6 ++
 .../gcc.target/i386/pr101908-v16qi.c  | 30 +++
 .../gcc.target/i386/pr101908-v16sf.c  |  6 ++
 .../gcc.target/i386/pr101908-v16si.c  |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2df.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2di.c |  7 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2hi.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2qi.c | 16 
 gcc/testsuite/gcc.target/i386/pr101908-v2sf.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2si.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4df.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4di.c |  7 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4hi.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4qi.c | 18 
 gcc/testsuite/gcc.target/i386/pr101908-v4sf.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4si.c |  6 ++
 .../gcc.target/i386/pr101908-v8df-adl.c   |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8df.c |  6 ++
 .../gcc.target/i386/pr101908-v8di-adl.c   |  7 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8di.c |  7 ++
 .../gcc.target/i386/pr101908-v8hi-adl.c   |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8hi.c |  6 ++
 .../gcc.target/i386/pr101908-v8qi-adl.c   | 22 +
 gcc/testsuite/gcc.target/i386/pr101908-v8qi.c | 22 +
 .../gcc.target/i386/pr101908-v8sf-adl.c   |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8sf.c |  6 ++
 .../gcc.target/i386/pr101908-v8si-adl.c   |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8si.c |  6 ++
 34 files changed, 444 insertions(+)
 create mo

Re: [x86 PATCH] PR target/94680: Clear upper bits of V2DF using movq (like V2DI).

2022-03-15 Thread Hongtao Liu via Gcc-patches
On Tue, Mar 15, 2022 at 10:52 PM Roger Sayle  wrote:
>
>
> This simple i386 patch unblocks a more significant change.  The testcase
> gcc.target/i386/sse2-pr94680.c isn't quite testing what's intended, and
> alas the fix for PR target/94680 doesn't (yet) handle V2DF mode.
>
> For the first test from sse2-pr94680.c, below
>
> v2df foo_v2df (v2df x) {
>   return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
> }
>
> GCC on x86_64-pc-linux-gnu with -O2 currently generates:
>
> movhpd  .LC0(%rip), %xmm0
> ret
> .LC0:
> .long   0
> .long   0
>
> which passes the test as it contains a mov insn and no xor.
> Alas reading a zero from the constant pool isn't quite the
> desired implementation.  With this patch we now generate:
>
> movq%xmm0, %xmm0
> ret
>
> This is the same code as we generate for V2DI.  The patch also adds a
> stricter test case.  My first attempt tried using VI8F_128 to generalize
> the existing sse2_movq128 define_insn to both V2DI and V2DF.
> Alas, CODE_FOR_sse2_movq128 is exposed as a builtin in
> i386-builtin.def, requiring some internal name changes, that
You can turn sse2_movq128 into an expander to avoid builtin-related
changes, then use VI8F_128 in the new define_insn.
With that change, patch LGTM.
> ultimately the testsuite was unhappy with.  The simpler solution
> (that works) is to clone/specialize a new V2DF *sse2_movq128_2.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
>
> 2022-03-15  Roger Sayle  
>
> gcc/ChangeLog
> PR target/94680
> * config/i386/sse.md (*sse2_movq128_2): A version of sse2_movq128
> for V2DF mode.
>
> gcc/testsuite/ChangeLog
> PR target/94680
> * gcc.target/i386/sse2-pr94680-2.c: New stricter V2DF test case.
>
>
> Thanks in advance,
> Roger
> --
>


-- 
BR,
Hongtao
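The shuffle in the test case keeps the low double and zeroes the high one, which is exactly what movq does on an XMM register. The sketch below uses GCC's generic vector extension and __builtin_shuffle (GCC-specific). The typedefs are assumed to match those in sse2-pr94680.c, and foo_v2di is a hypothetical V2DI counterpart added for comparison.

```c
/* GCC generic vector types (assumed to match the test case's).  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Mask index 0 selects x[0]; index 2 selects element 0 of the second
   operand (the zero vector).  The result is { x[0], 0.0 }: the upper
   64 bits are cleared.  */
v2df
foo_v2df (v2df x)
{
  return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
}

/* The same shuffle on V2DI, for which the message says GCC already
   emits a single movq.  */
v2di
foo_v2di (v2di x)
{
  return __builtin_shuffle (x, (v2di) { 0, 0 }, (v2di) { 0, 2 });
}
```

The run-time result is the same before and after the patch; only the generated code changes (movq instead of a movhpd from a zero constant in the pool).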


Re: [PATCH] libgompd: add OMPD support, libgompd initialization and global ICVs functions

2022-03-15 Thread Mohamed Atef via Gcc-patches
Hi,
   We found some typos in the ChangeLog and some wrong whitespace (a nightmare)
in the files, so here's the best we can do.
Please don't be disappointed; trust us, we're doing our best.
I hope you can review it by Sunday night.

Thanks.


libgomp/ChangeLog

2022-03-15  Mohamed Atef  

* config/darwin/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
* config/hpux/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
* config/posix/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
* configure: Regenerate.
* Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la.
(libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES, libgompd_la_LINK,
libgompd_la_SOURCES, libgompd_version_dep, libgompd_version_script,
libgompd.ver-sun, libgompd.ver, libgompd_version_info): New.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* env.c: Include ompd-support.h.
(initialize_env): Call gompd_load.
* team.c: Include ompd-support.h.
(gomp_team_start): Call ompd_bp_parallel_begin.
(gomp_team_end): Call ompd_bp_parallel_end.
* libgomp.map: Add OMP_5.0.3 symbol versions.
* libgompd.map: New.
* omp-tools.h.in: New.
* ompd-types.h.in: New.
* ompd-support.h: New.
* ompd-support.c: New.
* ompd-helper.h: New.
* ompd-helper.c: New.
* ompd-init.c: New.
* ompd-icv.c: New.
* configure.ac (AC_CONFIG_FILES): Add omp-tools.h and ompd-types.h.


On Tue, Mar 15, 2022 at 11:32 PM Mohamed Atef 
wrote:

>
>
> On Tue, Mar 15, 2022 at 11:32 PM Mohamed Atef 
> wrote:
>
>> This patch adds OMPD support to libgomp, along with the API version
>> functions and the global ICV functions.
>> I hope you can review it as soon as possible so that we can fix any problems.
>> I tried as hard as I could to follow the GNU coding standards.
>> We have a seminar at the college next week, so we need this to be
>> reviewed.
>> Thanks
>>
>>
>> libgomp/ChangeLog
>>
>> 2022-03-15  Mohamed Atef  
>>
>> * config/darwin/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
>> * config/hpux/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
>> * config/posix/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
>> * configure: Regenerate.
>> * Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la.
>> (libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES, libgompd_la_LINK,
>> libgompd_la_SOURCES, libgompd_version_dep, libgompd_version_script,
>> libgompd.ver-sun, libgompd.ver, libgompd_version_info): New.
>> * Makefile.in: Regenerate.
>> * aclocal.m4: Regenerate.
>> * env.c: Include ompd-support.h.
>> (initialize_env): Call gompd_load.
>> * team.c: Include ompd-support.h.
>> (gomp_team_start): Call ompd_bp_parallel_begin.
>> (gomp_team_end): Call ompd_bp_parallel_end.
>> * libgomp.map: Add OMP_5.0.3 symbol versions.
>> * libgompd.map: New.
>> * omp-tools.h.in: New.
>> * ompd-types.h.in: New.
>> * ompd-support.h: New.
>> * ompd-support.c: New.
>> * ompd-helper.h: New.
>> * ompd-helper.c: New.
>> * ompd-init.c: New.
>> * ompd-icv.c: New.
>> * configure.ac (AC_CONFIG_FILES): Add omp-tools.h and ompd-types.h.
>>
>
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index f8b2a06d63e..f530a730ea8 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -20,7 +20,7 @@ AM_CPPFLAGS = $(addprefix -I, $(search_path))
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)

-toolexeclib_LTLIBRARIES = libgomp.la
+toolexeclib_LTLIBRARIES = libgomp.la libgompd.la
 nodist_toolexeclib_HEADERS = libgomp.spec

 if LIBGOMP_BUILD_VERSIONED_SHLIB
@@ -32,13 +32,21 @@ libgomp.ver: $(top_srcdir)/libgomp.map
$(EGREP) -v '#(#| |$$)' $< | \
  $(PREPROCESS) -P -include config.h - > $@ || (rm -f $@ ; exit 1)

+libgompd.ver: $(top_srcdir)/libgompd.map
+   $(EGREP) -v '#(#| |$$)' $< | \
+   $(PREPROCESS) -P -include config.h - > $@ || (rm -f $@ ; exit 1)
+
 if LIBGOMP_BUILD_VERSIONED_SHLIB_GNU
 libgomp_version_script = -Wl,--version-script,libgomp.ver
+libgompd_version_script = -Wl,--version-script,libgompd.ver
 libgomp_version_dep = libgomp.ver
+libgompd_version_dep = libgompd.ver
 endif
 if LIBGOMP_BUILD_VERSIONED_SHLIB_SUN
 libgomp_version_script = -Wl,-M,libgomp.ver-sun
+libgompd_version_script = -Wl,-M,libgompd.ver-sun
 libgomp_version_dep = libgomp.ver-sun
+libgompd_version_dep = libgompd.ver-sun
 libgomp.ver-sun : libgomp.ver \
$(top_srcdir)/../contrib/make_sunver.pl \
$(libgomp_la_OBJECTS) $(libgomp_la_LIBADD)
@@ -48,16 +56,34 @@ libgomp.ver-sun : libgomp.ver \
 `echo $(libgomp_la_LIBADD) | \
sed 's,/\([^/.]*\)\.la,/.libs/\1.a,g'` \
 > $@ || (rm -f $@ ; exit 1)
+
+libgompd.ver-sun : libgompd.ver \
+   $(top_srcdir)/../contrib/make_sunver.pl \
+   $(libgompd_la_OBJECTS) $(libgompd_la_LIBADD)
+   perl $(top_srcdir)/../contrib/make_sunver.pl \
+   libgompd.ver \
+   $(libgompd_la_OBJECTS:%.lo=.libs/%.o) \
+   `echo $(libgompd_la_LIBADD) | \
+   sed 's,/\([^/.]*\)\.la,/.libs/\1.a,g'` \
+   > $@ || (rm -f $@ ; exit 1)
+
 endif
 else
 libgomp_version_script =
+libgompd_version_script =
 libgomp_

[committed] MAINTAINERS: Add myself to DCO section

2022-03-15 Thread Chung-Ju Wu via Gcc-patches

I would like to add myself to the DCO section for some contributions.


commit  088a51a0abb5497cac32055bf373fa6039b924f8
Author: Chung-Ju Wu 
Date:   Wed, 16 Mar 2022 03:20:00 +

 MAINTAINERS: Add myself to DCO section

 ChangeLog:

 * MAINTAINERS: Add myself to DCO section.


Committed to trunk.

---
  MAINTAINERS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ccb79f5d2f4..54d55f01e99 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -725,3 +725,4 @@ Edward Smith-Rowland

  Petter Tomner 
  Martin Uecker 
  Jonathan Wakely   
+Chung-Ju Wu