[PATCH] i386: Correct target attribute for crc32 intrinsics

2022-04-14 Thread Hongyu Wang via Gcc-patches
Hi,

Compiling the _mm_crc32_u8/16/32/64 intrinsics with -mcrc32 alone
fails with a target specific option mismatch.  Correct the target
pragma to fix this.
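
For readers unfamiliar with what these intrinsics accumulate, here is a minimal software sketch of the same CRC32 update, assuming the usual reflected form (0x82F63B78) of the polynomial 0x11EDC6F41 named in smmintrin.h. The names crc32c_u8_sw and crc32c_sw are hypothetical and not part of GCC; the per-byte step is meant to model what _mm_crc32_u8 should return, not to replace it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical software model of one byte-wide CRC32 step: fold a single
// byte into the running value using the reflected polynomial 0x82F63B78.
inline std::uint32_t crc32c_u8_sw(std::uint32_t crc, std::uint8_t v)
{
  crc ^= v;
  for (int bit = 0; bit < 8; ++bit)
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  return crc;
}

// Byte-wise CRC-32C over a buffer, adding the conventional initial value
// and final inversion that callers normally wrap around the raw step.
inline std::uint32_t crc32c_sw(const void *data, std::size_t len)
{
  const unsigned char *p = static_cast<const unsigned char *>(data);
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    crc = crc32c_u8_sw(crc, p[i]);
  return crc ^ 0xFFFFFFFFu;
}
```

Calling crc32c_sw ("123456789", 9) yields the standard CRC-32C check value 0xE3069283, which is a convenient cross-check against code built with -mcrc32.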

Bootstrapped/regtest on x86_64-pc-linux-gnu{-m32,}.

Ok for master and backport to GCC 11?

gcc/ChangeLog:

* config/i386/smmintrin.h: Correct target pragma from sse4.1
and sse4.2 to crc32 for crc32 intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/i386/crc32-6.c: Adjust to call builtin.
* gcc.target/i386/crc32-7.c: New test.
---
 gcc/config/i386/smmintrin.h | 25 +-
 gcc/testsuite/gcc.target/i386/crc32-6.c |  2 +-
 gcc/testsuite/gcc.target/i386/crc32-7.c | 34 +
 3 files changed, 42 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/crc32-7.c

diff --git a/gcc/config/i386/smmintrin.h b/gcc/config/i386/smmintrin.h
index b42b212300f..eb6a451c10a 100644
--- a/gcc/config/i386/smmintrin.h
+++ b/gcc/config/i386/smmintrin.h
@@ -810,17 +810,11 @@ _mm_cmpgt_epi64 (__m128i __X, __m128i __Y)
 
 #include 
 
-#ifndef __SSE4_1__
+#ifndef __CRC32__
 #pragma GCC push_options
-#pragma GCC target("sse4.1")
-#define __DISABLE_SSE4_1__
-#endif /* __SSE4_1__ */
-
-#ifndef __SSE4_2__
-#pragma GCC push_options
-#pragma GCC target("sse4.2")
-#define __DISABLE_SSE4_2__
-#endif /* __SSE4_1__ */
+#pragma GCC target("crc32")
+#define __DISABLE_CRC32__
+#endif /* __CRC32__ */
 
 /* Accumulate CRC32 (polynomial 0x11EDC6F41) value.  */
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
@@ -849,14 +843,9 @@ _mm_crc32_u64 (unsigned long long __C, unsigned long long 
__V)
 }
 #endif
 
-#ifdef __DISABLE_SSE4_2__
-#undef __DISABLE_SSE4_2__
+#ifdef __DISABLE_CRC32__
+#undef __DISABLE_CRC32__
 #pragma GCC pop_options
-#endif /* __DISABLE_SSE4_2__ */
-
-#ifdef __DISABLE_SSE4_1__
-#undef __DISABLE_SSE4_1__
-#pragma GCC pop_options
-#endif /* __DISABLE_SSE4_1__ */
+#endif /* __DISABLE_CRC32__ */
 
 #endif /* _SMMINTRIN_H_INCLUDED */
diff --git a/gcc/testsuite/gcc.target/i386/crc32-6.c 
b/gcc/testsuite/gcc.target/i386/crc32-6.c
index 464e3444069..1f306534bb8 100644
--- a/gcc/testsuite/gcc.target/i386/crc32-6.c
+++ b/gcc/testsuite/gcc.target/i386/crc32-6.c
@@ -7,7 +7,7 @@
 unsigned int
 test_mm_crc32_u8 (unsigned int CRC, unsigned char V)
 {
-  return _mm_crc32_u8 (CRC, V);
+  return __builtin_ia32_crc32qi (CRC, V);
 }
 
 /* { dg-error "needs isa option -mcrc32" "" { target *-*-* } 0  } */
diff --git a/gcc/testsuite/gcc.target/i386/crc32-7.c 
b/gcc/testsuite/gcc.target/i386/crc32-7.c
new file mode 100644
index 000..2e310e38b82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/crc32-7.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcrc32" } */
+/* { dg-final { scan-assembler "crc32b\[^\\n\]*eax" } } */
+/* { dg-final { scan-assembler "crc32w\[^\\n\]*eax" } } */
+/* { dg-final { scan-assembler "crc32l\[^\\n\]*eax" } } */
+/* { dg-final { scan-assembler "crc32q\[^\\n\]*rax" { target { ! ia32 } } } } */
+
+#include 
+
+unsigned int
+test_mm_crc32_u8 (unsigned int CRC, unsigned char V)
+{
+  return _mm_crc32_u8 (CRC, V);
+}
+
+unsigned int
+test_mm_crc32_u16 (unsigned int CRC, unsigned short V)
+{
+  return _mm_crc32_u16 (CRC, V);
+}
+
+unsigned int
+test_mm_crc32_u32 (unsigned int CRC, unsigned int V)
+{
+  return _mm_crc32_u32 (CRC, V);
+}
+
+#ifdef __x86_64__
+unsigned long long
+test_mm_crc32_u64 (unsigned long long CRC, unsigned long long V)
+{
+  return _mm_crc32_u64 (CRC, V);
+}
+#endif
-- 
2.18.1



Re: [PATCH v2] c, c++: attribute format on a ctor with a vbase [PR101833, PR47634]

2022-04-14 Thread Jason Merrill via Gcc-patches

On 4/13/22 19:17, Marek Polacek wrote:

On Tue, Apr 12, 2022 at 04:01:14PM -0400, Jason Merrill wrote:

On 4/12/22 14:38, Marek Polacek wrote:

On Mon, Apr 11, 2022 at 04:39:22PM -0400, Jason Merrill wrote:

On 4/8/22 15:21, Marek Polacek wrote:

On Wed, Apr 06, 2022 at 04:55:54PM -0400, Jason Merrill wrote:

On 4/1/22 15:14, Marek Polacek wrote:

Attribute format takes three arguments: archetype, string-index, and
first-to-check.  The last two specify the position in the function
parameter list.  r63030 clarified that "Since non-static C++ methods have
an implicit this argument, the arguments of such methods should be counted
from two, not one, when giving values for string-index and first-to-check."
Therefore one has to write

  struct D {
D(const char *, ...) __attribute__((format(printf, 2, 3)));
  };

However -- and this is the problem in this PR -- ctors with virtual
bases also get two additional parameters: the in-charge parameter and
the VTT parameter (added in maybe_retrofit_in_chrg).  In fact we'll end up
with two clones of the ctor: an in-charge and a not-in-charge version (see
build_cdtor_clones).  That means that the argument position the user
specified in the attribute argument will refer to different arguments,
depending on which constructor we're currently dealing with.  This can
cause a range of problems: wrong errors, confusing warnings, or crashes.
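
As a small illustration of the "counted from two" rule quoted above, here is a sketch using an ordinary non-static member function rather than a constructor with virtual bases. Logger and its members are hypothetical names chosen for the example; it only shows how the positions are counted and does not exercise the bug in this PR.

```cpp
#include <cstdarg>
#include <cstdio>

// Hypothetical example type, not taken from the patch.
struct Logger {
  char buf[64];
  // The implicit `this` is parameter 1, `fmt` is parameter 2, and the
  // first variadic argument is parameter 3.
  int log(const char *fmt, ...) __attribute__((format(printf, 2, 3)));
};

int Logger::log(const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  int n = std::vsnprintf(buf, sizeof buf, fmt, ap);  // format into buf
  va_end(ap);
  return n;  // number of characters the formatted string needs
}
```

With this declaration, a call such as log ("%d-%s", 42, "ok") is checked against the format string, and a mismatched argument should be diagnosed under -Wformat.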

This patch corrects that; for C we don't have to do anything, and in C++
we can use num_artificial_parms_for.  It would be wrong to rewrite the
attributes the user supplied, so I've added an extra parameter called
adjust_pos.

Attribute format_arg is not affected, because it requires that the
function returns "const char *" which will never be the case for cdtors.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/101833
PR c++/47634

gcc/c-family/ChangeLog:

* c-attribs.cc (positional_argument): Add new argument adjust_pos,
use it.
* c-common.cc (check_function_arguments): Pass fndecl to
check_function_format.
* c-common.h (check_function_format): Adjust declaration.
(maybe_adjust_arg_pos_for_attribute): Add.
(positional_argument): Adjust declaration.
* c-format.cc (decode_format_attr): Add fndecl argument.  Pass it to
maybe_adjust_arg_pos_for_attribute.  Adjust calls to get_constant.


I wonder about, instead of adding another parameter, allowing the current
fntype parameter to be the fndecl when we have one.

And then that gets passed down into positional_argument, so we can call
maybe_adjust_arg_pos_for_attribute there, and adjust the return value
appropriately so we don't need the extra adjustment in get_constant?


Unfortunately I can't do that.  positional_argument can't return the
adjusted position, because get_constant returns it and in decode_format_attr
it's used to rewrite the arguments in the attribute list:

  tree *format_num_expr = &TREE_VALUE (TREE_CHAIN (args));
  tree *first_arg_num_expr = &TREE_VALUE (TREE_CHAIN (TREE_CHAIN (args)));
  ...
  if (tree val = get_constant (fntype, atname, *format_num_expr,
                               2, &info->format_num, 0, validated_p,
                               adjust_pos))
    *format_num_expr = val;


Could we not do that?  Currently isn't it just overwriting the value with
the same value after default_conversion?


I think it is.


Maybe do that conversion directly in decode_format_attr instead?


I'm afraid I can't move the default_conversion call out of positional_argument
because positional_argument is called from a lot of handle_*_attribute
functions, and each of those would have to call default_conversion, which
we don't want to do.  (Failure to call default_conversion would break e.g.
g++.dg/cpp0x/constexpr-attribute2.C.)


Maybe pass in the pointer where we want to store the converted value?


... or use pass-by-reference so that I don't have to adjust the call sites
of positional_argument, since we never pass a NULL_TREE.  That makes the
patch a bit smaller.

Now maybe_adjust_arg_pos_for_attribute is only called in positional_argument
and the adjustment is done only there.  It's still somewhat messier than I
hoped so I'm happy to defer to GCC 13.  Thanks,

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11.3?

-- >8 --
Attribute format takes three arguments: archetype, string-index, and
first-to-check.  The last two specify the position in the function
parameter list.  r63030 clarified that "Since non-static C++ methods have
an implicit this argument, the arguments of such methods should be counted
from two, not one, when giving values for string-index and first-to-check."
Therefore one has to write

   struct D {
 D(const char *, ...) __attribute__((format(printf, 2, 3)));
   };

However -- and this is the problem in this PR -- ctors with virtual
bases also get two additional parameters: the in-charge parameter and
the VTT 

Re: [PATCH] c++: wrong error with variadic concept [PR105268]

2022-04-14 Thread Jason Merrill via Gcc-patches

On 4/14/22 16:24, Marek Polacek wrote:

Here we issue a wrong error for the

   template>> void g();

line in the testcase.  I surmise that's because we mistakenly parse
C_many as a placeholder-type-specifier, and things go wrong from
there.  We are in a default argument so we should reject parsing C_many
as a placeholder-type-specifier, which would mean creating a new parameter.
We want C_many to be a concept-id instead.

It's interesting to see why the same problem didn't occur for C_one.
In that case, cp_parser_placeholder_type_specifier -> finish_type_constraints
-> build_type_constraint -> build_concept_check -> build_standard_check ->
coerce_template_parms fails the parse here:

  8916   nargs = inner_args ? NUM_TMPL_ARGS (inner_args) : 0;
  8917   if ((nargs - variadic_args_p > nparms && !variadic_p)
  8918   || (nargs < nparms - variadic_p
  8919   && require_all_args
  8920   && !variadic_args_p
  8921   && (!use_default_args
  8922   || (TREE_VEC_ELT (parms, nargs) != error_mark_node
  8923   && !TREE_PURPOSE (TREE_VEC_ELT (parms, nargs))
  8924 {
  8925 bad_nargs:
  ...
  8943   return error_mark_node;

because nargs is 2 (the targs are ) while nparms is
1 (for the one 'typename' in the tparam list of C_one).  But for
C_many variadic_p is true so we don't return error_mark_node but
.

This patch does not issue any error for the !tentative case because
I didn't figure out how to trigger that.


Then I'd add an assert that tentative is true.  OK with that change.


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/105268

gcc/cp/ChangeLog:

* parser.cc (cp_parser_placeholder_type_specifier): Return
error_mark_node when trying to build up a constrained parameter in
a default argument.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/variadic6.C: New test.
---
  gcc/cp/parser.cc  |  7 ++-
  gcc/testsuite/g++.dg/concepts/variadic6.C | 20 
  2 files changed, 26 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/concepts/variadic6.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index bfd16e1ef62..dfb613168b6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -20041,7 +20041,12 @@ cp_parser_placeholder_type_specifier (cp_parser 
*parser, location_t loc,
/* In a template parameter list, a type-parameter can be introduced
   by type-constraints alone.  */
if (processing_template_parmlist && !placeholder)
-return build_constrained_parameter (con, proto, args);
+{
+  /* In a default argument we may not be creating new parameters.  */
+  if (parser->local_variables_forbidden_p & LOCAL_VARS_FORBIDDEN)
+   return error_mark_node;
+  return build_constrained_parameter (con, proto, args);
+}
  
/* Diagnose issues placeholder issues.  */

if (!flag_concepts_ts
diff --git a/gcc/testsuite/g++.dg/concepts/variadic6.C 
b/gcc/testsuite/g++.dg/concepts/variadic6.C
new file mode 100644
index 000..0b386b0cd6d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/variadic6.C
@@ -0,0 +1,20 @@
+// PR c++/105268
+// { dg-do compile { target concepts } }
+
+template concept C_one = true;
+template concept C_many = true;
+
+template struct S { };
+
+template>> void f();
+template>> void g();
+
+void
+fn (auto a = S>{})
+{
+}
+
+void
+fn2 (auto a = S>{})
+{
+}

base-commit: 82536fbb8a7d150b829650378e0ba07dad5c8fb8




[pushed] c++: unsigned int32_t enum promotion [PR102804]

2022-04-14 Thread Jason Merrill via Gcc-patches
There's been an extension for a long time to allow applying 'unsigned' to an
int typedef, but that was confusing the integer promotion code.  Fixed by
forgetting about the typedef in that case.
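
For context, here is a sketch of the promotion behaviour the new test expects, written with plain 'unsigned int' as the underlying type so that it does not depend on the extension. The names E, foo and f are illustrative only.

```cpp
#include <type_traits>

// Unscoped enumeration with a fixed unsigned underlying type.
enum E : unsigned int { foo };

constexpr int f(int) { return 1; }
constexpr int f(unsigned) { return 2; }

// The enumerator promotes to its underlying type, so the conditional
// expression has type unsigned int rather than int.
static_assert(std::is_same<decltype(1 ? foo : 1), unsigned int>::value,
              "conditional should have the unsigned underlying type");

// Overload resolution therefore selects f(unsigned).
static_assert(f(1 ? foo : 1) == 2, "f(unsigned) should be selected");
static_assert(f(foo) == 2, "promotion to unsigned beats conversion to int");
```

The fix is intended to give the 'unsigned int32_t' spelling the same result.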

I'm going to make this an unconditional pedwarn in stage 1.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/102804

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Drop typedef used with 'unsigned'.

gcc/testsuite/ChangeLog:

* g++.dg/ext/unsigned-typedef1.C: New test.
---
 gcc/cp/decl.cc   | 2 ++
 gcc/testsuite/g++.dg/ext/unsigned-typedef1.C | 9 +
 2 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/unsigned-typedef1.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index d51fd75b003..2852093d624 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -12243,6 +12243,8 @@ grokdeclarator (const cp_declarator *declarator,
  pedwarn (loc, OPT_Wpedantic, "%qs specified with %qT",
   key, type);
  ok = !flag_pedantic_errors;
+ type = DECL_ORIGINAL_TYPE (typedef_decl);
+ typedef_decl = NULL_TREE;
}
  else if (declspecs->decltype_p)
error_at (loc, "%qs specified with %<decltype%>", key);
diff --git a/gcc/testsuite/g++.dg/ext/unsigned-typedef1.C 
b/gcc/testsuite/g++.dg/ext/unsigned-typedef1.C
new file mode 100644
index 000..360b5f81edf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/unsigned-typedef1.C
@@ -0,0 +1,9 @@
+// PR c++/102804
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wpedantic" }
+
+using int32_t = int;
+enum: unsigned int32_t { foo };// { dg-warning "int32_t" }
+int f(int) = delete;
+int f(unsigned);
+auto x = f(1 ? foo : 1);

base-commit: 6364a39907bd68624a30df0c8e380c40d2a646c4
-- 
2.27.0



[pushed] c++: using in diagnostics [PR102987]

2022-04-14 Thread Jason Merrill via Gcc-patches
The expression pretty-printing code crashed on a location wrapper with no
type, and didn't know what to do with a USING_DECL.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/102987

gcc/cp/ChangeLog:

* error.cc (dump_expr): Handle USING_DECL.
[VIEW_CONVERT_EXPR]: Just look through location wrapper.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/using1.C: New test.
---
 gcc/cp/error.cc  |  8 
 gcc/testsuite/g++.dg/diagnostic/using1.C | 16 
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/using1.C

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index e76842e1a2a..1e944ca3f75 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -2203,6 +2203,7 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
 case WILDCARD_DECL:
 case OVERLOAD:
 case TYPE_DECL:
+case USING_DECL:
 case IDENTIFIER_NODE:
   dump_decl (pp, t, ((flags & ~(TFF_DECL_SPECIFIERS|TFF_RETURN_TYPE
 |TFF_TEMPLATE_HEADER))
@@ -2584,6 +2585,13 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
 case VIEW_CONVERT_EXPR:
   {
tree op = TREE_OPERAND (t, 0);
+
+   if (location_wrapper_p (t))
+ {
+   dump_expr (pp, op, flags);
+   break;
+ }
+
tree ttype = TREE_TYPE (t);
tree optype = TREE_TYPE (op);
 
diff --git a/gcc/testsuite/g++.dg/diagnostic/using1.C 
b/gcc/testsuite/g++.dg/diagnostic/using1.C
new file mode 100644
index 000..eb4f18d1d8b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/using1.C
@@ -0,0 +1,16 @@
+// PR c++/102987
+// { dg-do compile { target c++11 } }
+
+struct a {
+  bool b();
+};
+template  struct d : c {
+  using c::e;
+  using f = d;
+  constexpr int g(decltype(e.b())) { return buh; } // { dg-error "buh" }
+};
+struct h {
+  a e;
+};
+using i = d;
+auto j = i{}.g(1);

base-commit: 031bd52e482a53314d3dfac2d375c1033a6b7031
-- 
2.27.0



[committed] analyzer: fix escaping of pointer arithmetic [PR105264]

2022-04-14 Thread David Malcolm via Gcc-patches
PR analyzer/105264 reports that the analyzer can fail to treat
(PTR + IDX) and PTR[IDX] as referring to the same memory under
some situations.

There are various ways in which this can happen when IDX is a
symbolic value, due to having several ways in which such memory
regions can be referred to symbolically.  I attempted to fix this by
being smarter when folding svalues and regions, but this fix
seems too fiddly to attempt in stage 4.

Instead, this less ambitious patch fixes a false positive from
-Wanalyzer-use-of-uninitialized-value by making the analyzer's escape
analysis smarter, so that it treats *PTR as escaping when
(PTR + OFFSET) is passed to an external function, and thus
it treats *PTR as possibly-initialized (the "passing &PTR[IDX]" case
was already working).
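
The idea can be sketched outside the analyzer with a toy model. The types below (Region, SymVal, Kind) and the function deref_base_region are hypothetical stand-ins, not the analyzer's real classes, and the sketch skips the get_base_region step that the actual patch applies to the pointee.

```cpp
#include <cstddef>

// Toy stand-in for a memory region.
struct Region { const char *name; };

// Toy stand-in for the kinds of symbolic value that matter here.
enum class Kind { RegionPtr, PointerPlus, Other };

struct SymVal {
  Kind kind;
  const Region *pointee;  // meaningful when kind == Kind::RegionPtr
  const SymVal *lhs;      // meaningful when kind == Kind::PointerPlus
};

// Walk through any chain of pointer additions via the left-hand operand
// until a plain pointer-to-region is found; give up on anything else.
inline const Region *deref_base_region(const SymVal *v)
{
  while (v)
    switch (v->kind)
      {
      case Kind::RegionPtr:
        return v->pointee;
      case Kind::PointerPlus:
        v = v->lhs;  // leaves the switch; the loop then inspects the LHS
        break;
      default:
        return nullptr;
      }
  return nullptr;
}
```

In this model both PTR and (PTR + OFFSET) resolve to the same region, which is the property the escape analysis needs.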

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-8169-ga358e4b60815b4.

gcc/analyzer/ChangeLog:
PR analyzer/105264
* region-model-reachability.cc (reachable_regions::handle_parm):
Use maybe_get_deref_base_region rather than just region_svalue, to
handle pointer arithmetic also.
* svalue.cc (svalue::maybe_get_deref_base_region): New.
* svalue.h (svalue::maybe_get_deref_base_region): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/105264
* gcc.dg/analyzer/torture/symbolic-10.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model-reachability.cc |  8 +---
 gcc/analyzer/svalue.cc| 42 +++
 gcc/analyzer/svalue.h |  2 +
 .../gcc.dg/analyzer/torture/symbolic-10.c | 40 ++
 4 files changed, 86 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/symbolic-10.c

diff --git a/gcc/analyzer/region-model-reachability.cc 
b/gcc/analyzer/region-model-reachability.cc
index b876b8f0363..12d09c3e500 100644
--- a/gcc/analyzer/region-model-reachability.cc
+++ b/gcc/analyzer/region-model-reachability.cc
@@ -252,12 +252,8 @@ reachable_regions::handle_parm (const svalue *sval, tree 
param_type)
 m_mutable_svals.add (sval);
   else
 m_reachable_svals.add (sval);
-  if (const region_svalue *parm_ptr
-  = sval->dyn_cast_region_svalue ())
-{
-  const region *pointee_reg = parm_ptr->get_pointee ();
-  add (pointee_reg, is_mutable);
-}
+  if (const region *base_reg = sval->maybe_get_deref_base_region ())
+add (base_reg, is_mutable);
   /* Treat all svalues within a compound_svalue as reachable.  */
   if (const compound_svalue *compound_sval
   = sval->dyn_cast_compound_svalue ())
diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc
index 536bc288dbf..a1403f0fbef 100644
--- a/gcc/analyzer/svalue.cc
+++ b/gcc/analyzer/svalue.cc
@@ -651,6 +651,48 @@ svalue::all_zeroes_p () const
   return false;
 }
 
+/* If this svalue is a pointer, attempt to determine the base region it points
+   to.  Return NULL on any problems.  */
+
+const region *
+svalue::maybe_get_deref_base_region () const
+{
+  const svalue *iter = this;
+  while (1)
+{
+  switch (iter->get_kind ())
+   {
+   default:
+ return NULL;
+
+   case SK_REGION:
+ {
+   const region_svalue *region_sval
+ = as_a <const region_svalue *> (iter);
+   return region_sval->get_pointee ()->get_base_region ();
+ }
+
+   case SK_BINOP:
+ {
+   const binop_svalue *binop_sval
+ = as_a <const binop_svalue *> (iter);
+   switch (binop_sval->get_op ())
+ {
+ case POINTER_PLUS_EXPR:
+   /* If we have a symbolic value expressing pointer arithmetic,
+  use the LHS.  */
+   iter = binop_sval->get_arg0 ();
+   continue;
+
+ default:
+   return NULL;
+ }
+   return NULL;
+ }
+   }
+}
+}
+
 /* class region_svalue : public svalue.  */
 
 /* Implementation of svalue::dump_to_pp vfunc for region_svalue.  */
diff --git a/gcc/analyzer/svalue.h b/gcc/analyzer/svalue.h
index 4bbe8588b8d..29ea2ee6408 100644
--- a/gcc/analyzer/svalue.h
+++ b/gcc/analyzer/svalue.h
@@ -175,6 +175,8 @@ public:
  per-type and thus it's meaningless for them to "have state".  */
   virtual bool can_have_associated_state_p () const { return true; }
 
+  const region *maybe_get_deref_base_region () const;
+
  protected:
   svalue (complexity c, tree type)
   : m_complexity (c), m_type (type)
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/symbolic-10.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/symbolic-10.c
new file mode 100644
index 000..b2f3a8a1d86
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/symbolic-10.c
@@ -0,0 +1,40 @@
+/* Verify that -fanalyzer considers that mmfs escapes when passing either:
+ *(mmfs + i)
+   and
+ (&mmfs[i])
+   to an external function (for symbolic i).  */
+
+typedef struct s_mmfile {
+  char *ptr;
+  long size;
+} mmfile_t;
+
+void 

[PATCH] ppc: testsuite: p9-vec-length: add -mno-strict-align and -misel (was: Re: enable __ieee128 for p9vector tests)

2022-04-14 Thread Alexandre Oliva via Gcc-patches
On Apr 14, 2022, Segher Boessenkool  wrote:

> Yes, that is a problem.  None of our testcases are set up for compilers
> with weird defaults (and this is not specific to rs6000).

> I do not want to change many thousands of test cases to not use defaults
> anymore, to specify everything everywhere instead :-(  This would make
> things more unmaintainable than they already are.

I guess you're not going to like this one, then :-(

Our (AdaCore's) ppc64-vx7r2 builds have -mstrict-align and -mno-isel as
defaults, but several p9-vec-length* tests fail with those defaults.
Given your statement above, I'm not very hopeful that adding options to
make such expectations of the tests more explicit would be well
received, but at least the mailing list records will hold that
information, in case it happens to be useful for someone else.


The p9-vec-length tests expect vectorization on loop bodies and
epilogues that reference arrays that are not known to be more aligned
than their small element types.

Though VSX vectors work best with 32- or 64-bit alignment, unaligned
vector loads and stores are expected by the tests.  However, with our
implicit default to -mstrict-align, vector loads and stores not known
to be aligned end up open coded, which doesn't match the asm output
expectations coded in the tests.

Adding -mno-strict-align restores the unaligned vector loads and
stores, and this is enough for some of these tests to pass.

Some also require -misel, without which conditional stores end up open
coded into compares and branches.  That, in turn, makes some of the
epilogue blocks short enough that bbro duplicates them, so that
expected vector loads and stores with limited length diverge from the
expectation.

Restoring the defaults with both options, all of these tests pass
on x86_64-linux-gnu x ppc64-vx7r2.  Ok to install?

for  gcc/testsuite/ChangeLog

* gcc.target/powerpc/p9-vec-length-epil-1.c: Add
-mno-strict-align and -misel.
* gcc.target/powerpc/p9-vec-length-epil-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-8.c: Likewise.

TN: V413-044
---
 .../gcc.target/powerpc/p9-vec-length-epil-1.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-2.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-3.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-4.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-5.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-6.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-7.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-8.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-run-1.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-run-2.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-run-3.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-run-4.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-run-5.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-run-6.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-run-7.c  |2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-run-8.c  |2 

Re: [PATCH v4] libgo: Don't use pt_regs member in mcontext_t

2022-04-14 Thread Ian Lance Taylor via Gcc-patches
On Mon, Apr 11, 2022 at 11:28 AM Sören Tempel  wrote:
>
> Ian Lance Taylor  wrote:
> > What I was hoping from my earlier question was that you could tell me
> > the exact lines to write in the current sources that will compile on
> > MUSL. Don't include , don't refer to earlier patches as
> > that is what I tried to do earlier but failed, don't add new #define
> > macros, just add #ifdef and appropriate lines.  Thanks.  If the new
> > lines also work on glibc using register indexes rather than names,
> > that would be a bonus.
>
> Sorry, my bad. Here you go:

Thanks!  I tested a version of that code with glibc, and it works
there too, so I've committed this patch after testing on
powerpc-linux-gnu and x86_64-linux-gnu.  Please let me know about any
problems.

Ian
5c66a1182acceebf9fbcf02039d85a53c9c18bf1
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index f93eaf48e28..75ee2e3aaca 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-45108f37070afb696b069768700e39a269f1fecb
+323ab0e6fab89978bdbd83dca9c2ad9c5dcd690f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
index 9c919e1568a..2caddd068d6 100644
--- a/libgo/runtime/go-signal.c
+++ b/libgo/runtime/go-signal.c
@@ -230,15 +230,10 @@ getSiginfo(siginfo_t *info, void *context 
__attribute__((unused)))
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[REG_EIP];
 #elif defined(__alpha__) && defined(__linux__)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.sc_pc;
+#elif defined(__PPC64__) && defined(__linux__)
+   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32];
 #elif defined(__PPC__) && defined(__linux__)
-   // For some reason different libc implementations use
-   // different names.
-#if defined(__PPC64__) || defined(__GLIBC__)
-   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;
-#else
-   // Assumed to be ppc32 musl.
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];
-#endif
 #elif defined(__PPC__) && defined(_AIX)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.jmp_context.iar;
 #elif defined(__aarch64__) && defined(__linux__)
@@ -354,15 +349,15 @@ dumpregs(siginfo_t *info __attribute__((unused)), void 
*context __attribute__((u
mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
int i;
 
-#if defined(__PPC64__) || defined(__GLIBC__)
+#if defined(__PPC64__)
for (i = 0; i < 32; i++)
-   runtime_printf("r%d %X\n", i, m->regs->gpr[i]);
-   runtime_printf("pc  %X\n", m->regs->nip);
-   runtime_printf("msr %X\n", m->regs->msr);
-   runtime_printf("cr  %X\n", m->regs->ccr);
-   runtime_printf("lr  %X\n", m->regs->link);
-   runtime_printf("ctr %X\n", m->regs->ctr);
-   runtime_printf("xer %X\n", m->regs->xer);
+   runtime_printf("r%d %X\n", i, m->gp_regs[i]);
+   runtime_printf("pc  %X\n", m->gp_regs[32]);
+   runtime_printf("msr %X\n", m->gp_regs[33]);
+   runtime_printf("cr  %X\n", m->gp_regs[38]);
+   runtime_printf("lr  %X\n", m->gp_regs[36]);
+   runtime_printf("ctr %X\n", m->gp_regs[35]);
+   runtime_printf("xer %X\n", m->gp_regs[37]);
 #else
for (i = 0; i < 32; i++)
runtime_printf("r%d %X\n", i, m->gregs[i]);


[pushed] c++: constexpr trivial -fno-elide-ctors [PR104646]

2022-04-14 Thread Jason Merrill via Gcc-patches
The constexpr constructor checking code got confused by the expansion of a
trivial copy constructor; we don't need to do that checking for defaulted
ctors, anyway.
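
As a reminder of what a defaulted, trivial copy constructor in a constant expression looks like, here is a minimal sketch. It is not a reproducer for the PR; Pair, make_pair_sketch and pass_through are hypothetical names.

```cpp
#include <type_traits>

// Aggregate with an implicitly defaulted, trivial copy constructor.
struct Pair { int first; int second; };

constexpr Pair make_pair_sketch() { return Pair{1, 2}; }

// Returning the by-value parameter copies it through that constructor.
constexpr Pair pass_through(Pair p) { return p; }

static_assert(std::is_trivially_copy_constructible<Pair>::value,
              "Pair should have a trivial copy constructor");
static_assert(pass_through(make_pair_sketch()).second == 2,
              "the copy should be usable in a constant expression");
```

Building such code with -fno-elide-constructors forces the copies that would otherwise be elided to go through the copy constructor.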

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/104646

gcc/cp/ChangeLog:

* constexpr.cc (maybe_save_constexpr_fundef): Don't do extra
checks for defaulted ctors.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-fno-elide-ctors1.C: New test.
---
 gcc/cp/constexpr.cc   |  3 +-
 .../g++.dg/cpp0x/constexpr-fno-elide-ctors1.C | 89 +++
 2 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-fno-elide-ctors1.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index b170053e8e1..e89440e770f 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -920,7 +920,8 @@ maybe_save_constexpr_fundef (tree fun)
   if (!potential && complain)
 require_potential_rvalue_constant_expression (massaged);
 
-  if (DECL_CONSTRUCTOR_P (fun) && potential)
+  if (DECL_CONSTRUCTOR_P (fun) && potential
+  && !DECL_DEFAULTED_FN (fun))
 {
   if (cx_check_missing_mem_inits (DECL_CONTEXT (fun),
  massaged, complain))
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-fno-elide-ctors1.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-fno-elide-ctors1.C
new file mode 100644
index 000..71c76fa0247
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-fno-elide-ctors1.C
@@ -0,0 +1,89 @@
+// PR c++/104646
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -fno-elide-constructors }
+
+template  struct pair {
+  _T1 first;
+  int second;
+};
+template  class __normal_iterator {
+  _Iterator __traits_type;
+
+public:
+  constexpr __normal_iterator() {}
+};
+template  class allocator;
+template  struct allocator_traits;
+template  struct allocator_traits> {
+  using value_type = _Tp;
+  template  using rebind_alloc = allocator<_Up>;
+};
+template  struct __alloc_traits {
+  typedef allocator_traits<_Alloc> _Base_type;
+  typedef typename _Base_type::value_type _reference;
+  template  struct rebind {
+typedef typename _Base_type::template rebind_alloc<_Tp> other;
+  };
+};
+template  struct _Vector_base {
+  typedef typename __alloc_traits<_Alloc>::template rebind<_Tp>::other 
_Tp_alloc_type;
+};
+template > class vector {
+public:
+  typename __alloc_traits<
+  typename _Vector_base<_Tp, _Alloc>::_Tp_alloc_type>::const_reference
+  operator[](long);
+};
+enum match_flag_type {};
+template  class Trans_NS___cxx11_basic_regex;
+class Trans_NS___cxx11_match_results;
+enum _RegexExecutorPolicy { _S_auto };
+template 
+bool __regex_algo_impl(Trans_NS___cxx11_match_results &,
+   const Trans_NS___cxx11_basic_regex<_CharT, _TraitsT> &);
+template  class _Executor;
+template 
+class Trans_NS___cxx11_basic_regex {};
+class Trans_NS___cxx11_match_results : vector {
+  template 
+  friend bool __regex_algo_impl(Trans_NS___cxx11_match_results &,
+const Trans_NS___cxx11_basic_regex<_Cp, _Rp> 
&);
+};
+template 
+void regex_search(_Bi_iter, _Alloc,
+  Trans_NS___cxx11_basic_regex<_Ch_type, _Rx_traits>) {
+  __regex_algo_impl<_Bi_iter, _Alloc, _Ch_type, _Rx_traits, _S_auto, false>;
+}
+match_flag_type __regex_algo_impl___flags;
+template 
+bool __regex_algo_impl(
+Trans_NS___cxx11_match_results &__m,
+const Trans_NS___cxx11_basic_regex<_CharT, _TraitsT> &__re) {
+  __normal_iterator __e, __s;
+  _Executor __executor(__s, __e, __m, __re,
+  __regex_algo_impl___flags);
+  __executor._M_match();
+  return false;
+}
+template  class _Executor {
+public:
+  _Executor(__normal_iterator, __normal_iterator,
+vector, Trans_NS___cxx11_basic_regex, match_flag_type);
+  void _M_match() { _M_dfs(); }
+  void _M_dfs();
+  vector>> _M_rep_count;
+};
+long _M_rep_once_more___i;
+template 
+void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::_M_dfs() {
+  auto __rep_count = _M_rep_count[_M_rep_once_more___i];
+}
+char main___trans_tmp_1;
+void main___trans_tmp_2() {
+  Trans_NS___cxx11_basic_regex re;
+  regex_search(main___trans_tmp_1, main___trans_tmp_2, re);
+}

base-commit: 74b2e20222cf4fb24b90561ddb6f0989738bb722
-- 
2.27.0



[PATCH] PR105169 Fix references to discarded sections

2022-04-14 Thread Giuliano Belinassi via Gcc-patches
When -fpatchable-function-entry= is enabled, certain C++ code fails to
link because of generated references to discarded sections in the
__patchable_function_entries section. This commit fixes this problem by
putting those references in a COMDAT section.

Bootstrapped and regtested on x86_64 linux.

OK for Stage4?

2022-04-13  Giuliano Belinassi  

PR c++/105169
* targhooks.cc (default_print_patchable_function_entry_1): Handle 
COMDAT case.
* varasm.cc (handle_vtv_comdat_section): Rename to...
(switch_to_comdat_section): Generalize to also cover
__patchable_function_entry case.
(assemble_variable): Rename call from handle_vtv_comdat_section to
switch_to_comdat_section.
(output_object_block): Same as above.
* varasm.h: Declare switch_to_comdat_section.

2022-04-13  Giuliano Belinassi  

PR c++/105169
* g++.dg/modules/pr105169.h: New file.
* g++.dg/modules/pr105169_a.C: New test.
* g++.dg/modules/pr105169_b.C: New file.

Signed-off-by: Giuliano Belinassi 
---
 gcc/targhooks.cc  |  8 ++--
 gcc/testsuite/ChangeLog   |  7 +++
 gcc/testsuite/g++.dg/modules/pr105169.h   | 22 
 gcc/testsuite/g++.dg/modules/pr105169_a.C | 25 +++
 gcc/testsuite/g++.dg/modules/pr105169_b.C | 12 +++
 gcc/varasm.cc | 25 +--
 gcc/varasm.h  |  1 +
 7 files changed, 87 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr105169.h
 create mode 100644 gcc/testsuite/g++.dg/modules/pr105169_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr105169_b.C

diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index e22bc66a6c8..540460e7db9 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1995,8 +1995,12 @@ default_print_patchable_function_entry_1 (FILE *file,
   patch_area_number++;
   ASM_GENERATE_INTERNAL_LABEL (buf, "LPFE", patch_area_number);
 
-  switch_to_section (get_section ("__patchable_function_entries",
- flags, current_function_decl));
+  section *sect = get_section ("__patchable_function_entries",
+ flags, current_function_decl);
+  if (HAVE_COMDAT_GROUP && DECL_COMDAT_GROUP (current_function_decl))
+   switch_to_comdat_section (sect, current_function_decl);
+  else
+   switch_to_section (sect);
   assemble_align (POINTER_SIZE);
   fputs (asm_op, file);
   assemble_name_raw (file, buf);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 9ab7a178bf8..524a546a832 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2022-04-13  Giuliano Belinassi  
+
+   PR c++/105169
+   * g++.dg/modules/pr105169.h: New file.
+   * g++.dg/modules/pr105169_a.C: New test.
+   * g++.dg/modules/pr105169_b.C: New file.
+
 2022-04-12  Antoni Boucher  
 
PR jit/104293
diff --git a/gcc/testsuite/g++.dg/modules/pr105169.h b/gcc/testsuite/g++.dg/modules/pr105169.h
new file mode 100644
index 000..a7e76270531
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr105169.h
@@ -0,0 +1,22 @@
+class IPXAddressClass
+{
+public:
+IPXAddressClass(void);
+};
+
+class WinsockInterfaceClass
+{
+
+public:
+WinsockInterfaceClass(void);
+
+virtual void Set_Broadcast_Address(void*){};
+
+virtual int Get_Protocol(void)
+{
+return 0;
+};
+
+protected:
+};
+
diff --git a/gcc/testsuite/g++.dg/modules/pr105169_a.C b/gcc/testsuite/g++.dg/modules/pr105169_a.C
new file mode 100644
index 000..66dc4b7901f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr105169_a.C
@@ -0,0 +1,25 @@
+/* { dg-module-do link } */
+/* { dg-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
+/* { dg-additional-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
+
+/* This test is in the "modules" package because it supports multiple files
+   linkage.  */
+
+#include "pr105169.h"
+
+WinsockInterfaceClass* PacketTransport;
+
+IPXAddressClass::IPXAddressClass(void)
+{
+}
+
+int function()
+{
+  return PacketTransport->Get_Protocol();
+}
+
+int main()
+{
+  IPXAddressClass ipxaddr;
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/modules/pr105169_b.C b/gcc/testsuite/g++.dg/modules/pr105169_b.C
new file mode 100644
index 000..5f8b00dfe51
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr105169_b.C
@@ -0,0 +1,12 @@
+/* { dg-module-do link } */
+/* { dg-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
+/* { dg-additional-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
+
+/* This test is in the "modules" package because it supports multiple files
+   linkage.  */
+
+#include "pr105169.h"
+
+WinsockInterfaceClass::WinsockInterfaceClass(void)
+{
+}
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index c41f17d64f7..7cd91e2bb56 100644
--- 

[PATCH] c++: wrong error with variadic concept [PR105268]

2022-04-14 Thread Marek Polacek via Gcc-patches
Here we issue a wrong error for the

  template>> void g();

line in the testcase.  I surmise that's because we mistakenly parse
C_many as a placeholder-type-specifier, and things go wrong from
there.  We are in a default argument so we should reject parsing C_many
as a placeholder-type-specifier, which would mean creating a new parameter.
We want C_many to be a concept-id instead.

It's interesting to see why the same problem didn't occur for C_one.
In that case, cp_parser_placeholder_type_specifier -> finish_type_constraints
-> build_type_constraint -> build_concept_check -> build_standard_check ->
coerce_template_parms fails the parse here:

 8916   nargs = inner_args ? NUM_TMPL_ARGS (inner_args) : 0;
 8917   if ((nargs - variadic_args_p > nparms && !variadic_p)
 8918   || (nargs < nparms - variadic_p
 8919   && require_all_args
 8920   && !variadic_args_p
 8921   && (!use_default_args
 8922   || (TREE_VEC_ELT (parms, nargs) != error_mark_node
 8923   && !TREE_PURPOSE (TREE_VEC_ELT (parms, nargs))
 8924 {
 8925 bad_nargs:
 ...
 8943   return error_mark_node;

because nargs is 2 (the targs are ) while nparms is
1 (for the one 'typename' in the tparam list of C_one).  But for
C_many variadic_p is true so we don't return error_mark_node but
.
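
The argument-count check quoted above can be modelled in isolation. The following is only a sketch under simplifying assumptions: `arity_query` and `bad_nargs` are hypothetical names, the `error_mark_node` test is dropped, and the `TREE_PURPOSE` lookup is replaced by a plain boolean. It is not the code in coerce_template_parms, just the same boolean shape.

```cpp
#include <cassert>

// Hypothetical, simplified model of the arity test in coerce_template_parms.
struct arity_query {
  int  nargs;                  // number of template arguments supplied
  int  nparms;                 // number of template parameters declared
  bool variadic_p;             // the parameter list ends in a pack
  bool variadic_args_p;        // the argument list contains a pack expansion
  bool require_all_args;
  bool use_default_args;
  bool next_parm_has_default;  // stands in for the TREE_PURPOSE check
};

// Returns true when the model would jump to the bad_nargs label.
inline bool bad_nargs(const arity_query& q) {
  const bool too_many =
      (q.nargs - q.variadic_args_p > q.nparms) && !q.variadic_p;
  const bool too_few =
      (q.nargs < q.nparms - q.variadic_p)
      && q.require_all_args
      && !q.variadic_args_p
      && (!q.use_default_args || !q.next_parm_has_default);
  return too_many || too_few;
}
```

With nargs == 2 and nparms == 1, the model reports an error when variadic_p is false (the C_one case) and accepts the arguments when variadic_p is true (the C_many case), which matches the behaviour described above.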

This patch does not issue any error for the !tentative case because
I didn't figure out how to trigger that.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/105268

gcc/cp/ChangeLog:

* parser.cc (cp_parser_placeholder_type_specifier): Return
error_mark_node when trying to build up a constrained parameter in
a default argument.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/variadic6.C: New test.
---
 gcc/cp/parser.cc  |  7 ++-
 gcc/testsuite/g++.dg/concepts/variadic6.C | 20 
 2 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/concepts/variadic6.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index bfd16e1ef62..dfb613168b6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -20041,7 +20041,12 @@ cp_parser_placeholder_type_specifier (cp_parser *parser, location_t loc,
   /* In a template parameter list, a type-parameter can be introduced
  by type-constraints alone.  */
   if (processing_template_parmlist && !placeholder)
-return build_constrained_parameter (con, proto, args);
+{
+  /* In a default argument we may not be creating new parameters.  */
+  if (parser->local_variables_forbidden_p & LOCAL_VARS_FORBIDDEN)
+   return error_mark_node;
+  return build_constrained_parameter (con, proto, args);
+}
 
   /* Diagnose issues placeholder issues.  */
   if (!flag_concepts_ts
diff --git a/gcc/testsuite/g++.dg/concepts/variadic6.C b/gcc/testsuite/g++.dg/concepts/variadic6.C
new file mode 100644
index 000..0b386b0cd6d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/variadic6.C
@@ -0,0 +1,20 @@
+// PR c++/105268
+// { dg-do compile { target concepts } }
+
+template concept C_one = true;
+template concept C_many = true;
+
+template struct S { };
+
+template>> void f();
+template>> void g();
+
+void
+fn (auto a = S>{})
+{
+}
+
+void
+fn2 (auto a = S>{})
+{
+}

base-commit: 82536fbb8a7d150b829650378e0ba07dad5c8fb8
-- 
2.35.1



[PATCH] libstdc++: Stop defining _GLIBCXX_ASSERTIONS in floating_to_chars.cc

2022-04-14 Thread Patrick Palka via Gcc-patches
Assertions were originally enabled in the compiled-in floating-point
std::to_chars implementation to help shake out any bugs, but they
apparently impose a significant performance penalty, in particular for
the hex formatting which is around 25% slower with assertions enabled.
This seems too high of a cost for unconditionally enabling them.

The newly added calls to __builtin_unreachable work around the compiler
no longer knowing that the set of valid values of 'fmt' is limited (which
was previously upheld by an assert).
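
As a rough illustration of that pattern (not the code in floating_to_chars.cc), the sketch below uses a hypothetical local enum `format_kind` and helper `printf_specifier`; only the "%.*Lg" specifier appears in the patch, the other two strings are assumed for the example. The trailing `else` tells GCC that no other value of `fmt` can reach it, so `spec` is known to be initialized on every path.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for std::chars_format, kept local so the sketch
// does not depend on <charconv>.
enum class format_kind { scientific, fixed, general };

// Selects a printf conversion for the given format.  Without the final
// else branch the compiler may warn that 'spec' is maybe-uninitialized,
// because it cannot prove that the three cases are exhaustive.
inline const char* printf_specifier(format_kind fmt) {
  const char* spec;
  if (fmt == format_kind::scientific)
    spec = "%.*Le";
  else if (fmt == format_kind::fixed)
    spec = "%.*Lf";
  else if (fmt == format_kind::general)
    spec = "%.*Lg";
  else
    __builtin_unreachable();  // GCC/Clang builtin; reaching it is undefined
  return spec;
}
```

Unlike an assertion, the builtin performs no runtime check, so a caller that passes an out-of-range value gets undefined behaviour rather than a diagnostic.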

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

* src/c++17/floating_to_chars.cc: Don't define
_GLIBCXX_ASSERTIONS.
(__floating_to_chars_shortest): Add __builtin_unreachable calls to
squelch false-positive -Wmaybe-uninitialized and -Wreturn-type
warnings.
(__floating_to_chars_precision): Likewise.
---
 libstdc++-v3/src/c++17/floating_to_chars.cc | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc b/libstdc++-v3/src/c++17/floating_to_chars.cc
index 66bd457cbe2..4599d68a39c 100644
--- a/libstdc++-v3/src/c++17/floating_to_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_to_chars.cc
@@ -22,9 +22,6 @@
 // see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 // .
 
-// Activate __glibcxx_assert within this file to shake out any bugs.
-#define _GLIBCXX_ASSERTIONS 1
-
 #include 
 
 #include 
@@ -1114,6 +,7 @@ template
   }
 
 __glibcxx_assert(false);
+__builtin_unreachable();
   }
 
 template
@@ -1202,6 +1200,8 @@ template
effective_precision = min(precision, max_eff_scientific_precision);
output_specifier = "%.*Lg";
  }
+   else
+ __builtin_unreachable();
const int excess_precision = (fmt != chars_format::general
  ? precision - effective_precision : 0);
 
@@ -1234,6 +1234,8 @@ template
  output_length_upper_bound = sign + strlen("0");
output_length_upper_bound += sizeof(radix) + effective_precision;
  }
+   else
+ __builtin_unreachable();
 
// Do the sprintf into the local buffer.
char buffer[output_length_upper_bound+1];
@@ -1570,6 +1572,7 @@ template
   }
 
 __glibcxx_assert(false);
+__builtin_unreachable();
   }
 
 // Define the overloads for float.
-- 
2.36.0.rc2.10.g1ac7422e39



Re: [PATCH] libstdc++: Optimize std::has_single_bit

2022-04-14 Thread Patrick Palka via Gcc-patches
On Thu, Apr 14, 2022 at 2:59 PM Jonathan Wakely  wrote:
>
> On Thu, 14 Apr 2022 at 19:17, Patrick Palka via Libstdc++
>  wrote:
> >
> > This reimplements std::has_single_bit using the well-known bit-twiddling
> > trick[1], which is much faster than popcount on x86_64.
>
> Is that always true for all microarchitectures? We have
> https://gcc.gnu.org/PR97759 on this topic, and I think we agreed that
> the compiler should match the popcount pattern and Do The Right Thing
> for the target and current -march.

Whoops, I completely forgot that we had a PR about this!  Makes sense
to fix this on the compiler side instead.

>
> If we're confident it's always better, that PR number should go in the
> changelog.
>
> > Note that when __x is signed and maximally negative then this
> > implementation invokes UB due to signed overflow, whereas the previous
> > implementation would return true.  This isn't a problem for
> > has_single_bit because it accepts only unsigned types, but it is a
> > potential problem for the unconstrained __has_single_bit.  Should
> > __has_single_bit continue to handle this non-standard case correctly for the
> > sake of backwards compatibility?
>
> No. The extensions have the same preconditions as the corresponding
> standard functions, we just don't check them. The code using them is
> internal to the library and should only use unsigned types. Users
> relying on the extensions need to meet those preconditions too.

Understood, thanks!

>
> > Tested on x86_64-pc-linux-gnu.
> >
> > [1]: 
> > http://www.graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/bit (__has_single_bit): Define in terms of
> > bitwise-and, not popcount.
> > ---
> >  libstdc++-v3/include/std/bit | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
> > index ef19d649e32..621ee4a9b95 100644
> > --- a/libstdc++-v3/include/std/bit
> > +++ b/libstdc++-v3/include/std/bit
> > @@ -316,7 +316,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >template
> >  constexpr bool
> >  __has_single_bit(_Tp __x) noexcept
> > -{ return std::__popcount(__x) == 1; }
> > +{ return __x != 0 && (__x & (__x - 1)) == 0; }
> >
> >template
> >  constexpr _Tp
> > --
> > 2.36.0.rc2.10.g1ac7422e39
> >
>



[pushed] libgccjit: Fix a bootstrap break for some targets.

2022-04-14 Thread Iain Sandoe via Gcc-patches
Some targets use 'long long unsigned int' for unsigned HW int, and this
leads to a Werror=format= fail for two print cases in jit-playback.cc
introduced in r12-8117-g30f7c83e9cfe (Add support for bitcasts [PR104071])

As discussed on IRC, casting to (long) seems entirely reasonable for the
values (since they are type sizes).
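
A minimal sketch of the mismatch and the fix, assuming a target where the unsigned HOST_WIDE_INT is 'unsigned long long'. The alias `uhwi` and the helper `describe_size` are hypothetical and only stand in for the value returned by tree_to_uhwi and for the fprintf call.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the type returned by tree_to_uhwi on the
// affected targets.
using uhwi = unsigned long long;

// Formats a type size with "%ld".  Passing 'size' without the cast would
// trigger -Wformat (an error under -Werror=format=), because %ld expects
// a 'long' argument.
inline std::string describe_size(uhwi size) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "input expression (size: %ld):",
                (long) size);
  return buf;
}
```

The cast is only safe because the values are small type sizes; it would truncate values that do not fit in a 'long'.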

Tested that this fixes bootstrap on x86_64-darwin19, and ran check-jit.
Pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/jit/ChangeLog:

* jit-playback.cc (new_bitcast): Cast values returned by tree_to_uhwi
to 'long' to match the print format.
---
 gcc/jit/jit-playback.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/jit/jit-playback.cc b/gcc/jit/jit-playback.cc
index b1e72fbcf8a..6be6bdf8dea 100644
--- a/gcc/jit/jit-playback.cc
+++ b/gcc/jit/jit-playback.cc
@@ -1440,10 +1440,10 @@ new_bitcast (location *loc,
 active_playback_ctxt->add_error (loc,
   "bitcast with types of different sizes");
 fprintf (stderr, "input expression (size: %ld):\n",
-  tree_to_uhwi (expr_size));
+  (long) tree_to_uhwi (expr_size));
 debug_tree (t_expr);
 fprintf (stderr, "requested type (size: %ld):\n",
-  tree_to_uhwi (type_size));
+  (long) tree_to_uhwi (type_size));
 debug_tree (t_dst_type);
   }
   tree t_bitcast = build1 (VIEW_CONVERT_EXPR, t_dst_type, t_expr);
-- 
2.24.3 (Apple Git-128)



Re: [PATCH] libstdc++: Optimize std::has_single_bit

2022-04-14 Thread Jonathan Wakely via Gcc-patches
On Thu, 14 Apr 2022 at 19:17, Patrick Palka via Libstdc++
 wrote:
>
> This reimplements std::has_single_bit using the well-known bit-twiddling
> trick[1], which is much faster than popcount on x86_64.

Is that always true for all microarchitectures? We have
https://gcc.gnu.org/PR97759 on this topic, and I think we agreed that
the compiler should match the popcount pattern and Do The Right Thing
for the target and current -march.

If we're confident it's always better, that PR number should go in the
changelog.

> Note that when __x is signed and maximally negative then this
> implementation invokes UB due to signed overflow, whereas the previous
> implementation would return true.  This isn't a problem for
> has_single_bit because it accepts only unsigned types, but it is a
> potential problem for the unconstrained __has_single_bit.  Should
> __has_single_bit continue to handle this non-standard case correctly for the
> sake of backwards compatibility?

No. The extensions have the same preconditions as the corresponding
standard functions, we just don't check them. The code using them is
internal to the library and should only use unsigned types. Users
relying on the extensions need to meet those preconditions too.

> Tested on x86_64-pc-linux-gnu.
>
> [1]: 
> http://www.graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
>
> libstdc++-v3/ChangeLog:
>
> * include/std/bit (__has_single_bit): Define in terms of
> bitwise-and, not popcount.
> ---
>  libstdc++-v3/include/std/bit | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
> index ef19d649e32..621ee4a9b95 100644
> --- a/libstdc++-v3/include/std/bit
> +++ b/libstdc++-v3/include/std/bit
> @@ -316,7 +316,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template
>  constexpr bool
>  __has_single_bit(_Tp __x) noexcept
> -{ return std::__popcount(__x) == 1; }
> +{ return __x != 0 && (__x & (__x - 1)) == 0; }
>
>template
>  constexpr _Tp
> --
> 2.36.0.rc2.10.g1ac7422e39
>



[pushed] c++: lambda and the current instantiation [PR82980]

2022-04-14 Thread Jason Merrill via Gcc-patches
When a captured variable is type-dependent, we've expressed the type of the
capture field and proxy with a decltype variant.  But if the type is "the
current instantiation", we need to be able to see that so that we can do
lookup inside it just like we could with the captured variable itself.

I also tried looking through lambda capture in
cp_parser_postfix_dot_deref_expression, but this way seems cleaner.  I plan
to treat more types as deducible in stage 1.

I considered also using this in do_auto_deduction, but think that would be
wrong: [temp.dep.expr] says an id-expression is type-dependent if it is
"associated by name lookup with a variable declared with a type that
contains a placeholder type where the initializer is type-dependent".  That
doesn't clearly exclude deducing a dependent type from the initializer, but
it seems like a barrier, and other implementations agree.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/82980

gcc/cp/ChangeLog:

* lambda.cc (type_deducible_expression_p): New.
(lambda_capture_field_type): Check it.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-current-inst1.C: New test.
---
 gcc/cp/lambda.cc  | 20 ++-
 .../cpp0x/lambda/lambda-current-inst1.C   | 18 +
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-current-inst1.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index f22798d51e8..65579edc316 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -183,6 +183,24 @@ lambda_function (tree lambda)
   return lambda;
 }
 
+/* True if EXPR is an expression whose type can be used directly in lambda
+   capture.  Not to be used for 'auto'.  */
+
+static bool
+type_deducible_expression_p (tree expr)
+{
+  if (!type_dependent_expression_p (expr))
+return true;
+  if (BRACE_ENCLOSED_INITIALIZER_P (expr)
+  || TREE_CODE (expr) == EXPR_PACK_EXPANSION)
+return false;
+  tree t = non_reference (TREE_TYPE (expr));
+  if (!t) return false;
+  while (TREE_CODE (t) == POINTER_TYPE)
+t = TREE_TYPE (t);
+  return currently_open_class (t);
+}
+
 /* Returns the type to use for the FIELD_DECL corresponding to the
capture of EXPR.  EXPLICIT_INIT_P indicates whether this is a
C++14 init capture, and BY_REFERENCE_P indicates whether we're
@@ -211,7 +229,7 @@ lambda_capture_field_type (tree expr, bool explicit_init_p,
   else
type = do_auto_deduction (type, expr, auto_node);
 }
-  else if (type_dependent_expression_p (expr))
+  else if (!type_deducible_expression_p (expr))
 {
   type = cxx_make_type (DECLTYPE_TYPE);
   DECLTYPE_TYPE_EXPR (type) = expr;
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-current-inst1.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-current-inst1.C
new file mode 100644
index 000..a6631c5ca99
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-current-inst1.C
@@ -0,0 +1,18 @@
+// PR c++/82980
+// { dg-do compile { target c++11 } }
+
+template 
+struct Outer
+{
+  template 
+  void f();
+
+  void bar(Outer outer) {
+[outer](){ outer.f(); };
+  }
+  void baz(Outer *p) {
+[&](){ p->f(); };
+  }
+};
+
+int main() { }

base-commit: d634c5d7c78c6ec0fa39d96984460475564519c8
-- 
2.27.0



Re: enable __ieee128 for p9vector tests

2022-04-14 Thread Alexandre Oliva via Gcc-patches
On Apr 14, 2022, Segher Boessenkool  wrote:

> Lol, the dates line up very well, I didn't realise it was from 2021 :-)

Heh, indeed.  Same testsuite results cleanup season, too ;-)

>> The relevant fact, described in yesterday's message, is that -mfloat128
>> is not enabled by default, even with -mcpu=power9, except on target
>> variants that define TARGET_FLOAT128_ENABLE_TYPE to nonzero.  As you
>> stated, its overall default is zero (though GNU/Linux overrides it to
>> nonzero), so the existing tests do not conform with the machine's
>> defaults in assuming -mfloat128 is enabled by -mcpu=power9.

> First off, vxworks.h should not disable it again.

Erhm...  I'm not sure what the 'it' is.

For abundance of clarity, we do *not* disable vsx when -mcpu=power9 is
given.  vsx is enabled for these tests.  But neither -mcpu=power9 nor
having vsx enabled are enough for the _Float128/_ieee128 type to be
defined.

The target-specific option that controls whether _Float128/_ieee128 is
defined when VSX is enabled is TARGET_FLOAT128_ENABLE_TYPE.  The only
file that defines it as nonzero is rs6000/linux64.h, which backs up the
comment in rs6000.cc before the statement that carries out this choice:

  /* Enable the default support for IEEE 128-bit floating point on
 [GNU/]Linux VSX sytems.  [...]  */
  TARGET_FLOAT128_TYPE = TARGET_FLOAT128_ENABLE_TYPE && TARGET_VSX;


So, if the 'it' refers to VSX, I reaffirm it's enabled as it should.
But if 'it' refers to TARGET_FLOAT128_ENABLE_TYPE, then it would seem
that you're saying that this is no longer a choice available to targets,
and that _Float128/_ieee128 are now mandatory when VSX is available.
That would be quite a departure from the current state.

Now, we are looking into the possibility of enabling _Float128/_ieee128
on ppc64-vx7r2, but keep in mind it's a nonfree system, so if system
libraries (or kernel) aren't up to it, that would be a blocker.  So I'd
prefer if both choices for TARGET_FLOAT128_ENABLE_TYPE remained
available.


> Then, this needs to be fixed, indeed.  But that would be a code fix, not
> a testsuite workaround.  If you use -mcpu=power9 it should support QP
> float.

I guess there's room for improvement indeed, especially in light of the
second patch for pr79004.c sent out earlier today, but I don't think I'd
risk such changes at this stage of development of gcc-12, let alone when
maintainer and implementation seem to me to disagree as to what the
expected behavior is :-(

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] libstdc++: Optimize std::has_single_bit

2022-04-14 Thread Patrick Palka via Gcc-patches
This reimplements std::has_single_bit using the well-known bit-twiddling
trick[1], which is much faster than popcount on x86_64.

Note that when __x is signed and maximally negative then this
implementation invokes UB due to signed overflow, whereas the previous
implementation would return true.  This isn't a problem for
has_single_bit because it accepts only unsigned types, but it is a
potential problem for the unconstrained __has_single_bit.  Should
__has_single_bit continue to handle this non-standard case correctly for the
sake of backwards compatibility?

Tested on x86_64-pc-linux-gnu.

[1]: http://www.graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2

libstdc++-v3/ChangeLog:

* include/std/bit (__has_single_bit): Define in terms of
bitwise-and, not popcount.
---
 libstdc++-v3/include/std/bit | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
index ef19d649e32..621ee4a9b95 100644
--- a/libstdc++-v3/include/std/bit
+++ b/libstdc++-v3/include/std/bit
@@ -316,7 +316,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 constexpr bool
 __has_single_bit(_Tp __x) noexcept
-{ return std::__popcount(__x) == 1; }
+{ return __x != 0 && (__x & (__x - 1)) == 0; }
 
   template
 constexpr _Tp
-- 
2.36.0.rc2.10.g1ac7422e39
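
For unsigned arguments, which is the precondition discussed in this message, the two formulations can be compared directly. This is a sketch only: `single_bit_popcount`, `single_bit_bitwise` and `agree_on_all_16bit_values` are hypothetical names, and the bit-counting loop merely stands in for std::__popcount.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Reference formulation: exactly one bit is set.
template <typename T>
bool single_bit_popcount(T x) noexcept {
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  int count = 0;
  for (; x != 0; x >>= 1)
    count += static_cast<int>(x & 1u);
  return count == 1;
}

// Bitwise formulation: x - 1 clears the lowest set bit and sets all bits
// below it, so x & (x - 1) is zero only when x had at most one bit set.
template <typename T>
bool single_bit_bitwise(T x) noexcept {
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  return x != 0 && (x & (x - 1)) == 0;
}

// Exhaustively compares both formulations over every 16-bit value.
inline bool agree_on_all_16bit_values() {
  for (std::uint32_t i = 0; i <= 0xFFFFu; ++i) {
    const std::uint16_t x = static_cast<std::uint16_t>(i);
    if (single_bit_popcount(x) != single_bit_bitwise(x))
      return false;
  }
  return true;
}
```

The static_assert keeps the sketch away from the signed case raised above, where x - 1 overflows for the most negative value.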



Re: enable __ieee128 for p9vector tests

2022-04-14 Thread Segher Boessenkool
On Thu, Apr 14, 2022 at 01:56:39PM -0300, Alexandre Oliva wrote:
> On Apr 14, 2022, Segher Boessenkool  wrote:
> > On Sat, Apr 17, 2021 at 06:19:02AM -0300, Alexandre Oliva wrote:
> >> On Apr 12, 2021, Segher Boessenkool  wrote:
> >> > On Fri, Apr 02, 2021 at 01:52:59PM -0300, Alexandre Oliva wrote:
> >> >> Several compile tests that use the __ieee128 type do not ensure it is
> >> >> defined.  This patch adds -mfloat128 to their command lines, and
> >> >> disregards the warning that may be issued by it.
> >> 
> >> > But they do make sure it is defined, they use -mcpu=power9 (etc.).  What
> >> > is different in your setup that that does not work?
> >> 
> >> I suppose it's either -mno-altivec -mno-vsx in our self-specs,
> 
> > Yes, that is a problem.
> 
> Sorry, that message from last year was an unfounded suspicion of mine
> based on incorrect information.  Indeed, -mcpu=power9 combined with
> -mno-vsx raises an error.

Lol, the dates line up very well, I didn't realise it was from 2021 :-)

> The relevant fact, described in yesterday's message, is that -mfloat128
> is not enabled by default, even with -mcpu=power9, except on target
> variants that define TARGET_FLOAT128_ENABLE_TYPE to nonzero.  As you
> stated, its overall default is zero (though GNU/Linux overrides it to
> nonzero), so the existing tests do not conform with the machine's
> defaults in assuming -mfloat128 is enabled by -mcpu=power9.

First off, vxworks.h should not disable it again.

Then, this needs to be fixed, indeed.  But that would be a code fix, not
a testsuite workaround.  If you use -mcpu=power9 it should support QP
float.


Segher


Re: enable __ieee128 for p9vector tests

2022-04-14 Thread Alexandre Oliva via Gcc-patches
On Apr 14, 2022, Segher Boessenkool  wrote:

> Hi!
> On Sat, Apr 17, 2021 at 06:19:02AM -0300, Alexandre Oliva wrote:
>> On Apr 12, 2021, Segher Boessenkool  wrote:
>> > On Fri, Apr 02, 2021 at 01:52:59PM -0300, Alexandre Oliva wrote:
>> >> Several compile tests that use the __ieee128 type do not ensure it is
>> >> defined.  This patch adds -mfloat128 to their command lines, and
>> >> disregards the warning that may be issued by it.
>> 
>> > But they do make sure it is defined, they use -mcpu=power9 (etc.).  What
>> > is different in your setup that that does not work?
>> 
>> I suppose it's either -mno-altivec -mno-vsx in our self-specs,

> Yes, that is a problem.

Sorry, that message from last year was an unfounded suspicion of mine
based on incorrect information.  Indeed, -mcpu=power9 combined with
-mno-vsx raises an error.

The relevant fact, described in yesterday's message, is that -mfloat128
is not enabled by default, even with -mcpu=power9, except on target
variants that define TARGET_FLOAT128_ENABLE_TYPE to nonzero.  As you
stated, its overall default is zero (though GNU/Linux overrides it to
nonzero), so the existing tests do not conform with the machine's
defaults in assuming -mfloat128 is enabled by -mcpu=power9.

Would you please reconsider your assessment, disregarding my incorrect
and irrelevant suspicions from last year, and instead taking the updated
and corrected information about float128 defaults into account?

Thanks,

>> or the very old default CPU.

> powerpc-linux uses 603, introduced at the same time as 604 (in 1994),
> which is what vxworks appears to use.  It has all the same features.

Yup, this was another incorrect suspicion of mine, based on another
piece of irrelevant information.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH v1] libstdc++: Default to mutex-based atomics on RISC-V

2022-04-14 Thread Jonathan Wakely via Gcc-patches
On Thu, 14 Apr 2022 at 16:24, Palmer Dabbelt  wrote:
>
> On Thu, 14 Apr 2022 08:22:05 PDT (-0700), jwak...@redhat.com wrote:
> > On Thu, 14 Apr 2022 at 16:18, Palmer Dabbelt wrote:
> >>
> >> On Thu, 14 Apr 2022 08:08:17 PDT (-0700), jwak...@redhat.com wrote:
> >> > On 07/04/22 11:46 -0700, Palmer Dabbelt wrote:
> >> >>The RISC-V port requires libatomic to be linked in order to resolve
> >> >>various atomic functions, which results in builds that have
> >> >>"--with-libstdcxx-lock-policy=auto" defaulting to mutex-based locks.
> >> >>Changing this to direct atomics breaks the ABI, this forces the auto
> >> >>detection mutex-based atomics on RISC-V in order to avoid a silent ABI
> >> >>break for users.
> >> >>
> >> >>See Bug 84568 for more discussion.  In the long run there may be a way
> >> >>to get the higher-performance atomics without an ABI flag day, but
> >> >>that's going to be a much more complicated operation.  We don't even
> >> >>have support for the inline atomics yet, but given that some folks have
> >> >>been discussing hacks to make these libatomic routines appear implicitly
> >> >>it seems prudent to just turn off the automatic detection for RISC-V.
> >> >>
> >> >>libstdc++-v3/ChangeLog
> >> >>
> >> >>  * acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY): Force auto to mutex
> >> >>for RISC-V.
> >> >
> >> > As documented at https://gcc.gnu.org/lists.html all patches for
> >> > libstdc++ need to go to the libstdc++ list as well as gcc-patches
> >> > (otherwise I won't see them).
> >>
> >> Thanks, I'll try to remember to look next time.
> >>
> >> > We'd usually do something like:
> >> >
> >> > case "${host}" in
> >> >*-*-riscv) libstdcxx_atomic_lock_policy=mutex ;;
> >> >*-*-*) AC_TRY_COMPILE([ ... ],,[],[])
> >> > esac
> >> >
> >> > but this way is simpler. If we add more customization for other
> >> > targets we can reconsider using the 'case "${host}"' form.
> >>
> >> Ya, that's kind of where I came to as well -- the proper autoconf flavor
> >> would scale way better, but hopefully nobody else makes this mistake and
> >> thus we don't need to worry about that.
> >
> > 
> >
> >> I'm fine with either way (though I think we'd need a "riscv*" there, to
> >> match riscv32 and riscv64?), so if you want to swap it over (or have me
> >> re-spin this) it's no big deal on my end -- also fine, as per below,
> >> with you just committing this ;)
> >
> > Yeah, I figured *-*-riscv probably wasn't right, so that's another
> > reason to prefer your approach.
> >
> >
> >>
> >> > So this is OK for trunk, modulo regenerating libstdc++-v3/configure
> >> > with this change. Let me know if you want me to do that regen for you
> >> > (or commit the whole thing for you).
> >>
> >> That'd be great, thanks!  It usually takes me a while to get all the
> >> autotools versions lined up (we just got new machines at the office),
> >> that way I won't have to do so.
> >
> > No problem, I can regen+push for you.
>
> Great, thanks!

Pushed as r12-8161-g3fc22eedb033cb
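
To see which policy a given libstdc++ build ended up with, something along the following lines may help. It is a sketch that assumes libstdc++ and its extension header <ext/concurrence.h>; the helper name `default_lock_policy_name` is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <ext/concurrence.h>  // libstdc++ extension, not standard C++

// Maps the library's compile-time default lock policy to a readable name.
// The policy is fixed when libstdc++ is configured, which is why changing
// the auto-detected value for a target amounts to an ABI change.
inline const char* default_lock_policy_name() {
  switch (__gnu_cxx::__default_lock_policy) {
    case __gnu_cxx::_S_single: return "single";
    case __gnu_cxx::_S_mutex:  return "mutex";
    case __gnu_cxx::_S_atomic: return "atomic";
  }
  return "unknown";
}
```

On a RISC-V toolchain configured with this change and thread support enabled, the expectation would be "mutex"; other targets typically report "atomic".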



[committed] libstdc++: Fix incorrect IS number in doc comment

2022-04-14 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* doc/xml/manual/intro.xml: Fix comment.
---
 libstdc++-v3/doc/xml/manual/intro.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 86ed6964b6a..548b632b6e4 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -55,7 +55,7 @@
 http://www.w3.org/2001/XInclude; parse="xml" 
href="status_cxxtr24733.xml">
 
 
-
+
 http://www.w3.org/2001/XInclude; parse="xml" 
href="status_cxxis29124.xml">
 
   
-- 
2.34.1



Re: ppc: testsuite: bfp: enable float128 for __ieee128

2022-04-14 Thread Segher Boessenkool
On Wed, Apr 13, 2022 at 08:58:18PM -0300, Alexandre Oliva wrote:
> 
> Multiple bfp tests expect -mcpu=power[89] to enable the __ieee128
> type, but on targets that #define TARGET_FLOAT128_ENABLE_TYPE 0, that
> is not the case.  Since they are compile-time tests and effective
> target support for powerpc_p9vector_ok is required, adding -mfloat128
> is safe.  This patch does so, and arranges for the warning raised on
> such systems to be pruned.
> 
> Tested on x86_64-linux-gnu x ppc64-vx7r2 with gcc-11.  Ok to install?

Not okay.  We rely everywhere else on -mcpu=power9 (etc.) to enable VSX
just as well.

You should not change such defaults for your platform.


Segher


Re: [PATCH] libstdc++: Update incorrect statement about mainline in docs

2022-04-14 Thread Jonathan Wakely via Gcc-patches
On Thu, 14 Apr 2022 at 11:55, Richard Biener  wrote:
>
> On Thu, 14 Apr 2022, Jonathan Wakely wrote:
>
> > On Thu, 14 Apr 2022 at 11:36, Richard Biener  wrote:
> > >
> > > On Thu, 14 Apr 2022, Jonathan Wakely wrote:
> > >
> > > > This fixes some misleading text in the libstdc++ manual that says the
> > > > docs for the gcc-11 branch refer to mainline.
> > > >
> > > > Richi, is this OK for the gcc-11 branch now? It's been wrong for 11.1
> > > > and 11.2, but it would still be nice to fix.
> > >
> > > Yes, it's OK.  I notice the same problem exists on the GCC 10 branch
> > > but GCC 9 at least mentions GCC 9 once ;)
> >
> > Yes, I fixed it for gcc-9.3.0, but forgot to do it for gcc-10 and gcc-11.
> >
> > I pushed r10-10534 to fix gcc-10 (since that's open for doc changes)
> > and have now pushed r11-9881
> > as well.
> >
> > Maybe this year I'll remember to do it for gcc-12 after we branch from 
> > trunk!
>
> Add an entry to branching.html!

Like this? OK for wwwdocs?
commit 7789dd25ce48039d9b6459340051e403f1f8b309
Author: Jonathan Wakely 
Date:   Thu Apr 14 17:08:34 2022 +0100

Update branching.html with reminder for libstdc++ docs

diff --git a/htdocs/branching.html b/htdocs/branching.html
index f30a47e1..ff28eb51 100644
--- a/htdocs/branching.html
+++ b/htdocs/branching.html
@@ -40,6 +40,9 @@ git push origin tag basepoints/gcc-11
 for the next major release in the wwwdocs repository and
 populate it with initial copies of changes.html and
 criteria.html.
+
+Libstdc++ maintainers should update references to "mainline GCC" in
+libstdc++-v3/doc/xml/manual/status_cxx*.xml.
 
 
 Web Site Updates


Re: enable __ieee128 for p9vector tests

2022-04-14 Thread Segher Boessenkool
On Wed, Apr 13, 2022 at 08:37:40PM -0300, Alexandre Oliva wrote:
> On Apr 17, 2021, Alexandre Oliva  wrote:
> > On Apr 12, 2021, Segher Boessenkool  wrote:
> My supposition was wrong.  It turned out to be just because in
> vxworks.h, for TARGET_VXWORKS7, there's:
> 
> #define TARGET_FLOAT128_ENABLE_TYPE 0

This is the default as well, so what vxworks.h does is a no-op.

> This disables TARGET_FLOAT128_TYPE by default, and causes the warning to
> be issued when -mfloat128 is explicitly enabled.
> 
> So ping https://gcc.gnu.org/pipermail/gcc-patches/2021-April/567630.html

NAK on that patch, as explained upthread.

> upthread, and expect 3 new patches related with -mfloat128 momentarily.

Thanks,


Segher


[Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-04-14 Thread Marcel Vollweiler

Hi,

This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices.

The patch builds on the following patches which are submitted, but not yet
approved/committed:
- [PATCH] OpenMP, libgomp: Environment variable syntax extension.
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588728.html
- [PATCH] OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591556.html

The OpenMP runtime routines omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit were introduced in OpenMP 5.1 and were already
implemented for host usage with patch
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581283.html

The new patch extends these OpenMP runtime routines so that they can also be
used on the device, i.e. device-specific values for the nteams-var and
teams-thread-limit-var ICVs can now be retrieved and set on the device as well.
The updated numbers of teams/threads are then used when launching the kernel.

The following main aspects are considered:
(a) Implemented the functions in the according icv-device files.
(b) Added structures to not only store initial device-specific values (they have
to be kept for omp_display_env) but also device-specific ICV values that can be
changed on the device at runtime.
(c) Changed the gimplification:
(c.1) Introduced integer_minus_two_node.
(c.2) For target regions that do not include teams constructs, now the clause
num_teams(-2) is added instead of num_teams(1). This was necessary as num_teams(1)
is ambiguous: it can also mean that a teams construct with explicit num_teams(1)
clause was specified inside the target region. The disambiguation is needed in
order to choose the correct thread limit: teams-thread-limit-var is only
intended for teams constructs such that if there is no teams construct, then the
number of threads is limited by thread-limit-var.
(d) Extend GOMP_target_ext. The host needs to set the device-specific ICV values
before the kernel is launched. The number of teams and threads are members of
the args list and are modified when no value was specified in an explicit clause
and the computation of the value was not postponed due to mapped variables.
(d.1) The arguments list is copied in order to guarantee immutability.
(e) Added copy back mechanism for ICVs which are modified on the device. The
only way to change device-specific ICVs is to do it on the device. As the
device-specific values are sometimes needed also on the host when the kernel is
launched (particularly number of teams and threads) they have to be copied back.

The patch was tested on x86_64-linux with nvptx and gcn offloading. All with no
regressions.

Marcel

gcc/ChangeLog:

* gimplify.cc (optimize_target_teams): Changed integer_one_node to
integer_minus_two_node in case of non-existing teams construct in target
region due to disambiguation. Previously, num_teams(1) was used as
clause on the target construct when (a) no teams construct exists in the
target region or (b) a teams construct with explicit num_teams(1)
clause was specified.
* tree-core.h (enum tree_index): Added TI_INTEGER_MINUS_TWO.
* tree.cc (build_common_tree_nodes): Added integer_minus_two_node.
* tree.h (integer_minus_two_node): Likewise.

libgomp/ChangeLog:

* config/gcn/icv-device.c (omp_set_num_teams): Added.
(omp_get_teams_thread_limit): Added.
(omp_set_teams_thread_limit): Added.
(ialias): Added for omp_set_num_teams and omp_{gs}et_teams_thread_limit.
* config/nvptx/icv-device.c (omp_set_num_teams): Likewise.
(omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* env.c (struct gomp_default_icv_t): Added to hold default ICV values.
(struct gomp_icv_list): Removed static.
(omp_display_env): Renaming of used lists.
(add_device_specific_icv): Removed static.
(gomp_add_device_specific_icv): Removed static.
(parse_device_specific): Renaming of used lists and added storing of
parsed values in lists of modifiable ICV values. 
* icv-device.c (omp_set_num_teams): Added.
(ialias): Added for omp_set_num_teams and omp_{gs}et_teams_thread_limit.
(omp_get_teams_thread_limit): Added.
(omp_set_teams_thread_limit): Added.
* icv.c (omp_set_num_teams): Removed.
(omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Removed.
(ialias): Removed for omp_set_num_teams and
omp_{gs}et_teams_thread_limit.
* libgomp-plugin.h 

Re: enable __ieee128 for p9vector tests

2022-04-14 Thread Segher Boessenkool
Hi!

On Sat, Apr 17, 2021 at 06:19:02AM -0300, Alexandre Oliva wrote:
> On Apr 12, 2021, Segher Boessenkool  wrote:
> > On Fri, Apr 02, 2021 at 01:52:59PM -0300, Alexandre Oliva wrote:
> >> Several compile tests that use the __ieee128 type do not ensure it is
> >> defined.  This patch adds -mfloat128 to their command lines, and
> >> disregards the warning that may be issued by it.
> 
> > But they do make sure it is defined, they use -mcpu=power9 (etc.).  What
> > is different in your setup that that does not work?
> 
> I suppose it's either -mno-altivec -mno-vsx in our self-specs,

Yes, that is a problem.  None of our testcases are set up for compilers
with weird defaults (and this is not specific to rs6000).

I do not want to change many thousands of test cases to not use defaults
anymore, to specify everything everywhere instead :-(  This would make
things more unmaintainable than they already are.

> or the very old default CPU.

powerpc-linux uses 603, introduced at the same time as 604 (in 1994),
which is what vxworks appears to use.  It has all the same features.

> I imagine it's also possible that the issue,
> initially observed with GCC 10, is different or absent with the trunk.
> 
> I started trying to figure out what led __ieee128 to not be enabled
> there, back then, but decided it was not so important, given that other
> tests used this flag explicitly, and that it wouldn't hurt to have it
> even if it wasn't always necessary.

GCC for PowerPC does not currently support IEEE QP float on CPUs without
VSX.  Other than that, it should work (but no doubt there still are
problems).


Segher


Re: [PATCH] libstdc++: Add pretty printer for std::span

2022-04-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 4 Apr 2022 at 11:54, Philipp Fent via Libstdc++
 wrote:
>
> This improves the debug output for C++20 spans.
> Before:
> {static extent = 18446744073709551615, _M_ptr = 0x7fffb9a8,
> _M_extent = {_M_extent_value = 2}}
> Now with StdSpanPrinter:
> std::span of length 2 = {1, 2}



[PATCH] ppc: testsuite: pr79004 needs -mlong-double-128 (was: Re: ppc: testsuite: prune float128 partial support warnings)

2022-04-14 Thread Alexandre Oliva via Gcc-patches
On Apr 13, 2022, Alexandre Oliva  wrote:

>   * gcc.target/powerpc/pr79004.c: Prune the -mfloat128 warning.

I failed to mention that this fixed a problem in the test, but that was
not enough for this test to pass; here's an incremental patch that is.


Some of the asm opcodes expected by pr79004 depend on
-mlong-double-128 to be output.  E.g., without this flag, the
conditions of patterns @extenddf2 and extendsf2 do not
hold, and so GCC resorts to libcalls instead of even trying
rs6000_expand_float128_convert.

Perhaps the conditions are too strict, and they could enable the use
of conversion insns involving __ieee128/_Float128 even with 64-bit
long doubles.  Alas, for now, we need this flag for the test to pass
on target variants that use 64-bit long doubles.

Tested on x86_64-linux-gnu x ppc64-vx7r2 with gcc-11.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/powerpc/pr79004.c: Add -mlong-double-128.
---
 gcc/testsuite/gcc.target/powerpc/pr79004.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr79004.c 
b/gcc/testsuite/gcc.target/powerpc/pr79004.c
index e411702dc98a9..061a0e83fe2ad 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr79004.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr79004.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power9 -O2 -mfloat128" } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2 -mfloat128 -mlong-double-128" } */
 /* { dg-prune-output ".-mfloat128. option may not be fully supported" } */
 
 #include 


-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] libstdc++: Avoid double-deref of __first in ranges::minmax [PR104858]

2022-04-14 Thread Jonathan Wakely via Gcc-patches
On Thu, 14 Apr 2022 at 16:21, Patrick Palka via Libstdc++
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk and 11/10
> once the branch is unfrozen?
>
> PR libstdc++/104858
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_algo.h (__minmax_fn): Avoid dereferencing
> __first twice at the start.
> * testsuite/25_algorithms/minmax/constrained.cc (test06): New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   |  2 +-
>  .../25_algorithms/minmax/constrained.cc   | 23 +++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index 62dc605080a..3d30fb1428c 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -3084,7 +3084,7 @@ namespace ranges
> auto __last = ranges::end(__r);
> __glibcxx_assert(__first != __last);
> auto __comp_proj = __detail::__make_comp_proj(__comp, __proj);
> -   minmax_result> __result = {*__first, *__first};
> +   minmax_result> __result = {*__first, 
> __result.min};

Clever ... I'm surprised this even works. I would have expected it to
evaluate both initializers before actually initializing the members.
TIL.

OK for trunk now, and branches once thawed.


> if (++__first == __last)
>   return __result;
> else
> diff --git a/libstdc++-v3/testsuite/25_algorithms/minmax/constrained.cc 
> b/libstdc++-v3/testsuite/25_algorithms/minmax/constrained.cc
> index 90882afb6d0..306c495babe 100644
> --- a/libstdc++-v3/testsuite/25_algorithms/minmax/constrained.cc
> +++ b/libstdc++-v3/testsuite/25_algorithms/minmax/constrained.cc
> @@ -129,6 +129,28 @@ test05()
>VERIFY( result.min == "a"s && result.max == "c"s );
>  }
>
> +struct A {
> +  A() = default;
> +  A(const A&) = default;
> +  A(A&&) { ++move_count; }
> +  A& operator=(const A&) = default;
> +  A& operator=(A&&) = default;
> +  friend auto operator<=>(const A&, const A&) = default;
> +  static inline int move_count = 0;
> +};
> +
> +void
> +test06()
> +{
> +  // PR libstdc++/104858
> +  // Verify ranges::minmax doesn't dereference the iterator for the first
> +  // element in the range twice.
> +  A a;
> +  ranges::subrange r = {std::move_iterator(&a), std::move_sentinel(&a + 1)};
> +  ranges::minmax(r);
> +  VERIFY( A::move_count == 1 );
> +}
> +
>  int
>  main()
>  {
> @@ -137,4 +159,5 @@ main()
>test03();
>test04();
>test05();
> +  test06();
>  }
> --
> 2.36.0.rc2.10.g1ac7422e39
>



Re: [PATCH v1] libstdc++: Default to mutex-based atomics on RISC-V

2022-04-14 Thread Palmer Dabbelt

On Thu, 14 Apr 2022 08:22:05 PDT (-0700), jwak...@redhat.com wrote:

On Thu, 14 Apr 2022 at 16:18, Palmer Dabbelt wrote:


On Thu, 14 Apr 2022 08:08:17 PDT (-0700), jwak...@redhat.com wrote:
> On 07/04/22 11:46 -0700, Palmer Dabbelt wrote:
>>The RISC-V port requires libatomic to be linked in order to resolve
>>various atomic functions, which results in builds that have
>>"--with-libstdcxx-lock-policy=auto" defaulting to mutex-based locks.
>>Changing this to direct atomics breaks the ABI, so this forces the auto
>>detection to mutex-based atomics on RISC-V in order to avoid a silent ABI
>>break for users.
>>
>>See Bug 84568 for more discussion.  In the long run there may be a way
>>to get the higher-performance atomics without an ABI flag day, but
>>that's going to be a much more complicated operation.  We don't even
>>have support for the inline atomics yet, but given that some folks have
>>been discussing hacks to make these libatomic routines appear implicitly
>>it seems prudent to just turn off the automatic detection for RISC-V.
>>
>>libstdc++-v3/ChangeLog
>>
>>  * acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY): Force auto to mutex
>>for RISC-V.
>
> As documented at https://gcc.gnu.org/lists.html all patches for
> libstdc++ need to go to the libstdc++ list as well as gcc-patches
> (otherwise I won't see them).

Thanks, I'll try to remember to look next time.

> We'd usually do something like:
>
> case "${host}" in
>*-*-riscv) libstdcxx_atomic_lock_policy=mutex ;;
>*-*-*) AC_TRY_COMPILE([ ... ],,[],[])
> esac
>
> but this way is simpler. If we add more customization for other
> targets we can reconsider using the 'case "${host}"' form.

Ya, that's kind of where I came to as well -- the proper autoconf flavor
would scale way better, but hopefully nobody else makes this mistake and
thus we don't need to worry about that.





I'm fine with either way (though I think we'd need a "riscv*" there, to
match riscv32 and riscv64?), so if you want to swap it over (or have me
re-spin this) it's no big deal on my end -- also fine, as per below,
with you just committing this ;)


Yeah, I figured *-*-riscv probably wasn't right, so that's another
reason to prefer your approach.




> So this is OK for trunk, modulo regenerating libstdc++-v3/configure
> with this change. Let me know if you want me to do that regen for you
> (or commit the whole thing for you).

That'd be great, thanks!  It usually takes me a while to get all the
autotools versions lined up (we just got new machines at the office),
that way I won't have to do so.


No problem, I can regen+push for you.


Great, thanks!


Re: [PATCH v1] libstdc++: Default to mutex-based atomics on RISC-V

2022-04-14 Thread Jonathan Wakely via Gcc-patches
On Thu, 14 Apr 2022 at 16:18, Palmer Dabbelt wrote:
>
> On Thu, 14 Apr 2022 08:08:17 PDT (-0700), jwak...@redhat.com wrote:
> > On 07/04/22 11:46 -0700, Palmer Dabbelt wrote:
> >>The RISC-V port requires libatomic to be linked in order to resolve
> >>various atomic functions, which results in builds that have
> >>"--with-libstdcxx-lock-policy=auto" defaulting to mutex-based locks.
> >>Changing this to direct atomics breaks the ABI, so this forces the auto
> >>detection to mutex-based atomics on RISC-V in order to avoid a silent ABI
> >>break for users.
> >>
> >>See Bug 84568 for more discussion.  In the long run there may be a way
> >>to get the higher-performance atomics without an ABI flag day, but
> >>that's going to be a much more complicated operation.  We don't even
> >>have support for the inline atomics yet, but given that some folks have
> >>been discussing hacks to make these libatomic routines appear implicitly
> >>it seems prudent to just turn off the automatic detection for RISC-V.
> >>
> >>libstdc++-v3/ChangeLog
> >>
> >>  * acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY): Force auto to mutex
> >>for RISC-V.
> >
> > As documented at https://gcc.gnu.org/lists.html all patches for
> > libstdc++ need to go to the libstdc++ list as well as gcc-patches
> > (otherwise I won't see them).
>
> Thanks, I'll try to remember to look next time.
>
> > We'd usually do something like:
> >
> > case "${host}" in
> >*-*-riscv) libstdcxx_atomic_lock_policy=mutex ;;
> >*-*-*) AC_TRY_COMPILE([ ... ],,[],[])
> > esac
> >
> > but this way is simpler. If we add more customization for other
> > targets we can reconsider using the 'case "${host}"' form.
>
> Ya, that's kind of where I came to as well -- the proper autoconf flavor
> would scale way better, but hopefully nobody else makes this mistake and
> thus we don't need to worry about that.



> I'm fine with either way (though I think we'd need a "riscv*" there, to
> match riscv32 and riscv64?), so if you want to swap it over (or have me
> re-spin this) it's no big deal on my end -- also fine, as per below,
> with you just committing this ;)

Yeah, I figured *-*-riscv probably wasn't right, so that's another
reason to prefer your approach.


>
> > So this is OK for trunk, modulo regenerating libstdc++-v3/configure
> > with this change. Let me know if you want me to do that regen for you
> > (or commit the whole thing for you).
>
> That'd be great, thanks!  It usually takes me a while to get all the
> autotools versions lined up (we just got new machines at the office),
> that way I won't have to do so.

No problem, I can regen+push for you.



[PATCH] libstdc++: Avoid double-deref of __first in ranges::minmax [PR104858]

2022-04-14 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk and 11/10
once the branch is unfrozen?

PR libstdc++/104858

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__minmax_fn): Avoid dereferencing
__first twice at the start.
* testsuite/25_algorithms/minmax/constrained.cc (test06): New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   |  2 +-
 .../25_algorithms/minmax/constrained.cc   | 23 +++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 62dc605080a..3d30fb1428c 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3084,7 +3084,7 @@ namespace ranges
auto __last = ranges::end(__r);
__glibcxx_assert(__first != __last);
auto __comp_proj = __detail::__make_comp_proj(__comp, __proj);
-   minmax_result> __result = {*__first, *__first};
+   minmax_result> __result = {*__first, 
__result.min};
if (++__first == __last)
  return __result;
else
diff --git a/libstdc++-v3/testsuite/25_algorithms/minmax/constrained.cc 
b/libstdc++-v3/testsuite/25_algorithms/minmax/constrained.cc
index 90882afb6d0..306c495babe 100644
--- a/libstdc++-v3/testsuite/25_algorithms/minmax/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/minmax/constrained.cc
@@ -129,6 +129,28 @@ test05()
   VERIFY( result.min == "a"s && result.max == "c"s );
 }
 
+struct A {
+  A() = default;
+  A(const A&) = default;
+  A(A&&) { ++move_count; }
+  A& operator=(const A&) = default;
+  A& operator=(A&&) = default;
+  friend auto operator<=>(const A&, const A&) = default;
+  static inline int move_count = 0;
+};
+
+void
+test06()
+{
+  // PR libstdc++/104858
+  // Verify ranges::minmax doesn't dereference the iterator for the first
+  // element in the range twice.
+  A a;
+  ranges::subrange r = {std::move_iterator(&a), std::move_sentinel(&a + 1)};
+  ranges::minmax(r);
+  VERIFY( A::move_count == 1 );
+}
+
 int
 main()
 {
@@ -137,4 +159,5 @@ main()
   test03();
   test04();
   test05();
+  test06();
 }
-- 
2.36.0.rc2.10.g1ac7422e39



Re: [PATCH v1] libstdc++: Default to mutex-based atomics on RISC-V

2022-04-14 Thread Palmer Dabbelt

On Thu, 14 Apr 2022 08:08:17 PDT (-0700), jwak...@redhat.com wrote:

On 07/04/22 11:46 -0700, Palmer Dabbelt wrote:

The RISC-V port requires libatomic to be linked in order to resolve
various atomic functions, which results in builds that have
"--with-libstdcxx-lock-policy=auto" defaulting to mutex-based locks.
Changing this to direct atomics breaks the ABI, so this forces the auto
detection to mutex-based atomics on RISC-V in order to avoid a silent ABI
break for users.

See Bug 84568 for more discussion.  In the long run there may be a way
to get the higher-performance atomics without an ABI flag day, but
that's going to be a much more complicated operation.  We don't even
have support for the inline atomics yet, but given that some folks have
been discussing hacks to make these libatomic routines appear implicitly
it seems prudent to just turn off the automatic detection for RISC-V.

libstdc++-v3/ChangeLog

* acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY): Force auto to mutex
  for RISC-V.


As documented at https://gcc.gnu.org/lists.html all patches for
libstdc++ need to go to the libstdc++ list as well as gcc-patches
(otherwise I won't see them).


Thanks, I'll try to remember to look next time.


We'd usually do something like:

case "${host}" in
   *-*-riscv) libstdcxx_atomic_lock_policy=mutex ;;
   *-*-*) AC_TRY_COMPILE([ ... ],,[],[])
esac

but this way is simpler. If we add more customization for other
targets we can reconsider using the 'case "${host}"' form.


Ya, that's kind of where I came to as well -- the proper autoconf flavor 
would scale way better, but hopefully nobody else makes this mistake and 
thus we don't need to worry about that.


I'm fine with either way (though I think we'd need a "riscv*" there, to 
match riscv32 and riscv64?), so if you want to swap it over (or have me 
re-spin this) it's no big deal on my end -- also fine, as per below, 
with you just committing this ;)



So this is OK for trunk, modulo regenerating libstdc++-v3/configure
with this change. Let me know if you want me to do that regen for you
(or commit the whole thing for you).


That'd be great, thanks!  It usually takes me a while to get all the 
autotools versions lined up (we just got new machines at the office), 
that way I won't have to do so.






---

I haven't even built this one, as I'm sure there's a better way to do it
than sticking some more C code in there.
---
libstdc++-v3/acinclude.m4 | 3 +++
1 file changed, 3 insertions(+)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index f53461c85a5..945c0c66f8d 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3612,6 +3612,9 @@ AC_DEFUN([GLIBCXX_ENABLE_LOCK_POLICY], [
dnl Why don't we check 8-byte CAS for sparc64, where _Atomic_word is long?!
dnl New targets should only check for CAS for the _Atomic_word type.
AC_TRY_COMPILE([
+#if defined __riscv
+# error "Defaulting to mutex-based locks for ABI compatibility"
+#endif
#if ! defined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
# error "No 2-byte compare-and-swap"
#elif ! defined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4


Re: [PATCH] libstdc++: Optimize integer std::from_chars

2022-04-14 Thread Jonathan Wakely via Gcc-patches
On Thu, 14 Apr 2022 at 13:32, Patrick Palka via Libstdc++
 wrote:
>
> This applies the following optimizations to the integer std::from_chars
> implementation:
>
>   1. Use a lookup table for converting an alphanumeric digit to its
>  base-36 value instead of using a range test (for 0-9) and switch
>  (for a-z and A-Z).  The table is constructed using a C++14
>  constexpr function which doesn't assume a particular character
>  encoding or __CHAR_BIT__ value.  The new conversion function
>  __from_chars_alnum_to_val is templated on whether we care
>  only about the decimal digits, in which case we can perform the
>  conversion with a single subtraction since the digit characters
>  are guaranteed to be contiguous (unlike the letters).
>   2. Generalize __from_chars_binary to handle all power-of-two bases.
>  This function, now named __from_chars_pow2_base, is also templated
>  on whether we care only about the decimal digits in order to speed
>  up digit conversion for base 2, 4 and 8.
>   3. In __from_chars_digit, use
>static_cast(__c - '0') < __base
>  instead of
>'0' <= __c && __c <= ('0' + (__base - 1)).
>  as the digit recognition test (exhaustively verified that the two
>  tests are equivalent).
>   4. In __from_chars_alnum, use a nested loop to consume the rest of the
>  digits in the overflow case (mirroring __from_chars_digit) so that
>  the main loop doesn't have to maintain the __valid overflow flag.
>
> At this point, __from_chars_digit is nearly identical to
> __from_chars_alnum, so this patch combines the two functions, removing
> the former and templatizing the latter according to whether we care only
> about the decimal digits.  Finally,
>
>   5. In __from_chars_alnum, keep track of a lower bound on the number of
>  unused bits in the result and use that to omit the overflow check
>  when it's safe to do so.
>
> In passing this replaces the non-portable function ascii_to_hexit
> used by __floating_from_chars_hex with the new conversion function.
>
> Here are some runtime measurements for a simple 15-line benchmark that
> roundtrips printing/parsing 200 million integers via std::to/from_chars
> (average of 5 runs):
>
>   Base  Before  After (seconds, lower is better)
>      2    9.37   9.37
>      3   12.13  15.79
>      8    3.67   4.15
>     10    3.86   4.90
>     11    5.03   6.84
>     16    2.93   4.14
>     32    2.39   3.85
>     36    3.26   5.22
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Also tested
> against libc++'s from_chars tests for good measure.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/charconv (__from_chars_alnum_to_val_table): Define.
> (__from_chars_alnum_to_val): Define.
> (__from_chars_binary): Rename to ...
> (__from_chars_pow2_base): ... this.  Generalize to handle any
> power-of-two base using __from_chars_alnum_to_val.
> (__from_chars_digit): Optimize digit recognition to a single
> test instead of two tests.  Use [[__unlikely__]] attribute.
> (__from_chars_alpha_to_num): Remove.
> (__from_chars_alnum): Use __from_chars_alnum_to_val.  Use a
> nested loop for the overflow case.
> (from_chars): Adjust appropriately.
> * src/c++17/floating_from_chars.cc (ascii_to_hexit): Remove.
> (__floating_from_chars_hex): Use __from_chars_alnum_to_val
> to recognize a hex digit instead.
> ---
>  libstdc++-v3/include/std/charconv | 250 --
>  libstdc++-v3/src/c++17/floating_from_chars.cc |  18 +-
>  2 files changed, 105 insertions(+), 163 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/charconv 
> b/libstdc++-v3/include/std/charconv
> index 2ce9c7d4cb9..5e44459749a 100644
> --- a/libstdc++-v3/include/std/charconv
> +++ b/libstdc++-v3/include/std/charconv
> @@ -407,176 +407,127 @@ namespace __detail
>return true;
>  }
>
> -  /// std::from_chars implementation for integers in base 2.
> -  template
> +  // Construct and return a lookup table that maps 0-9, A-Z and a-z to the
> +  // corresponding base-36 value and maps all other characters
> +  // to 127.
> +  constexpr auto
> +  __from_chars_alnum_to_val_table()
> +  {
> +constexpr unsigned char __lower_letters[]
> +  = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
> + 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
> + 'u', 'v', 'w', 'x', 'y', 'z' };
> +constexpr unsigned char __upper_letters[]
> +  = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
> + 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
> + 'U', 'V', 'W', 'X', 'Y', 'Z' };
> +struct { unsigned char __data[1u << __CHAR_BIT__] = {}; } __table;
> +for (auto& __entry : __table.__data)
> +  __entry = 127;
> +for (int __i = 0; __i < 10; ++__i)
> +  __table.__data['0' + __i] = __i;
> +for (int __i = 0; __i < 26; 

Re: [PATCH v1] libstdc++: Default to mutex-based atomics on RISC-V

2022-04-14 Thread Jonathan Wakely via Gcc-patches

On 07/04/22 11:46 -0700, Palmer Dabbelt wrote:

The RISC-V port requires libatomic to be linked in order to resolve
various atomic functions, which results in builds that have
"--with-libstdcxx-lock-policy=auto" defaulting to mutex-based locks.
Changing this to direct atomics breaks the ABI, so this forces the auto
detection to mutex-based atomics on RISC-V in order to avoid a silent ABI
break for users.

See Bug 84568 for more discussion.  In the long run there may be a way
to get the higher-performance atomics without an ABI flag day, but
that's going to be a much more complicated operation.  We don't even
have support for the inline atomics yet, but given that some folks have
been discussing hacks to make these libatomic routines appear implicitly
it seems prudent to just turn off the automatic detection for RISC-V.

libstdc++-v3/ChangeLog

* acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY): Force auto to mutex
  for RISC-V.


As documented at https://gcc.gnu.org/lists.html all patches for
libstdc++ need to go to the libstdc++ list as well as gcc-patches
(otherwise I won't see them).

We'd usually do something like:

case "${host}" in
  *-*-riscv) libstdcxx_atomic_lock_policy=mutex ;;
  *-*-*) AC_TRY_COMPILE([ ... ],,[],[])
esac

but this way is simpler. If we add more customization for other
targets we can reconsider using the 'case "${host}"' form.

So this is OK for trunk, modulo regenerating libstdc++-v3/configure
with this change. Let me know if you want me to do that regen for you
(or commit the whole thing for you).
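
To make the effect of the preprocessor guard concrete, here is a minimal sketch (not part of the patch) that mirrors the conditions visible in the quoted hunk as a small C++ helper. The name `selected_lock_policy` is hypothetical, and the sketch only covers the 2-byte and 4-byte compare-and-swap checks shown in the diff context; the real configure test may check more than that.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reports which policy an "auto" configure run would
// pick if it relied only on the checks visible in the quoted hunk.
constexpr const char* selected_lock_policy()
{
#if defined __riscv
  // With the patch, the compile test hits #error here, so "auto" means mutex.
  return "mutex";
#elif ! defined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
  return "mutex";   // no 2-byte compare-and-swap
#elif ! defined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
  return "mutex";   // no 4-byte compare-and-swap
#else
  return "atomic";  // every visible check passed
#endif
}
```

Printing the result of `selected_lock_policy()` on a given target shows which way the visible checks would go there; on RISC-V the first branch always wins, which is the point of the change.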



---

I haven't even built this one, as I'm sure there's a better way to do it
than sticking some more C code in there.
---
libstdc++-v3/acinclude.m4 | 3 +++
1 file changed, 3 insertions(+)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index f53461c85a5..945c0c66f8d 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3612,6 +3612,9 @@ AC_DEFUN([GLIBCXX_ENABLE_LOCK_POLICY], [
dnl Why don't we check 8-byte CAS for sparc64, where _Atomic_word is long?!
dnl New targets should only check for CAS for the _Atomic_word type.
AC_TRY_COMPILE([
+#if defined __riscv
+# error "Defaulting to mutex-based locks for ABI compatibility"
+#endif
#if ! defined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
# error "No 2-byte compare-and-swap"
#elif ! defined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4




Re: [wwwdocs] Add Ada's changelog entry

2022-04-14 Thread Jonathan Wakely via Gcc-patches

On 05/04/22 06:05 +, Arnaud Charlet wrote:

Thank you for the feedback. Should I remove it and resupply the patch or
can you/GCC maintainers do the modification before merging?


Can you please resubmit it?

I'll let others comment on the need to sign a contributor agreement. My
understanding is that this is unavoidable; whether you're contributing
code or documentation doesn't change this need, AFAIU.


Nothing is needed for changes which are not "legally significant
changes", see 
https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html

For legally significant changes we require either a copyright
assignment to the FSF *or* DCO sign-off, which can be done by adding
yourself to the DCO section of the MAINTAINERS file, or as described at
https://gcc.gnu.org/dco.html



Re: [PATCH] fortran: use fpu-glibc on powerpc*-unknown-freebsd

2022-04-14 Thread Piotr Kubaj via Gcc-patches
On 22-04-14 09:05:17, FX wrote:
> Hi,
> 
> > can you check the following patch?
> 
> Why restrict it to powerpc-freebsd only, and not all freebsd? Do they differ?
amd64 and i386 on all systems use a different setting and are not affected.
The other FreeBSD-supported architectures, besides amd64, i386 and powerpc*, 
are armv6, armv7, aarch64, riscv64 and riscv64sf.

However, GCC is currently not ported to riscv64 and riscv64sf (but they are 
likely affected as well).
aarch64 is confirmed to be affected (so armv6 and armv7 are probably also 
affected), but I don't have any way to test whether it works on aarch64 that 
way.

So currently limiting to powerpc*, but it will probably be extended to armv*, 
aarch64 and riscv64* in the future.

> Otherwise it looks ok to me, but probably should be tested on a glibc non-x86 
> target.
> 
> In any case, this will be for the new branch, when stage 1 reopens.
> 
> FX



Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns

2022-04-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, 14 Apr 2022, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > When doing BB vectorization the scalar cost compute is derailed
>> > by patterns, causing lanes to be considered live and thus not
>> > costed on the scalar side.  For the testcase in PR104010 this
>> > prevents vectorization which was done by GCC 11.  PR103941
>> > shows similar cases of missed optimizations that are fixed by
>> > this patch.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >
>> > I'm only considering this now because PR104010 is identified
>> > as regression on arm - Richards, what do you think?  I do think
>> > this will enable vectorization of more stuff now which might
>> > be good or bad - who knows, but at least it needs to involve
>> > patterns.
>> >
>> > Thanks,
>> > Richard.
>> >
>> > 2022-04-13  Richard Biener  
>> >
>> >PR tree-optimization/104010
>> >PR tree-optimization/103941
>> >* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
>> >we run into stmts in patterns continue walking those
>> >for uses outside of the vectorized region instead of
>> >marking the lane live.
>> >
>> >* gcc.target/i386/pr103941-1.c: New testcase.
>> >* gcc.target/i386/pr103941-2.c: Likewise.
>> > ---
>> >  gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++
>> >  gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++
>> >  gcc/tree-vect-slp.cc   | 47 --
>> >  3 files changed, 61 insertions(+), 12 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
>> >
>> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c 
>> > b/gcc/testsuite/gcc.target/i386/pr103941-1.c
>> > new file mode 100644
>> > index 000..524fdd0b4b1
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
>> > @@ -0,0 +1,14 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -msse2" } */
>> > +
>> > +unsigned char ur[16], ua[16], ub[16];
>> > +
>> > +void avgu_v2qi (void)
>> > +{
>> > +  int i;
>> > +
>> > +  for (i = 0; i < 2; i++)
>> > +ur[i] = (ua[i] + ub[i] + 1) >> 1;
>> > +}
>> > +
>> > +/* { dg-final { scan-assembler "pavgb" } } */
>> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c 
>> > b/gcc/testsuite/gcc.target/i386/pr103941-2.c
>> > new file mode 100644
>> > index 000..972a32be997
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
>> > @@ -0,0 +1,12 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -msse2" } */
>> > +
>> > +void foo (int *c, float *x, float *y)
>> > +{
>> > +  c[0] = x[0] < y[0];
>> > +  c[1] = x[1] < y[1];
>> > +  c[2] = x[2] < y[2];
>> > +  c[3] = x[3] < y[3];
>> > +}
>> > +
>> > +/* { dg-final { scan-assembler "cmpltps" } } */
>> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> > index 4ac2b70303c..c7687065374 100644
>> > --- a/gcc/tree-vect-slp.cc
>> > +++ b/gcc/tree-vect-slp.cc
>> > @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
>> > the scalar cost.  */
>> >if (!STMT_VINFO_LIVE_P (stmt_info))
>> >{
>> > -FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
>> > +auto_vec worklist;
>> > +hash_set *worklist_visited = NULL;
>> > +worklist.quick_push (orig_stmt);
>> > +do
>> >{
>> > -imm_use_iterator use_iter;
>> > -gimple *use_stmt;
>> > -FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
>> > -  if (!is_gimple_debug (use_stmt))
>> > -{
>> > -  stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
>> > -  if (!use_stmt_info
>> > -  || !vectorized_scalar_stmts.contains (use_stmt_info))
>> > +gimple *work_stmt = worklist.pop ();
>> > +FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
>> > +  {
>> > +imm_use_iterator use_iter;
>> > +gimple *use_stmt;
>> > +FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
>> > +   DEF_FROM_PTR (def_p))
>> > +  if (!is_gimple_debug (use_stmt))
>> >  {
>> > -  (*life)[i] = true;
>> > -  break;
>> > +  stmt_vec_info use_stmt_info
>> > += vinfo->lookup_stmt (use_stmt);
>> > +  if (!use_stmt_info
>> > +  || !vectorized_scalar_stmts.contains 
>> > (use_stmt_info))
>> > +{
>> > +  if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
>> > +{
>> 
>> I guess I should walk through the testcase and figure it out for myself,
>> but: I assume vectorized_scalar_stmts exists because not every statement
>> we've considered vectorising has made the cut.  Isn't that also true
>> of (original) scalar statements that would have been vectorised 

[PATCH] rtl-optimization/105231 - distribute_notes and REG_EH_REGION

2022-04-14 Thread Richard Biener via Gcc-patches
The following mitigates a problem in combine's distribute_notes, which
places an original REG_EH_REGION note based only on may_trap_p.  That
predicate is good for testing whether a non-call insn can possibly
throw, but not whether it actually does or whether we care.  That's
something we decided at RTL expansion time, where we possibly still
know the insn evaluates to a constant.

In fact, the REG_EH_REGION can only come from the original i3 and
an assert is added to that effect.  That means we only need to
retain the note on i3 or, if that cannot trap, drop it but we
should never move it to i2.  If splitting of i3 ever becomes a
problem here the insn split should be rejected instead.
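
As a rough illustration of the rule described above, the following sketch models the post-patch placement decision with toy types. `ToyInsn`, `NotePlace` and `place_eh_region_note` are hypothetical names used only here; they are not GCC's RTL types, and the sketch omits the assertions the real code makes (that the note comes from i3, and that non-call exceptions are enabled in the non-call case).

```cpp
#include <cassert>

// Simplified stand-in for an RTL insn; not GCC's rtx_insn.
struct ToyInsn {
  bool is_call;   // models CALL_P (i3)
  bool may_trap;  // models may_trap_p (i3)
};

enum class NotePlace { I3, Dropped };

// Models the rule in the description: the note stays on i3 if i3 is a call
// or can still trap; otherwise it is dropped.  It is never moved to i2.
inline NotePlace place_eh_region_note(const ToyInsn& i3)
{
  if (i3.is_call || i3.may_trap)
    return NotePlace::I3;
  return NotePlace::Dropped;
}
```

The absence of an i2 parameter is deliberate: under the new rule the decision depends on i3 alone.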

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK?

2022-04-14  Richard Biener  

PR rtl-optimization/105231
* combine.cc (distribute_notes): Assert that a REG_EH_REGION
is from i3 and only keep it there or drop it if the insn
can not trap.

* gcc.dg/torture/pr105231.c: New testcase.
---
 gcc/combine.cc  | 12 +---
 gcc/testsuite/gcc.dg/torture/pr105231.c | 15 +++
 2 files changed, 20 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr105231.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 53dcac92abc..ec53eda7595 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -14175,21 +14175,19 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
rtx_insn *i3, rtx_insn *i2,
  break;
 
case REG_EH_REGION:
+ /* A REG_EH_REGION note can only ever come from i3.  */
+ gcc_assert (from_insn == i3);
  /* These notes must remain with the call or trapping instruction.  */
  if (CALL_P (i3))
place = i3;
- else if (i2 && CALL_P (i2))
-   place = i2;
  else
{
  gcc_assert (cfun->can_throw_non_call_exceptions);
+ /* If i3 can still trap preserve the note, otherwise we've
+combined things such that we can now prove that the
+instructions can't trap.  Drop the note in this case.  */
  if (may_trap_p (i3))
place = i3;
- else if (i2 && may_trap_p (i2))
-   place = i2;
- /* ??? Otherwise assume we've combined things such that we
-can now prove that the instructions can't trap.  Drop the
-note in this case.  */
}
  break;
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr105231.c 
b/gcc/testsuite/gcc.dg/torture/pr105231.c
new file mode 100644
index 000..50459219c08
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr105231.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
+/* { dg-require-effective-target dfp } */
+/* { dg-additional-options "-fsanitize-coverage=trace-pc -fnon-call-exceptions 
--param=max-cse-insns=1 -frounding-math" } */
+/* { dg-additional-options "-mstack-arg-probe" { target x86_64-*-* i?86-*-* } 
} */
+
+void baz (int *);
+void bar (double, double, _Decimal64);
+
+void
+foo (void)
+{
+  int s __attribute__((cleanup (baz)));
+  bar (0xfffe, 0xebf3fff2fbebaf7f, 0xff);
+}
-- 
2.34.1


[PATCH] gimple-fold: fix further missing stmt locations [PR104308]

2022-04-14 Thread David Malcolm via Gcc-patches
PR analyzer/104308 initially reported about a
-Wanalyzer-use-of-uninitialized-value diagnostic using UNKNOWN_LOCATION
when complaining about certain memmove operations where the source
is uninitialized.

In r12-7856-g875342766d4298 I fixed the missing location for
a stmt generated by gimple_fold_builtin_memory_op, but the reporter
then found another way to generate such a stmt with UNKNOWN_LOCATION.

I've now gone through gimple_fold_builtin_memory_op looking at all
statement creation, and found three places in which a new statement
doesn't have a location set on it (either directly via
gimple_set_location, or indirectly via gsi_replace), one of which is
the new reproducer.

This patch adds a gimple_set_location to these three cases, and adds
test coverage for one of them (the third hunk within the patch), fixing
the new reproducer for PR analyzer/104308.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk in stage 4?  Or in stage 1?

Thanks
Dave

gcc/ChangeLog:
PR analyzer/104308
* gimple-fold.cc (gimple_fold_builtin_memory_op): Explicitly set
the location of new_stmt in all places that don't already set it,
whether explicitly, or via a call to gsi_replace.

gcc/testsuite/ChangeLog:
PR analyzer/104308
* gcc.dg/analyzer/pr104308.c: Add test coverage.

Signed-off-by: David Malcolm 
---
 gcc/gimple-fold.cc   |  3 +++
 gcc/testsuite/gcc.dg/analyzer/pr104308.c | 13 -
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index ac22adfd9b1..863ee3d3912 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -1048,6 +1048,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
  gsi_replace (gsi, new_stmt, false);
  return true;
}
+ gimple_set_location (new_stmt, loc);
  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
  goto done;
}
@@ -1302,6 +1303,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
   new_stmt);
  gimple_assign_set_lhs (new_stmt, srcvar);
  gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+ gimple_set_location (new_stmt, loc);
  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
}
  new_stmt = gimple_build_assign (destvar, srcvar);
@@ -1338,6 +1340,7 @@ set_vop_and_replace:
  gsi_replace (gsi, new_stmt, false);
  return true;
}
+  gimple_set_location (new_stmt, loc);
   gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
 }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr104308.c 
b/gcc/testsuite/gcc.dg/analyzer/pr104308.c
index 9cd5ee6feee..a3a0cbb7317 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr104308.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr104308.c
@@ -1,8 +1,19 @@
+/* Verify that we have source locations for
+   -Wanalyzer-use-of-uninitialized-value warnings involving folded
+   memory ops.  */
+
 #include <string.h>
 
-int main()
+int test_memmove_within_uninit (void)
 {
   char s[5]; /* { dg-message "region created on stack here" } */
   memmove(s, s + 1, 2); /* { dg-warning "use of uninitialized value" } */
   return 0;
 }
+
+int test_memcpy_from_uninit (void)
+{
+  char a1[5];
+  char a2[5]; /* { dg-message "region created on stack here" } */
+  return (memcpy(a1, a2, 5) == a1); /* { dg-warning "use of uninitialized 
value" } */
+}
-- 
2.26.3



[committed] analyzer: fix ICE comparing VECTOR_CSTs [PR105252]

2022-04-14 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-8159-gb209a349268d24.
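
For readers unfamiliar with the comparator, the idea in the ChangeLog entry that follows (order encoded elements by type first, and only then by value) can be sketched with toy types. `ToyCst` and `cmp_toy_cst` are hypothetical and stand in for tree constants and cmp_cst; the sketch uses explicit comparisons rather than the subtraction seen in the patch.

```cpp
#include <cassert>

// Toy stand-in for a constant: a type identifier plus a value.
struct ToyCst {
  int type_uid;
  long value;
};

// Three-way comparison: differing types decide the order on their own,
// so values are only ever compared between constants of the same type.
inline int cmp_toy_cst(const ToyCst& a, const ToyCst& b)
{
  if (a.type_uid != b.type_uid)
    return a.type_uid < b.type_uid ? -1 : 1;
  if (a.value != b.value)
    return a.value < b.value ? -1 : 1;
  return 0;
}
```

Checking the type first means the value comparison never has to cope with mismatched types, which is what the fix appears to guarantee for the recursive call.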

gcc/analyzer/ChangeLog:
PR analyzer/105252
* svalue.cc (cmp_cst): When comparing VECTOR_CSTs, compare the
types of the encoded elements before calling cmp_cst on them.

gcc/testsuite/ChangeLog:
PR analyzer/105252
* gcc.dg/analyzer/pr105252.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/svalue.cc   | 13 ++---
 gcc/testsuite/gcc.dg/analyzer/pr105252.c | 20 
 2 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr105252.c

diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc
index 553edae7250..536bc288dbf 100644
--- a/gcc/analyzer/svalue.cc
+++ b/gcc/analyzer/svalue.cc
@@ -337,9 +337,16 @@ cmp_cst (const_tree cst1, const_tree cst2)
return cmp_nelts_per_pattern;
   unsigned encoded_nelts = vector_cst_encoded_nelts (cst1);
   for (unsigned i = 0; i < encoded_nelts; i++)
-   if (int el_cmp = cmp_cst (VECTOR_CST_ENCODED_ELT (cst1, i),
- VECTOR_CST_ENCODED_ELT (cst2, i)))
- return el_cmp;
+   {
+ const_tree elt1 = VECTOR_CST_ENCODED_ELT (cst1, i);
+ const_tree elt2 = VECTOR_CST_ENCODED_ELT (cst2, i);
+ int t1 = TYPE_UID (TREE_TYPE (elt1));
+ int t2 = TYPE_UID (TREE_TYPE (elt2));
+ if (int cmp_type = t1 - t2)
+   return cmp_type;
+ if (int el_cmp = cmp_cst (elt1, elt2))
+   return el_cmp;
+   }
   return 0;
 }
 }
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr105252.c 
b/gcc/testsuite/gcc.dg/analyzer/pr105252.c
new file mode 100644
index 000..a093eababc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr105252.c
@@ -0,0 +1,20 @@
+/* { dg-additional-options "-fnon-call-exceptions -O" } */
+
+typedef unsigned char C;
+typedef unsigned char __attribute__((__vector_size__ (4))) V;
+
+C m;
+
+static inline void
+bar (C c, V v, V *r)
+{
+  v %= (c | v) % m;
+  *r = v;
+}
+
+void
+foo (void)
+{
+  V x;
+  bar (0, (V){2}, &x);
+}
-- 
2.26.3



Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns

2022-04-14 Thread Richard Biener via Gcc-patches
On Thu, 14 Apr 2022, Richard Sandiford wrote:

> Richard Biener  writes:
> > When doing BB vectorization the scalar cost compute is derailed
> > by patterns, causing lanes to be considered live and thus not
> > costed on the scalar side.  For the testcase in PR104010 this
> > prevents vectorization which was done by GCC 11.  PR103941
> > shows similar cases of missed optimizations that are fixed by
> > this patch.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > I'm only considering this now because PR104010 is identified
> > as regression on arm - Richards, what do you think?  I do think
> > this will enable vectorization of more stuff now which might
> > be good or bad - who knows, but at least it needs to involve
> > patterns.
> >
> > Thanks,
> > Richard.
> >
> > 2022-04-13  Richard Biener  
> >
> > PR tree-optimization/104010
> > PR tree-optimization/103941
> > * tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
> > we run into stmts in patterns continue walking those
> > for uses outside of the vectorized region instead of
> > marking the lane live.
> >
> > * gcc.target/i386/pr103941-1.c: New testcase.
> > * gcc.target/i386/pr103941-2.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++
> >  gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++
> >  gcc/tree-vect-slp.cc   | 47 --
> >  3 files changed, 61 insertions(+), 12 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c 
> > b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> > new file mode 100644
> > index 000..524fdd0b4b1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2" } */
> > +
> > +unsigned char ur[16], ua[16], ub[16];
> > +
> > +void avgu_v2qi (void)
> > +{
> > +  int i;
> > +
> > +  for (i = 0; i < 2; i++)
> > +ur[i] = (ua[i] + ub[i] + 1) >> 1;
> > +}
> > +
> > +/* { dg-final { scan-assembler "pavgb" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c 
> > b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> > new file mode 100644
> > index 000..972a32be997
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2" } */
> > +
> > +void foo (int *c, float *x, float *y)
> > +{
> > +  c[0] = x[0] < y[0];
> > +  c[1] = x[1] < y[1];
> > +  c[2] = x[2] < y[2];
> > +  c[3] = x[3] < y[3];
> > +}
> > +
> > +/* { dg-final { scan-assembler "cmpltps" } } */
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 4ac2b70303c..c7687065374 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
> >  the scalar cost.  */
> >if (!STMT_VINFO_LIVE_P (stmt_info))
> > {
> > - FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> > + auto_vec worklist;
> > + hash_set *worklist_visited = NULL;
> > + worklist.quick_push (orig_stmt);
> > + do
> > {
> > - imm_use_iterator use_iter;
> > - gimple *use_stmt;
> > - FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> > -   if (!is_gimple_debug (use_stmt))
> > - {
> > -   stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
> > -   if (!use_stmt_info
> > -   || !vectorized_scalar_stmts.contains (use_stmt_info))
> > + gimple *work_stmt = worklist.pop ();
> > + FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
> > +   {
> > + imm_use_iterator use_iter;
> > + gimple *use_stmt;
> > + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
> > +DEF_FROM_PTR (def_p))
> > +   if (!is_gimple_debug (use_stmt))
> >   {
> > -   (*life)[i] = true;
> > -   break;
> > +   stmt_vec_info use_stmt_info
> > + = vinfo->lookup_stmt (use_stmt);
> > +   if (!use_stmt_info
> > +   || !vectorized_scalar_stmts.contains 
> > (use_stmt_info))
> > + {
> > +   if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
> > + {
> 
> I guess I should walk through the testcase and figure it out for myself,
> but: I assume vectorized_scalar_stmts exists because not every statement
> we've considered vectorising has made the cut.  Isn't that also true
> of (original) scalar statements that would have been vectorised using
> patterns?
> 
> Does vectorized_scalar_stmts record original statements or statements to
> vectorise?  

Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns

2022-04-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> When doing BB vectorization the scalar cost compute is derailed
> by patterns, causing lanes to be considered live and thus not
> costed on the scalar side.  For the testcase in PR104010 this
> prevents vectorization which was done by GCC 11.  PR103941
> shows similar cases of missed optimizations that are fixed by
> this patch.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> I'm only considering this now because PR104010 is identified
> as regression on arm - Richards, what do you think?  I do think
> this will enable vectorization of more stuff now which might
> be good or bad - who knows, but at least it needs to involve
> patterns.
>
> Thanks,
> Richard.
>
> 2022-04-13  Richard Biener  
>
>   PR tree-optimization/104010
>   PR tree-optimization/103941
>   * tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
>   we run into stmts in patterns continue walking those
>   for uses outside of the vectorized region instead of
>   marking the lane live.
>
>   * gcc.target/i386/pr103941-1.c: New testcase.
>   * gcc.target/i386/pr103941-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++
>  gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++
>  gcc/tree-vect-slp.cc   | 47 --
>  3 files changed, 61 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c 
> b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> new file mode 100644
> index 000..524fdd0b4b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +unsigned char ur[16], ua[16], ub[16];
> +
> +void avgu_v2qi (void)
> +{
> +  int i;
> +
> +  for (i = 0; i < 2; i++)
> +ur[i] = (ua[i] + ub[i] + 1) >> 1;
> +}
> +
> +/* { dg-final { scan-assembler "pavgb" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c 
> b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> new file mode 100644
> index 000..972a32be997
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +void foo (int *c, float *x, float *y)
> +{
> +  c[0] = x[0] < y[0];
> +  c[1] = x[1] < y[1];
> +  c[2] = x[2] < y[2];
> +  c[3] = x[3] < y[3];
> +}
> +
> +/* { dg-final { scan-assembler "cmpltps" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 4ac2b70303c..c7687065374 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
>the scalar cost.  */
>if (!STMT_VINFO_LIVE_P (stmt_info))
>   {
> -   FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> +   auto_vec worklist;
> +   hash_set *worklist_visited = NULL;
> +   worklist.quick_push (orig_stmt);
> +   do
>   {
> -   imm_use_iterator use_iter;
> -   gimple *use_stmt;
> -   FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> - if (!is_gimple_debug (use_stmt))
> -   {
> - stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
> - if (!use_stmt_info
> - || !vectorized_scalar_stmts.contains (use_stmt_info))
> +   gimple *work_stmt = worklist.pop ();
> +   FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
> + {
> +   imm_use_iterator use_iter;
> +   gimple *use_stmt;
> +   FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
> +  DEF_FROM_PTR (def_p))
> + if (!is_gimple_debug (use_stmt))
> {
> - (*life)[i] = true;
> - break;
> + stmt_vec_info use_stmt_info
> +   = vinfo->lookup_stmt (use_stmt);
> + if (!use_stmt_info
> + || !vectorized_scalar_stmts.contains 
> (use_stmt_info))
> +   {
> + if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
> +   {

I guess I should walk through the testcase and figure it out for myself,
but: I assume vectorized_scalar_stmts exists because not every statement
we've considered vectorising has made the cut.  Isn't that also true
of (original) scalar statements that would have been vectorised using
patterns?

Does vectorized_scalar_stmts record original statements or statements to
vectorise?  From its name I'd have assumed original statements, in which
case I wouldn't have expected IN_PATTERN_P to need special handling.

I know these are likely to be dumb questions, sorry. :-)

Richard
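
For context on the question above, here is a toy model of the worklist walk as the ChangeLog describes it: a use that sits in a pattern is walked further instead of immediately marking the lane live. This is a sketch under that reading only. The patch body is cut off in this archive, and `ToyStmt` and `lane_is_live` are hypothetical names, not the vectorizer's data structures.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy statement: who uses its result, and how the vectorizer classified it.
struct ToyStmt {
  std::vector<int> uses;  // indices of statements using this result
  bool vectorized;        // counted as part of the vectorized region
  bool in_pattern;        // original stmt that was replaced by a pattern
};

// A lane is live if some use outside the vectorized region is reachable,
// looking through pattern statements instead of stopping at them.
inline bool lane_is_live(const std::vector<ToyStmt>& stmts, int start)
{
  std::vector<int> worklist{start};
  std::unordered_set<int> visited{start};
  while (!worklist.empty())
    {
      int cur = worklist.back();
      worklist.pop_back();
      for (int use : stmts[cur].uses)
        {
          const ToyStmt& u = stmts[use];
          if (u.vectorized)
            continue;                 // use stays inside the region
          if (!u.in_pattern)
            return true;              // genuine outside use: lane is live
          if (visited.insert(use).second)
            worklist.push_back(use);  // pattern stmt: keep walking its uses
        }
    }
  return false;
}
```

In this model a pattern statement never makes a lane live by itself; only what lies beyond it can.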

> +

Re: [PATCH] libstdc++: Optimize integer std::from_chars

2022-04-14 Thread Patrick Palka via Gcc-patches
On Thu, 14 Apr 2022, Patrick Palka wrote:

> This applies the following optimizations to the integer std::from_chars
> implementation:
> 
>   1. Use a lookup table for converting an alphanumeric digit to its
>  base-36 value instead of using a range test (for 0-9) and switch
>  (for a-z and A-Z).  The table is constructed using a C++14
>  constexpr function which doesn't assume a particular character
>  encoding or __CHAR_BIT__ value.  The new conversion function
>  __from_chars_alnum_to_val is templated on whether we care
>  only about the decimal digits, in which case we can perform the
>  conversion with a single subtraction since the digit characters
>  are guaranteed to be contiguous (unlike the letters).
>   2. Generalize __from_chars_binary to handle all power-of-two bases.
>  This function, now named __from_chars_pow2_base, is also templated
>  on whether we care only about the decimal digits in order to speed
>  up digit conversion for base 2, 4 and 8.
>   3. In __from_chars_digit, use
>static_cast(__c - '0') < __base
>  instead of
>'0' <= __c && __c <= ('0' + (__base - 1)).
>  as the digit recognition test (exhaustively verified that the two
>  tests are equivalent).
>   4. In __from_chars_alnum, use a nested loop to consume the rest of the
>  digits in the overflow case (mirroring __from_chars_digit) so that
>  the main loop doesn't have to maintain the __valid overflow flag.
> 
> At this point, __from_chars_digit is nearly identical to
> __from_chars_alnum, so this patch combines the two functions, removing
> the former and templatizing the latter according to whether we care only
> about the decimal digits.  Finally,
> 
>   5. In __from_chars_alnum, keep track of a lower bound on the number of
>  unused bits in the result and use that to omit the overflow check
>  when it's safe to do so.
> 
> In passing this replaces the non-portable function ascii_to_hexit
> used by __floating_from_chars_hex with the new conversion function.
> 
> Here are some runtime measurements for a simple 15-line benchmark that
> roundtrips printing/parsing 200 million integers via std::to/from_chars
> (average of 5 runs):
> 
>   Base  Before  After (seconds, lower is better)
>      2    9.37   9.37
>      3   12.13  15.79
>      8    3.67   4.15
>     10    3.86   4.90
>     11    5.03   6.84
>     16    2.93   4.14
>     32    2.39   3.85
>     36    3.26   5.22

Whoops, the second and third columns should be swapped (runtime is
smaller after the patch across the board).
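
Separately, the equivalence claimed in item 3 of the quoted description can be checked at compile time with a sketch like the one below, assuming the cast in item 3 targets `unsigned char` and the base is between 2 and 10. The function names are hypothetical and are not the ones used in the patch.

```cpp
#include <cassert>
#include <climits>

// Two-comparison form quoted in item 3.
constexpr bool is_digit_two_tests(unsigned char c, int base)
{ return '0' <= c && c <= ('0' + (base - 1)); }

// Single-comparison form: for c < '0' the difference is negative and the
// cast wraps it to a large value, which can never be below the base.
constexpr bool is_digit_one_test(unsigned char c, int base)
{ return static_cast<unsigned char>(c - '0') < base; }

// Exhaustive check over every unsigned char value and bases 2 through 10.
constexpr bool forms_agree()
{
  for (int base = 2; base <= 10; ++base)
    for (int c = 0; c <= UCHAR_MAX; ++c)
      if (is_digit_two_tests(static_cast<unsigned char>(c), base)
          != is_digit_one_test(static_cast<unsigned char>(c), base))
        return false;
  return true;
}

static_assert(forms_agree(), "the two digit tests should agree");
```

The `static_assert` makes the check part of compilation (C++14 or later), so a mismatch would show up as a build error rather than a runtime failure.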

> 
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Also tested
> against libc++'s from_chars tests for good measure.
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/std/charconv (__from_chars_alnum_to_val_table): Define.
>   (__from_chars_alnum_to_val): Define.
>   (__from_chars_binary): Rename to ...
>   (__from_chars_pow2_base): ... this.  Generalize to handle any
>   power-of-two base using __from_chars_alnum_to_val.
>   (__from_chars_digit): Optimize digit recognition to a single
>   test instead of two tests.  Use [[__unlikely___]] attribute.
>   (__from_chars_alpha_to_num): Remove.
>   (__from_chars_alnum): Use __from_chars_alnum_to_val.  Use a
>   nested loop for the overflow case.
>   (from_chars): Adjust appropriately.
>   * src/c++17/floating_from_chars.cc (ascii_to_hexit): Remove.
>   (__floating_from_chars_hex): Use __from_chars_alnum_to_val
>   to recognize a hex digit instead.
> ---
>  libstdc++-v3/include/std/charconv | 250 --
>  libstdc++-v3/src/c++17/floating_from_chars.cc |  18 +-
>  2 files changed, 105 insertions(+), 163 deletions(-)
> 
> diff --git a/libstdc++-v3/include/std/charconv 
> b/libstdc++-v3/include/std/charconv
> index 2ce9c7d4cb9..5e44459749a 100644
> --- a/libstdc++-v3/include/std/charconv
> +++ b/libstdc++-v3/include/std/charconv
> @@ -407,176 +407,127 @@ namespace __detail
>return true;
>  }
>  
> -  /// std::from_chars implementation for integers in base 2.
> -  template
> +  // Construct and return a lookup table that maps 0-9, A-Z and a-z to the
> +  // corresponding corresponding base-36 value and maps all other characters
> +  // to 127.
> +  constexpr auto
> +  __from_chars_alnum_to_val_table()
> +  {
> +constexpr unsigned char __lower_letters[]
> +  = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
> +   'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
> +   'u', 'v', 'w', 'x', 'y', 'z' };
> +constexpr unsigned char __upper_letters[]
> +  = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
> +   'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
> +   'U', 'V', 'W', 'X', 'Y', 'Z' };
> +struct { unsigned char __data[1u << __CHAR_BIT__] = {}; } __table;
> +for (auto& __entry : __table.__data)
> +  __entry = 127;
> +for (int __i = 0; __i < 10; ++__i)
> +  

RE: [PATCH] libstdc++: Optimize integer std::from_chars

2022-04-14 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Patrick Palka
> via Gcc-patches
> Sent: Thursday, April 14, 2022 1:31 PM
> To: gcc-patches@gcc.gnu.org
> Cc: libstd...@gcc.gnu.org
> Subject: [PATCH] libstdc++: Optimize integer std::from_chars
> 
> This applies the following optimizations to the integer std::from_chars
> implementation:
> 
>   1. Use a lookup table for converting an alphanumeric digit to its
>  base-36 value instead of using a range test (for 0-9) and switch
>  (for a-z and A-Z).  The table is constructed using a C++14
>  constexpr function which doesn't assume a particular character
>  encoding or __CHAR_BIT__ value.  The new conversion function
>  __from_chars_alnum_to_val is templated on whether we care
>  only about the decimal digits, in which case we can perform the
>  conversion with a single subtraction since the digit characters
>  are guaranteed to be contiguous (unlike the letters).
>   2. Generalize __from_chars_binary to handle all power-of-two bases.
>  This function, now named __from_chars_pow2_base, is also templated
>  on whether we care only about the decimal digits in order to speed
>  up digit conversion for base 2, 4 and 8.
>   3. In __from_chars_digit, use
>static_cast<unsigned char>(__c - '0') < __base
>  instead of
>'0' <= __c && __c <= ('0' + (__base - 1)).
>  as the digit recognition test (exhaustively verified that the two
>  tests are equivalent).
>   4. In __from_chars_alnum, use a nested loop to consume the rest of the
>  digits in the overflow case (mirroring __from_chars_digit) so that
>  the main loop doesn't have to maintain the __valid overflow flag.
> 
> At this point, __from_chars_digit is nearly identical to
> __from_chars_alnum, so this patch combines the two functions, removing
> the former and templatizing the latter according to whether we care only
> about the decimal digits.  Finally,
> 
>   5. In __from_chars_alnum, keep track of a lower bound on the number of
>  unused bits in the result and use that to omit the overflow check
>  when it's safe to do so.
> 
> In passing this replaces the non-portable function ascii_to_hexit
> used by __floating_from_chars_hex with the new conversion function.
> 
> Here are some runtime measurements for a simple 15-line benchmark that
> roundtrips printing/parsing 200 million integers via std::to/from_chars
> (average of 5 runs):
> 
>   Base  Before  After (seconds, lower is better)
>      2    9.37   9.37
>      3   12.13  15.79
>      8    3.67   4.15
>     10    3.86   4.90
>     11    5.03   6.84
>     16    2.93   4.14
>     32    2.39   3.85
>     36    3.26   5.22

The after numbers look worse (higher)? Are the columns accidentally swapped?
Thanks,
Kyrill

> 
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Also tested
> against libc++'s from_chars tests for good measure.
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/std/charconv (__from_chars_alnum_to_val_table): Define.
>   (__from_chars_alnum_to_val): Define.
>   (__from_chars_binary): Rename to ...
>   (__from_chars_pow2_base): ... this.  Generalize to handle any
>   power-of-two base using __from_chars_alnum_to_val.
>   (__from_chars_digit): Optimize digit recognition to a single
>   test instead of two tests.  Use [[__unlikely__]] attribute.
>   (__from_chars_alpha_to_num): Remove.
>   (__from_chars_alnum): Use __from_chars_alnum_to_val.  Use a
>   nested loop for the overflow case.
>   (from_chars): Adjust appropriately.
>   * src/c++17/floating_from_chars.cc (ascii_to_hexit): Remove.
>   (__floating_from_chars_hex): Use __from_chars_alnum_to_val
>   to recognize a hex digit instead.
> ---
>  libstdc++-v3/include/std/charconv | 250 --
>  libstdc++-v3/src/c++17/floating_from_chars.cc |  18 +-
>  2 files changed, 105 insertions(+), 163 deletions(-)
> 
> diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-
> v3/include/std/charconv
> index 2ce9c7d4cb9..5e44459749a 100644
> --- a/libstdc++-v3/include/std/charconv
> +++ b/libstdc++-v3/include/std/charconv
> @@ -407,176 +407,127 @@ namespace __detail
>return true;
>  }
> 
> -  /// std::from_chars implementation for integers in base 2.
> -  template
> +  // Construct and return a lookup table that maps 0-9, A-Z and a-z to the
> +  // corresponding base-36 value and maps all other characters
> +  // to 127.
> +  constexpr auto
> +  __from_chars_alnum_to_val_table()
> +  {
> +constexpr unsigned char __lower_letters[]
> +  = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
> +   'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
> +   'u', 'v', 'w', 'x', 'y', 'z' };
> +constexpr unsigned char __upper_letters[]
> +  = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
> +   'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 

[PATCH] libstdc++: Optimize integer std::from_chars

2022-04-14 Thread Patrick Palka via Gcc-patches
This applies the following optimizations to the integer std::from_chars
implementation:

  1. Use a lookup table for converting an alphanumeric digit to its
 base-36 value instead of using a range test (for 0-9) and switch
 (for a-z and A-Z).  The table is constructed using a C++14
 constexpr function which doesn't assume a particular character
 encoding or __CHAR_BIT__ value.  The new conversion function
 __from_chars_alnum_to_val is templated on whether we care
 only about the decimal digits, in which case we can perform the
 conversion with a single subtraction since the digit characters
 are guaranteed to be contiguous (unlike the letters).
  2. Generalize __from_chars_binary to handle all power-of-two bases.
 This function, now named __from_chars_pow2_base, is also templated
 on whether we care only about the decimal digits in order to speed
 up digit conversion for base 2, 4 and 8.
  3. In __from_chars_digit, use
   static_cast(__c - '0') < __base
 instead of
   '0' <= __c && __c <= ('0' + (__base - 1)).
 as the digit recognition test (exhaustively verified that the two
 tests are equivalent).
  4. In __from_chars_alnum, use a nested loop to consume the rest of the
 digits in the overflow case (mirroring __from_chars_digit) so that
 the main loop doesn't have to maintain the __valid overflow flag.
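
As a rough illustration of items 1 and 3 above (a sketch only, not the code
in the patch): the block below builds a 0-9/a-z/A-Z lookup table in a
constexpr function and compares the two-comparison digit test with the
single-comparison form.  All names (alnum_table, make_alnum_table,
alnum_to_val, is_digit_two_tests, is_digit_one_test, tests_agree) are
hypothetical.  The cast's target type is not visible in the message text
above, so the sketch assumes unsigned char.

```cpp
#include <cassert>
#include <climits>

// Hypothetical names throughout; this is not the libstdc++ implementation.
struct alnum_table { unsigned char data[1u << CHAR_BIT]; };

// Build a table that maps '0'-'9', 'a'-'z' and 'A'-'Z' to 0..35 and every
// other character to 127.  The letters are listed explicitly, so the table
// does not rely on them being contiguous in the character set.
constexpr alnum_table make_alnum_table()
{
  constexpr unsigned char lower[] = "abcdefghijklmnopqrstuvwxyz";
  constexpr unsigned char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  alnum_table t{};
  for (auto& e : t.data)
    e = 127;
  for (int i = 0; i < 10; ++i)
    t.data['0' + i] = static_cast<unsigned char>(i);
  for (int i = 0; i < 26; ++i)
    {
      t.data[lower[i]] = static_cast<unsigned char>(10 + i);
      t.data[upper[i]] = static_cast<unsigned char>(10 + i);
    }
  return t;
}

constexpr alnum_table table = make_alnum_table();

// Value of c as a digit in bases up to 36, or 127 if c is not alphanumeric.
constexpr unsigned char alnum_to_val(char c)
{ return table.data[static_cast<unsigned char>(c)]; }

// Two-comparison digit test, for bases 2 through 10.
constexpr bool is_digit_two_tests(char c, int base)
{ return '0' <= c && c <= ('0' + (base - 1)); }

// Single-comparison form (assumed cast type: unsigned char).  Characters
// below '0' wrap around to large values, so one comparison covers both
// bounds.
constexpr bool is_digit_one_test(char c, int base)
{ return static_cast<unsigned char>(c - '0') < base; }

// Compare the two tests for every char value and every base in 2..10.
inline bool tests_agree()
{
  for (int base = 2; base <= 10; ++base)
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i)
      if (is_digit_two_tests(static_cast<char>(i), base)
          != is_digit_one_test(static_cast<char>(i), base))
        return false;
  return true;
}

static_assert(alnum_to_val('f') == 15, "table is usable at compile time");
```

Calling tests_agree() performs the same kind of exhaustive check the message
mentions, restricted to the assumptions stated above.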

At this point, __from_chars_digit is nearly identical to
__from_chars_alnum, so this patch combines the two functions, removing
the former and templatizing the latter according to whether we care only
about the decimal digits.  Finally,

  5. In __from_chars_alnum, keep track of a lower bound on the number of
 unused bits in the result and use that to omit the overflow check
 when it's safe to do so.
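
To make item 5 more concrete, here is a minimal sketch of the idea under
stated assumptions: it handles only a 32-bit unsigned result and bases 2
through 10, and the names (bits_needed, parse_unsigned) are hypothetical
rather than taken from the patch.  Each appended digit can grow the value by
at most bits_needed(base) bits, so while the running lower bound on unused
bits stays non-negative the multiply-add cannot overflow and needs no check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of bits needed to represent x (0 for x == 0).
constexpr int bits_needed(unsigned x)
{
  int n = 0;
  while (x != 0)
    {
      ++n;
      x >>= 1;
    }
  return n;
}

// Hypothetical helper: parse digits of the given base (2..10) from
// [first, last) into val.  Stops at the first non-digit.  Returns false on
// overflow, after consuming the remaining digits.
inline bool parse_unsigned(const char*& first, const char* last,
                           std::uint32_t& val, unsigned base = 10)
{
  // Appending one digit grows the value by at most this many bits.
  const int bits_per_digit = bits_needed(base);
  int unused_bits_lower_bound = std::numeric_limits<std::uint32_t>::digits;
  val = 0;
  for (; first != last; ++first)
    {
      const unsigned char d = static_cast<unsigned char>(*first - '0');
      if (d >= base)
        break;  // not a digit in this base
      unused_bits_lower_bound -= bits_per_digit;
      if (unused_bits_lower_bound >= 0)
        {
          // Enough headroom is guaranteed: this step cannot overflow.
          val = val * base + d;
        }
      else
        {
          // Headroom may be exhausted: do the step with an explicit check.
          const std::uint64_t wide = std::uint64_t(val) * base + d;
          if (wide > std::numeric_limits<std::uint32_t>::max())
            {
              // Overflow: skip the rest of the digits and report failure.
              while (++first != last
                     && static_cast<unsigned char>(*first - '0') < base)
                { }
              return false;
            }
          val = static_cast<std::uint32_t>(wide);
        }
    }
  return true;
}
```

For base 10 and a 32-bit result this skips the check for the first eight
digits, which covers most short inputs.  The bound is deliberately
conservative (for example, it charges four bits per octal digit), which keeps
it cheap to maintain.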

In passing this replaces the non-portable function ascii_to_hexit
used by __floating_from_chars_hex with the new conversion function.

Here are some runtime measurements for a simple 15-line benchmark that
roundtrips printing/parsing 200 million integers via std::to/from_chars
(average of 5 runs):

  Base  Before  After (seconds, lower is better)
     2    9.37   9.37
     3   12.13  15.79
     8    3.67   4.15
    10    3.86   4.90
    11    5.03   6.84
    16    2.93   4.14
    32    2.39   3.85
    36    3.26   5.22

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Also tested
against libc++'s from_chars tests for good measure.

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum_to_val_table): Define.
(__from_chars_alnum_to_val): Define.
(__from_chars_binary): Rename to ...
(__from_chars_pow2_base): ... this.  Generalize to handle any
power-of-two base using __from_chars_alnum_to_val.
(__from_chars_digit): Optimize digit recognition to a single
test instead of two tests.  Use [[__unlikely__]] attribute.
(__from_chars_alpha_to_num): Remove.
(__from_chars_alnum): Use __from_chars_alnum_to_val.  Use a
nested loop for the overflow case.
(from_chars): Adjust appropriately.
* src/c++17/floating_from_chars.cc (ascii_to_hexit): Remove.
(__floating_from_chars_hex): Use __from_chars_alnum_to_val
to recognize a hex digit instead.
---
 libstdc++-v3/include/std/charconv | 250 --
 libstdc++-v3/src/c++17/floating_from_chars.cc |  18 +-
 2 files changed, 105 insertions(+), 163 deletions(-)

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 2ce9c7d4cb9..5e44459749a 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -407,176 +407,127 @@ namespace __detail
   return true;
 }
 
-  /// std::from_chars implementation for integers in base 2.
-  template
+  // Construct and return a lookup table that maps 0-9, A-Z and a-z to the
+  // corresponding base-36 value and maps all other characters
+  // to 127.
+  constexpr auto
+  __from_chars_alnum_to_val_table()
+  {
+constexpr unsigned char __lower_letters[]
+  = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
+ 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
+ 'u', 'v', 'w', 'x', 'y', 'z' };
+constexpr unsigned char __upper_letters[]
+  = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+ 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+ 'U', 'V', 'W', 'X', 'Y', 'Z' };
+struct { unsigned char __data[1u << __CHAR_BIT__] = {}; } __table;
+for (auto& __entry : __table.__data)
+  __entry = 127;
+for (int __i = 0; __i < 10; ++__i)
+  __table.__data['0' + __i] = __i;
+for (int __i = 0; __i < 26; ++__i)
+  {
+   __table.__data[__lower_letters[__i]] = 10 + __i;
+   __table.__data[__upper_letters[__i]] = 10 + __i;
+  }
+return __table;
+  }
+
+  /// If _DecOnly is true: if the character is a decimal digit, then
+  /// return its corresponding 

Re: [PATCH] simplify-rtx: Don't assume shift count has the same mode as the shift [PR105247]

2022-04-14 Thread Eric Botcazou via Gcc-patches
> 2022-04-13  Jakub Jelinek  
> 
>   PR target/105247
>   * simplify-rtx.cc (simplify_const_binary_operation): For shifts
>   or rotates by VOIDmode constant integer shift count use word_mode
>   for the operand if int_mode is narrower than word.
> 
>   * gcc.c-torture/compile/pr105247.c: New test.

OK, thanks.

-- 
Eric Botcazou




Re: [PATCH] libstdc++: Update incorrect statement about mainline in docs

2022-04-14 Thread Richard Biener via Gcc-patches
On Thu, 14 Apr 2022, Jonathan Wakely wrote:

> On Thu, 14 Apr 2022 at 11:36, Richard Biener  wrote:
> >
> > On Thu, 14 Apr 2022, Jonathan Wakely wrote:
> >
> > > This fixes some misleading text in the libstdc++ manual that says the
> > > docs for the gcc-11 branch refer to mainline.
> > >
> > > Richi, is this OK for the gcc-11 branch now? It's been wrong for 11.1
> > > and 11.2, but it would still be nice to fix.
> >
> > Yes, it's OK.  I notice the same problem exists on the GCC 10 branch
> > but GCC 9 at least mentions GCC 9 once ;)
> 
> Yes, I fixed it for gcc-9.3.0, but forgot to do it for gcc-10 and gcc-11.
> 
> I pushed r10-10534 to fix gcc-10 (since that's open for doc changes)
> and have now pushed r11-9881 as well.
> 
> Maybe this year I'll remember to do it for gcc-12 after we branch from trunk!

Add an entry to branching.html!

Richard.


[committed] libstdc++: Fix missing and incorrect feature test macros [PR105269]

2022-04-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/105269
* include/bits/stl_vector.h (__cpp_lib_constexpr_vector):
Define.
* include/c_compatibility/stdatomic.h (__cpp_lib_stdatomic_h):
Define.
* include/std/optional (__cpp_lib_optional): Define new value
for C++23.
(__cpp_lib_monadic_optional): Remove.
* include/std/version (__cpp_lib_constexpr_vector): Define.
(__cpp_lib_stdatomic_h): Define.
(__cpp_lib_optional): Define new value for C++23.
(__cpp_lib_monadic_optional): Remove.
* testsuite/20_util/optional/monadic/and_then.cc: Adjust.
* testsuite/20_util/optional/requirements.cc: Adjust for C++23.
* testsuite/20_util/optional/version.cc: Likewise.
* testsuite/23_containers/vector/cons/constexpr.cc: Check
feature test macro.
* testsuite/29_atomics/headers/stdatomic.h/c_compat.cc:
Likewise.
* testsuite/20_util/optional/monadic/version.cc: Removed.
* testsuite/23_containers/vector/requirements/version.cc: New test.
* testsuite/29_atomics/headers/stdatomic.h/version.cc: New test.
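
For user code, the practical effect of folding __cpp_lib_monadic_optional
into __cpp_lib_optional is that the monadic operations should be detected
with __cpp_lib_optional >= 202110L.  The following is a minimal sketch of
such a check.  The helper name doubled_length is hypothetical, and the
fallback branch is just one possible way to support older modes.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: twice the length of the contained string, if any.
inline std::optional<int>
doubled_length(const std::optional<std::string>& s)
{
#if defined(__cpp_lib_optional) && __cpp_lib_optional >= 202110L
  // The C++23 monadic interface is available.
  return s.transform([](const std::string& str)
                     { return 2 * static_cast<int>(str.size()); });
#else
  // Older modes: spell out the same logic by hand.
  if (!s)
    return std::nullopt;
  return 2 * static_cast<int>(s->size());
#endif
}
```

Both branches give the same result, so the function behaves identically
whether or not the C++23 value of the macro is defined.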
---
 libstdc++-v3/include/bits/stl_vector.h   |  3 ++-
 libstdc++-v3/include/c_compatibility/stdatomic.h |  2 ++
 libstdc++-v3/include/std/optional| 12 ++--
 libstdc++-v3/include/std/version |  5 -
 .../testsuite/20_util/optional/monadic/and_then.cc   |  4 +---
 .../testsuite/20_util/optional/monadic/version.cc| 10 --
 .../testsuite/20_util/optional/requirements.cc   |  4 +++-
 libstdc++-v3/testsuite/20_util/optional/version.cc   |  4 +++-
 .../testsuite/23_containers/vector/cons/constexpr.cc |  7 +++
 .../23_containers/vector/requirements/version.cc | 10 ++
 .../29_atomics/headers/stdatomic.h/c_compat.cc   |  6 ++
 .../29_atomics/headers/stdatomic.h/version.cc| 10 ++
 12 files changed, 54 insertions(+), 23 deletions(-)
 delete mode 100644 libstdc++-v3/testsuite/20_util/optional/monadic/version.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/vector/requirements/version.cc
 create mode 100644 libstdc++-v3/testsuite/29_atomics/headers/stdatomic.h/version.cc

diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 8e2fcc6f49b..b4ff3989a5d 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -62,8 +62,9 @@
 #if __cplusplus >= 201103L
 #include 
 #endif
-#if __cplusplus > 201703L
+#if __cplusplus >= 202002L
 # include 
+#define __cpp_lib_constexpr_vector 201907L
 #endif
 
 #include 
diff --git a/libstdc++-v3/include/c_compatibility/stdatomic.h b/libstdc++-v3/include/c_compatibility/stdatomic.h
index c97cbac984e..a51a84c2054 100644
--- a/libstdc++-v3/include/c_compatibility/stdatomic.h
+++ b/libstdc++-v3/include/c_compatibility/stdatomic.h
@@ -32,6 +32,8 @@
 #if __cplusplus > 202002L
 #include 
 
+#define __cpp_lib_stdatomic_h 202011L
+
 #define _Atomic(_Tp) std::atomic<_Tp>
 
 using std::memory_order;
diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional
index d6aece45fbf..791ef6f1994 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -60,10 +60,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @{
*/
 
-#if __cplusplus == 201703L
-# define __cpp_lib_optional 201606L
-#else
+#if __cplusplus > 202002L && __cpp_lib_concepts
+# define __cpp_lib_optional 202110L
+#elif __cplusplus >= 202002L
 # define __cpp_lib_optional 202106L
+#else
+# define __cpp_lib_optional 201606L
 #endif
 
   template
@@ -1043,9 +1045,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return static_cast<_Tp>(std::forward<_Up>(__u));
}
 
-#if __cplusplus > 202002L && __cpp_lib_concepts
-#define __cpp_lib_monadic_optional 202110L
-
+#if __cpp_lib_optional >= 202110L
   // [optional.monadic]
 
   template
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 51f2110b68e..d8ec658484f 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -257,6 +257,7 @@
 #define __cpp_lib_constexpr_string_view 201811L
 #define __cpp_lib_constexpr_tuple 201811L
 #define __cpp_lib_constexpr_utility 201811L
+#define __cpp_lib_constexpr_vector 201907L
 #define __cpp_lib_erase_if 202002L
 #define __cpp_lib_generic_unordered_lookup 201811L
 #define __cpp_lib_interpolate 201902L
@@ -312,7 +313,8 @@
 #define __cpp_lib_invoke_r 202106L
 #define __cpp_lib_ios_noreplace 202200L
 #if __cpp_lib_concepts
-# define __cpp_lib_monadic_optional 202110L
+# undef __cpp_lib_optional
+# define __cpp_lib_optional 202110L
 #endif
 #define __cpp_lib_move_only_function 202110L
 #if __cpp_lib_span
@@ -321,6 +323,7 @@
 #if _GLIBCXX_HAVE_STACKTRACE
 # define __cpp_lib_stacktrace 202011L
 #endif
+#define __cpp_lib_stdatomic_h 

[committed] libstdc++: Add new headers to PCH

2022-04-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/precompiled/stdc++.h: Include  and
 for C++23.
---
 libstdc++-v3/include/precompiled/stdc++.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/libstdc++-v3/include/precompiled/stdc++.h b/libstdc++-v3/include/precompiled/stdc++.h
index 6d6d2ad7c4c..5ee1244dc22 100644
--- a/libstdc++-v3/include/precompiled/stdc++.h
+++ b/libstdc++-v3/include/precompiled/stdc++.h
@@ -155,4 +155,8 @@
 #if __cplusplus > 202002L
 #include 
 #include 
+#if __has_include()
+# include 
+#endif
+#include 
 #endif
-- 
2.34.1



Re: [PATCH] libstdc++: Update incorrect statement about mainline in docs

2022-04-14 Thread Jonathan Wakely via Gcc-patches
On Thu, 14 Apr 2022 at 11:36, Richard Biener  wrote:
>
> On Thu, 14 Apr 2022, Jonathan Wakely wrote:
>
> > This fixes some misleading text in the libstdc++ manual that says the
> > docs for the gcc-11 branch refer to mainline.
> >
> > Richi, is this OK for the gcc-11 branch now? It's been wrong for 11.1
> > and 11.2, but it would still be nice to fix.
>
> Yes, it's OK.  I notice the same problem exists on the GCC 10 branch
> but GCC 9 at least mentions GCC 9 once ;)

Yes, I fixed it for gcc-9.3.0, but forgot to do it for gcc-10 and gcc-11.

I pushed r10-10534 to fix gcc-10 (since that's open for doc changes)
and have now pushed r11-9881 as well.

Maybe this year I'll remember to do it for gcc-12 after we branch from trunk!



Re: [PATCH] libstdc++: Update incorrect statement about mainline in docs

2022-04-14 Thread Richard Biener via Gcc-patches
On Thu, 14 Apr 2022, Jonathan Wakely wrote:

> This fixes some misleading text in the libstdc++ manual that says the
> docs for the gcc-11 branch refer to mainline.
> 
> Richi, is this OK for the gcc-11 branch now? It's been wrong for 11.1
> and 11.2, but it would still be nice to fix.

Yes, it's OK.  I notice the same problem exists on the GCC 10 branch
but GCC 9 at least mentions GCC 9 once ;)

Richard.

> -- >8 --
> 
> libstdc++-v3/ChangeLog:
> 
>   * doc/xml/manual/status_cxx1998.xml: Refer to GCC 11 not
>   mainline.
>   * doc/xml/manual/status_cxx2011.xml: Likewise.
>   * doc/xml/manual/status_cxx2014.xml: Likewise.
>   * doc/xml/manual/status_cxx2017.xml: Likewise.
>   * doc/xml/manual/status_cxx2020.xml: Likewise.
>   * doc/xml/manual/status_cxxtr1.xml: Likewise.
>   * doc/xml/manual/status_cxxtr24733.xml: Likewise.
>   * doc/html/manual/status.html: Regenerate.
> ---
>  libstdc++-v3/doc/html/manual/status.html  | 24 ---
>  .../doc/xml/manual/status_cxx1998.xml |  3 +--
>  .../doc/xml/manual/status_cxx2011.xml |  3 +--
>  .../doc/xml/manual/status_cxx2014.xml |  4 ++--
>  .../doc/xml/manual/status_cxx2017.xml |  4 ++--
>  .../doc/xml/manual/status_cxx2020.xml |  4 ++--
>  libstdc++-v3/doc/xml/manual/status_cxxtr1.xml |  3 +--
>  .../doc/xml/manual/status_cxxtr24733.xml  |  3 +--
>  8 files changed, 20 insertions(+), 28 deletions(-)
> 
> diff --git a/libstdc++-v3/doc/xml/manual/status_cxx1998.xml 
> b/libstdc++-v3/doc/xml/manual/status_cxx1998.xml
> index 792272bcf26..f1be6667cd7 100644
> --- a/libstdc++-v3/doc/xml/manual/status_cxx1998.xml
> +++ b/libstdc++-v3/doc/xml/manual/status_cxx1998.xml
> @@ -18,8 +18,7 @@ This status table is based on the table of contents of 
> ISO/IEC 14882:2003.
>  
>  
>  
> -This section describes the C++ support in mainline GCC, not in any
> -particular release.
> +This section describes the C++ support in the GCC 11 release series.
>  
>  
>  
> diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml 
> b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
> index 88844f8f0cc..e06f2479763 100644
> --- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
> +++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
> @@ -34,8 +34,7 @@ This status table is based on the table of contents of 
> ISO/IEC 14882:2011.
>  
>  
>  
> -This section describes the C++11 support in mainline GCC, not in any
> -particular release.
> +This section describes the C++11 support in the GCC 11 release series.
>  
>  
>  
> diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml 
> b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
> index 5dc287707d8..deb15c7f0d5 100644
> --- a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
> +++ b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
> @@ -29,8 +29,8 @@ since C++11 and the implementation is complete.
>  
>  
>  
> -This section describes the C++14 and library TS support in mainline GCC,
> -not in any particular release.
> +This section describes the C++14 and library TS support in the GCC 11
> +release series.
>  
>  
>  
> diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml 
> b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
> index 11372bb28b3..52bbce0a2be 100644
> --- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
> +++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
> @@ -23,8 +23,8 @@ since that release.
>  
>  
>  
> -This section describes the C++17 and library TS support in mainline GCC,
> -not in any particular release.
> +This section describes the C++17 and library TS support in the GCC 11
> +release series.
>  
>  
>  
> diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml 
> b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
> index 9e9777ea85f..411a337c534 100644
> --- a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
> +++ b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
> @@ -20,8 +20,8 @@ presence of the required flag.
>  
>  
>  
> -This section describes the C++20 and library TS support in mainline GCC,
> -not in any particular release.
> +This section describes the C++20 and library TS support in the GCC 11
> +release series.
>  
>  
>  
> diff --git a/libstdc++-v3/doc/xml/manual/status_cxxtr1.xml 
> b/libstdc++-v3/doc/xml/manual/status_cxxtr1.xml
> index 021cb6394a7..6533857ba12 100644
> --- a/libstdc++-v3/doc/xml/manual/status_cxxtr1.xml
> +++ b/libstdc++-v3/doc/xml/manual/status_cxxtr1.xml
> @@ -22,8 +22,7 @@ In this implementation the header names are prefixed by
>  
>  
>  
> -This page describes the TR1 support in mainline GCC, not in any particular
> -release.
> +This page describes the TR1 support in the GCC 11 release series.
>  
>  
>  
> diff --git a/libstdc++-v3/doc/xml/manual/status_cxxtr24733.xml 
> b/libstdc++-v3/doc/xml/manual/status_cxxtr24733.xml
> index 139b94442e8..4b6d64c7c6a 100644
> --- a/libstdc++-v3/doc/xml/manual/status_cxxtr24733.xml
> +++ b/libstdc++-v3/doc/xml/manual/status_cxxtr24733.xml
> 

[PATCH] libstdc++: Update incorrect statement about mainline in docs

2022-04-14 Thread Jonathan Wakely via Gcc-patches
This fixes some misleading text in the libstdc++ manual that says the
docs for the gcc-11 branch refer to mainline.

Richi, is this OK for the gcc-11 branch now? It's been wrong for 11.1
and 11.2, but it would still be nice to fix.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx1998.xml: Refer to GCC 11 not
mainline.
* doc/xml/manual/status_cxx2011.xml: Likewise.
* doc/xml/manual/status_cxx2014.xml: Likewise.
* doc/xml/manual/status_cxx2017.xml: Likewise.
* doc/xml/manual/status_cxx2020.xml: Likewise.
* doc/xml/manual/status_cxxtr1.xml: Likewise.
* doc/xml/manual/status_cxxtr24733.xml: Likewise.
* doc/html/manual/status.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/status.html  | 24 ---
 .../doc/xml/manual/status_cxx1998.xml |  3 +--
 .../doc/xml/manual/status_cxx2011.xml |  3 +--
 .../doc/xml/manual/status_cxx2014.xml |  4 ++--
 .../doc/xml/manual/status_cxx2017.xml |  4 ++--
 .../doc/xml/manual/status_cxx2020.xml |  4 ++--
 libstdc++-v3/doc/xml/manual/status_cxxtr1.xml |  3 +--
 .../doc/xml/manual/status_cxxtr24733.xml  |  3 +--
 8 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx1998.xml b/libstdc++-v3/doc/xml/manual/status_cxx1998.xml
index 792272bcf26..f1be6667cd7 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx1998.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx1998.xml
@@ -18,8 +18,7 @@ This status table is based on the table of contents of ISO/IEC 14882:2003.
 
 
 
-This section describes the C++ support in mainline GCC, not in any
-particular release.
+This section describes the C++ support in the GCC 11 release series.
 
 
 
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index 88844f8f0cc..e06f2479763 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -34,8 +34,7 @@ This status table is based on the table of contents of ISO/IEC 14882:2011.
 
 
 
-This section describes the C++11 support in mainline GCC, not in any
-particular release.
+This section describes the C++11 support in the GCC 11 release series.
 
 
 
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
index 5dc287707d8..deb15c7f0d5 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
@@ -29,8 +29,8 @@ since C++11 and the implementation is complete.
 
 
 
-This section describes the C++14 and library TS support in mainline GCC,
-not in any particular release.
+This section describes the C++14 and library TS support in the GCC 11
+release series.
 
 
 
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 11372bb28b3..52bbce0a2be 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -23,8 +23,8 @@ since that release.
 
 
 
-This section describes the C++17 and library TS support in mainline GCC,
-not in any particular release.
+This section describes the C++17 and library TS support in the GCC 11
+release series.
 
 
 
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
index 9e9777ea85f..411a337c534 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
@@ -20,8 +20,8 @@ presence of the required flag.
 
 
 
-This section describes the C++20 and library TS support in mainline GCC,
-not in any particular release.
+This section describes the C++20 and library TS support in the GCC 11
+release series.
 
 
 
diff --git a/libstdc++-v3/doc/xml/manual/status_cxxtr1.xml b/libstdc++-v3/doc/xml/manual/status_cxxtr1.xml
index 021cb6394a7..6533857ba12 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxxtr1.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxxtr1.xml
@@ -22,8 +22,7 @@ In this implementation the header names are prefixed by
 
 
 
-This page describes the TR1 support in mainline GCC, not in any particular
-release.
+This page describes the TR1 support in the GCC 11 release series.
 
 
 
diff --git a/libstdc++-v3/doc/xml/manual/status_cxxtr24733.xml b/libstdc++-v3/doc/xml/manual/status_cxxtr24733.xml
index 139b94442e8..4b6d64c7c6a 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxxtr24733.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxxtr24733.xml
@@ -16,8 +16,7 @@ decimal floating-point arithmetic
 
 
 
-This page describes the TR 24733 support in mainline GCC, not in any
-particular release.
+This page describes the TR 24733 support in the GCC 11 release series.
 
 
 
-- 
2.34.1



Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Hongyu Wang via Gcc-patches
> As a general comment it would be nicer if the cost metric itself would focus
> on size costs when optimizing for size and speed costs when optimizing for
> speed so that individual STV opportunities can be enabled/disabled based
> on it.

Agreed. I think the cost computation should take the insn count into
account under -Os; for ABS/MIN/MAX it also needs a more accurate model of
the actual insn count.
Thanks for your review.

Richard Biener wrote on Thu, Apr 14, 2022 at 16:56:
>
> On Thu, Apr 14, 2022 at 10:31 AM Hongyu Wang  wrote:
> >
> > > >virtual bool gate (function *)
> > >
> > > please name the parameter ...
> > >
> > > >  {
> > > >return ((!timode_p || TARGET_64BIT)
> > > > -   && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > > +   && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > > +   && optimize_function_for_speed_p (cfun));
> > >
> > > ... and use it here instead of referencing 'cfun'
> >
> > Updated. Thanks!
>
> As a general comment it would be nicer if the cost metric itself would focus
> on size costs when optimizing for size and speed costs when optimizing for
> speed so that individual STV opportunities can be enabled/disabled based
> on it.
>
> At least I see the chance that there will be a case where STV improves
> code size that will regress if we simply disable it for -Os.  Like when I do
>
> typedef int v4si __attribute__((vector_size(16)));
>
> #define min(a,b) ((a)<(b)?(a):(b))
>
> v4si foo (v4si a, v4si b)
> {
>   a[0] = min (a[0], b[0]);
>   return a;
> }
>
> there's an xmm to gpr move penalty for scalar code that could go away
> (but oddly enough we're not arranging for the use of pminsd here - seems
> we're confused about vec_select/vec_merge).
>
> Richard.
>
> > gcc/ChangeLog:
> >
> > PR target/105034
> > * config/i386/i386-features.cc (pass_stv::gate()): Name param
> > to fun and add optimize_function_for_speed_p (fun).
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/105034
> > * gcc.target/i386/pr105034.c: New test.
> > ---
> >  gcc/config/i386/i386-features.cc |  5 +++--
> >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
> >  2 files changed, 26 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> >
> > diff --git a/gcc/config/i386/i386-features.cc 
> > b/gcc/config/i386/i386-features.cc
> > index 6fe41c3c24f..26be2986486 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -1908,10 +1908,11 @@ public:
> >{}
> >
> >/* opt_pass methods: */
> > -  virtual bool gate (function *)
> > +  virtual bool gate (function *fun)
> >  {
> >return ((!timode_p || TARGET_64BIT)
> > -  && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > +  && TARGET_STV && TARGET_SSE2 && optimize > 1
> > +  && optimize_function_for_speed_p (fun));
> >  }
> >
> >virtual unsigned int execute (function *)
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c
> > b/gcc/testsuite/gcc.target/i386/pr105034.c
> > new file mode 100644
> > index 000..d997e26e9ed
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > @@ -0,0 +1,23 @@
> > +/* PR target/105034 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -msse4.1" } */
> > +
> > +#define max(a,b) (((a) > (b))? (a) : (b))
> > +#define min(a,b) (((a) < (b))? (a) : (b))
> > +
> > +int foo(int x)
> > +{
> > +  return max(x,0);
> > +}
> > +
> > +int bar(int x)
> > +{
> > +  return min(x,0);
> > +}
> > +
> > +unsigned int baz(unsigned int x)
> > +{
> > +  return min(x,1);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "xmm" } } */
> > --
> > 2.18.1
> >
> >
> > Richard Biener wrote on Thu, Apr 14, 2022 at 16:06:
> > >
> > > On Thu, Apr 14, 2022 at 9:55 AM Hongyu Wang  
> > > wrote:
> > > >
> > > > >
> > > > > optimize_function_for_speed ()?
> > > > >
> > > >
> > > > Yes, updated patch with optimize_function_for_speed_p()
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/105034
> > > > * config/i386/i386-features.cc (pass_stv::gate()): Add
> > > >   optimize_function_for_speed_p ().
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR target/105034
> > > > * gcc.target/i386/pr105034.c: New test.
> > > > ---
> > > >  gcc/config/i386/i386-features.cc |  3 ++-
> > > >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
> > > >  2 files changed, 25 insertions(+), 1 deletion(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-features.cc 
> > > > b/gcc/config/i386/i386-features.cc
> > > > index 6fe41c3c24f..a49c3aa1525 100644
> > > > --- a/gcc/config/i386/i386-features.cc
> > > > +++ b/gcc/config/i386/i386-features.cc
> > > > @@ -1911,7 +1911,8 @@ public:
> > > >virtual bool gate (function *)
> > >
> > > please name the parameter ...
> > >
> > > >  {
> > > >return ((!timode_p || TARGET_64BIT)
> > > > -   && TARGET_STV && TARGET_SSE2 && 

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Richard Biener via Gcc-patches
On Thu, Apr 14, 2022 at 10:31 AM Hongyu Wang  wrote:
>
> > >virtual bool gate (function *)
> >
> > please name the parameter ...
> >
> > >  {
> > >return ((!timode_p || TARGET_64BIT)
> > > -   && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > +   && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > +   && optimize_function_for_speed_p (cfun));
> >
> > ... and use it here instead of referencing 'cfun'
>
> Updated. Thanks!

As a general comment it would be nicer if the cost metric itself would focus
on size costs when optimizing for size and speed costs when optimizing for
speed so that individual STV opportunities can be enabled/disabled based
on it.

At least I see the chance that there will be a case where STV improves
code size that will regress if we simply disable it for -Os.  Like when I do

typedef int v4si __attribute__((vector_size(16)));

#define min(a,b) ((a)<(b)?(a):(b))

v4si foo (v4si a, v4si b)
{
  a[0] = min (a[0], b[0]);
  return a;
}

there's an xmm to gpr move penalty for scalar code that could go away
(but oddly enough we're not arranging for the use of pminsd here - seems
we're confused about vec_select/vec_merge).
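
For anyone who wants to experiment with this case, here is a self-contained
variant of the snippet above.  It is only a sketch: the function name
min_lane0 is made up for illustration, and it relies on the GNU vector
extension, so it assumes GCC or Clang.  Compiling it with -S at -O2 and at
-Os (optionally with -mno-stv) is one way to compare the generated code.

```cpp
#include <cassert>

// GNU vector extension (GCC/Clang): four 32-bit ints in a 16-byte vector.
typedef int v4si __attribute__((vector_size(16)));

// Replace lane 0 of a with the smaller of lane 0 of a and lane 0 of b;
// the other lanes of a are returned unchanged.
v4si min_lane0(v4si a, v4si b)
{
  a[0] = a[0] < b[0] ? a[0] : b[0];
  return a;
}
```

The function is intentionally not declared inline so that a standalone
definition is emitted and can be inspected in the assembly output.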

Richard.

> gcc/ChangeLog:
>
> PR target/105034
> * config/i386/i386-features.cc (pass_stv::gate()): Name param
> to fun and add optimize_function_for_speed_p (fun).
>
> gcc/testsuite/ChangeLog:
>
> PR target/105034
> * gcc.target/i386/pr105034.c: New test.
> ---
>  gcc/config/i386/i386-features.cc |  5 +++--
>  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
>  2 files changed, 26 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
>
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index 6fe41c3c24f..26be2986486 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -1908,10 +1908,11 @@ public:
>{}
>
>/* opt_pass methods: */
> -  virtual bool gate (function *)
> +  virtual bool gate (function *fun)
>  {
>return ((!timode_p || TARGET_64BIT)
> -  && TARGET_STV && TARGET_SSE2 && optimize > 1);
> +  && TARGET_STV && TARGET_SSE2 && optimize > 1
> +  && optimize_function_for_speed_p (fun));
>  }
>
>virtual unsigned int execute (function *)
> diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c
> b/gcc/testsuite/gcc.target/i386/pr105034.c
> new file mode 100644
> index 000..d997e26e9ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> @@ -0,0 +1,23 @@
> +/* PR target/105034 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -msse4.1" } */
> +
> +#define max(a,b) (((a) > (b))? (a) : (b))
> +#define min(a,b) (((a) < (b))? (a) : (b))
> +
> +int foo(int x)
> +{
> +  return max(x,0);
> +}
> +
> +int bar(int x)
> +{
> +  return min(x,0);
> +}
> +
> +unsigned int baz(unsigned int x)
> +{
> +  return min(x,1);
> +}
> +
> +/* { dg-final { scan-assembler-not "xmm" } } */
> --
> 2.18.1
>
>
> Richard Biener wrote on Thu, Apr 14, 2022 at 16:06:
> >
> > On Thu, Apr 14, 2022 at 9:55 AM Hongyu Wang  wrote:
> > >
> > > >
> > > > optimize_function_for_speed ()?
> > > >
> > >
> > > Yes, updated patch with optimize_function_for_speed_p()
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/105034
> > > * config/i386/i386-features.cc (pass_stv::gate()): Add
> > >   optimize_function_for_speed_p ().
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/105034
> > > * gcc.target/i386/pr105034.c: New test.
> > > ---
> > >  gcc/config/i386/i386-features.cc |  3 ++-
> > >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
> > >  2 files changed, 25 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> > >
> > > diff --git a/gcc/config/i386/i386-features.cc 
> > > b/gcc/config/i386/i386-features.cc
> > > index 6fe41c3c24f..a49c3aa1525 100644
> > > --- a/gcc/config/i386/i386-features.cc
> > > +++ b/gcc/config/i386/i386-features.cc
> > > @@ -1911,7 +1911,8 @@ public:
> > >virtual bool gate (function *)
> >
> > please name the parameter ...
> >
> > >  {
> > >return ((!timode_p || TARGET_64BIT)
> > > -   && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > +   && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > +   && optimize_function_for_speed_p (cfun));
> >
> > ... and use it here instead of referencing 'cfun'
> >
> > Richard.
> >
> > >  }
> > >
> > >virtual unsigned int execute (function *)
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c
> > > b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > new file mode 100644
> > > index 000..d997e26e9ed
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > @@ -0,0 +1,23 @@
> > > +/* PR target/105034 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-Os -msse4.1" } */
> > > +
> > > +#define max(a,b) (((a) > (b))? (a) : (b))
> > > +#define min(a,b) (((a) < (b))? (a) : (b))
> > > +
> > 

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Hongyu Wang via Gcc-patches
> >virtual bool gate (function *)
>
> please name the parameter ...
>
> >  {
> >return ((!timode_p || TARGET_64BIT)
> > -   && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > +   && TARGET_STV && TARGET_SSE2 && optimize > 1
> > +   && optimize_function_for_speed_p (cfun));
>
> ... and use it here instead of referencing 'cfun'

Updated. Thanks!

gcc/ChangeLog:

PR target/105034
* config/i386/i386-features.cc (pass_stv::gate()): Name param
to fun and add optimize_function_for_speed_p (fun).

gcc/testsuite/ChangeLog:

PR target/105034
* gcc.target/i386/pr105034.c: New test.
---
 gcc/config/i386/i386-features.cc |  5 +++--
 gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 6fe41c3c24f..26be2986486 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1908,10 +1908,11 @@ public:
   {}

   /* opt_pass methods: */
-  virtual bool gate (function *)
+  virtual bool gate (function *fun)
 {
   return ((!timode_p || TARGET_64BIT)
-  && TARGET_STV && TARGET_SSE2 && optimize > 1);
+  && TARGET_STV && TARGET_SSE2 && optimize > 1
+  && optimize_function_for_speed_p (fun));
 }

   virtual unsigned int execute (function *)
diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c
b/gcc/testsuite/gcc.target/i386/pr105034.c
new file mode 100644
index 000..d997e26e9ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105034.c
@@ -0,0 +1,23 @@
+/* PR target/105034 */
+/* { dg-do compile } */
+/* { dg-options "-Os -msse4.1" } */
+
+#define max(a,b) (((a) > (b))? (a) : (b))
+#define min(a,b) (((a) < (b))? (a) : (b))
+
+int foo(int x)
+{
+  return max(x,0);
+}
+
+int bar(int x)
+{
+  return min(x,0);
+}
+
+unsigned int baz(unsigned int x)
+{
+  return min(x,1);
+}
+
+/* { dg-final { scan-assembler-not "xmm" } } */
-- 
2.18.1


Richard Biener wrote on Thu, Apr 14, 2022 at 16:06:
>
> On Thu, Apr 14, 2022 at 9:55 AM Hongyu Wang  wrote:
> >
> > >
> > > optimize_function_for_speed ()?
> > >
> >
> > Yes, updated patch with optimize_function_for_speed_p()
> >
> > gcc/ChangeLog:
> >
> > PR target/105034
> > * config/i386/i386-features.cc (pass_stv::gate()): Add
> >   optimize_function_for_speed_p ().
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/105034
> > * gcc.target/i386/pr105034.c: New test.
> > ---
> >  gcc/config/i386/i386-features.cc |  3 ++-
> >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> >
> > diff --git a/gcc/config/i386/i386-features.cc 
> > b/gcc/config/i386/i386-features.cc
> > index 6fe41c3c24f..a49c3aa1525 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -1911,7 +1911,8 @@ public:
> >virtual bool gate (function *)
>
> please name the parameter ...
>
> >  {
> >return ((!timode_p || TARGET_64BIT)
> > -   && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > +   && TARGET_STV && TARGET_SSE2 && optimize > 1
> > +   && optimize_function_for_speed_p (cfun));
>
> ... and use it here instead of referencing 'cfun'
>
> Richard.
>
> >  }
> >
> >virtual unsigned int execute (function *)
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c
> > b/gcc/testsuite/gcc.target/i386/pr105034.c
> > new file mode 100644
> > index 000..d997e26e9ed
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > @@ -0,0 +1,23 @@
> > +/* PR target/105034 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -msse4.1" } */
> > +
> > +#define max(a,b) (((a) > (b))? (a) : (b))
> > +#define min(a,b) (((a) < (b))? (a) : (b))
> > +
> > +int foo(int x)
> > +{
> > +  return max(x,0);
> > +}
> > +
> > +int bar(int x)
> > +{
> > +  return min(x,0);
> > +}
> > +
> > +unsigned int baz(unsigned int x)
> > +{
> > +  return min(x,1);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "xmm" } } */
> > --
> > 2.18.1
> >
> > Richard Biener via Gcc-patches wrote on Thu, Apr 14, 2022 at 14:56:
> > >
> > > On Thu, Apr 14, 2022 at 3:18 AM Hongyu Wang via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > From the -Os point of view, STV converts scalar registers to vector mode,
> > > > which introduces extra register conversions and increases instruction size.
> > > > Disabling STV under optimize_size avoids that code size increase without
> > > > touching ix86_size_cost, which has not been tuned for a long time.
> > > >
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,},
> > > >
> > > > Ok for master?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/105034
> > > > * config/i386/i386-features.cc (pass_stv::gate()): Block out
> > > > optimize_size.
> > > >
> 

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Richard Biener via Gcc-patches
On Thu, Apr 14, 2022 at 9:55 AM Hongyu Wang  wrote:
>
> >
> > optimize_function_for_speed ()?
> >
>
> Yes, updated patch with optimize_function_for_speed_p()
>
> gcc/ChangeLog:
>
> PR target/105034
> * config/i386/i386-features.cc (pass_stv::gate()): Add
>   optimize_function_for_speed_p ().
>
> gcc/testsuite/ChangeLog:
>
> PR target/105034
> * gcc.target/i386/pr105034.c: New test.
> ---
>  gcc/config/i386/i386-features.cc |  3 ++-
>  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
>
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index 6fe41c3c24f..a49c3aa1525 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -1911,7 +1911,8 @@ public:
>virtual bool gate (function *)

please name the parameter ...

>  {
>return ((!timode_p || TARGET_64BIT)
> -   && TARGET_STV && TARGET_SSE2 && optimize > 1);
> +   && TARGET_STV && TARGET_SSE2 && optimize > 1
> +   && optimize_function_for_speed_p (cfun));

... and use it here instead of referencing 'cfun'

Richard.

>  }
>
>virtual unsigned int execute (function *)
> diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c
> b/gcc/testsuite/gcc.target/i386/pr105034.c
> new file mode 100644
> index 000..d997e26e9ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> @@ -0,0 +1,23 @@
> +/* PR target/105034 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -msse4.1" } */
> +
> +#define max(a,b) (((a) > (b))? (a) : (b))
> +#define min(a,b) (((a) < (b))? (a) : (b))
> +
> +int foo(int x)
> +{
> +  return max(x,0);
> +}
> +
> +int bar(int x)
> +{
> +  return min(x,0);
> +}
> +
> +unsigned int baz(unsigned int x)
> +{
> +  return min(x,1);
> +}
> +
> +/* { dg-final { scan-assembler-not "xmm" } } */
> --
> 2.18.1
>
> Richard Biener via Gcc-patches wrote on Thu, Apr 14, 2022 at 14:56:
> >
> > On Thu, Apr 14, 2022 at 3:18 AM Hongyu Wang via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > >
> > > From the -Os point of view, STV converts scalar registers to vector mode,
> > > which introduces extra register conversions and increases instruction size.
> > > Disabling STV under optimize_size avoids that code size increase without
> > > touching ix86_size_cost, which has not been tuned for a long time.
> > >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,},
> > >
> > > Ok for master?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/105034
> > > * config/i386/i386-features.cc (pass_stv::gate()): Block out
> > > optimize_size.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/105034
> > > * gcc.target/i386/pr105034.c: New test.
> > > ---
> > >  gcc/config/i386/i386-features.cc |  3 ++-
> > >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
> > >  2 files changed, 25 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> > >
> > > diff --git a/gcc/config/i386/i386-features.cc 
> > > b/gcc/config/i386/i386-features.cc
> > > index 6fe41c3c24f..f57281e672f 100644
> > > --- a/gcc/config/i386/i386-features.cc
> > > +++ b/gcc/config/i386/i386-features.cc
> > > @@ -1911,7 +1911,8 @@ public:
> > >virtual bool gate (function *)
> > >  {
> > >return ((!timode_p || TARGET_64BIT)
> > > - && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > + && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > + && !optimize_size);
> >
> > optimize_function_for_speed ()?
> >
> > >  }
> > >
> > >virtual unsigned int execute (function *)
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c 
> > > b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > new file mode 100644
> > > index 000..d997e26e9ed
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > @@ -0,0 +1,23 @@
> > > +/* PR target/105034 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-Os -msse4.1" } */
> > > +
> > > +#define max(a,b) (((a) > (b))? (a) : (b))
> > > +#define min(a,b) (((a) < (b))? (a) : (b))
> > > +
> > > +int foo(int x)
> > > +{
> > > +  return max(x,0);
> > > +}
> > > +
> > > +int bar(int x)
> > > +{
> > > +  return min(x,0);
> > > +}
> > > +
> > > +unsigned int baz(unsigned int x)
> > > +{
> > > +  return min(x,1);
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "xmm" } } */
> > > --
> > > 2.18.1
> > >


Re: GCC 11.2.1 Status Report (2022-04-13), branch frozen for release

2022-04-14 Thread Richard Biener via Gcc-patches
On Thu, 14 Apr 2022, Andreas Krebbel wrote:

> On 4/13/22 09:30, Richard Biener via Gcc wrote:
> > 
> > Status
> > ==
> > 
> > The gcc-11 branch is now frozen in preparation for a GCC 11.3 release
> > candidate and the GCC 11.3 release next week.  All changes now require
> > release manager approval.
> 
> Hi,
> 
> I would like to push:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593103.html
> 
> to GCC 11 branch before 11.3 release. Ok?

OK.

Richard.


Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Hongyu Wang via Gcc-patches
>
> optimize_function_for_speed ()?
>

Yes, updated patch with optimize_function_for_speed_p()

gcc/ChangeLog:

PR target/105034
* config/i386/i386-features.cc (pass_stv::gate()): Add
  optimize_function_for_speed_p ().

gcc/testsuite/ChangeLog:

PR target/105034
* gcc.target/i386/pr105034.c: New test.
---
 gcc/config/i386/i386-features.cc |  3 ++-
 gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 6fe41c3c24f..a49c3aa1525 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1911,7 +1911,8 @@ public:
   virtual bool gate (function *)
 {
   return ((!timode_p || TARGET_64BIT)
-   && TARGET_STV && TARGET_SSE2 && optimize > 1);
+   && TARGET_STV && TARGET_SSE2 && optimize > 1
+   && optimize_function_for_speed_p (cfun));
 }

   virtual unsigned int execute (function *)
diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c
b/gcc/testsuite/gcc.target/i386/pr105034.c
new file mode 100644
index 000..d997e26e9ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105034.c
@@ -0,0 +1,23 @@
+/* PR target/105034 */
+/* { dg-do compile } */
+/* { dg-options "-Os -msse4.1" } */
+
+#define max(a,b) (((a) > (b))? (a) : (b))
+#define min(a,b) (((a) < (b))? (a) : (b))
+
+int foo(int x)
+{
+  return max(x,0);
+}
+
+int bar(int x)
+{
+  return min(x,0);
+}
+
+unsigned int baz(unsigned int x)
+{
+  return min(x,1);
+}
+
+/* { dg-final { scan-assembler-not "xmm" } } */
-- 
2.18.1
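
For readers following the review, the practical difference between the first
version's !optimize_size and the predicate used here can be sketched as
follows. This is an illustration under an assumption that is not stated in the
thread: that optimize_function_for_speed_p also answers false for functions
GCC considers cold (for example via the cold attribute or profile data),
whereas optimize_size only reflects the -Os setting. The function names are
made up.

```c
/* Intended to be built with e.g. -O2 -msse4.1 (no -Os on the command
   line).  Function names are illustrative only.  */

#define max(a,b) (((a) > (b)) ? (a) : (b))

/* Ordinary function compiled for speed: the STV gate is expected to
   stay open here with either form of the check.  */
int
clamp_hot (int x)
{
  return max (x, 0);
}

/* Cold function: assumed to be treated as optimized for size by
   optimize_function_for_speed_p, so the gate would close for it even
   though optimize_size itself is not set.  */
__attribute__ ((cold)) int
clamp_cold (int x)
{
  return max (x, 0);
}
```

Both functions compute the same result; only the code generation choice is
expected to differ, which one could check by inspecting the assembly for xmm
register use as the new test does.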

Richard Biener via Gcc-patches wrote on Thu, Apr 14, 2022 at 14:56:
>
> On Thu, Apr 14, 2022 at 3:18 AM Hongyu Wang via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > From the -Os point of view, STV converts scalar registers to vector mode,
> > which introduces extra register conversions and increases instruction size.
> > Disabling STV under optimize_size avoids that code size increase without
> > touching ix86_size_cost, which has not been tuned for a long time.
> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,},
> >
> > Ok for master?
> >
> > gcc/ChangeLog:
> >
> > PR target/105034
> > * config/i386/i386-features.cc (pass_stv::gate()): Block out
> > optimize_size.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/105034
> > * gcc.target/i386/pr105034.c: New test.
> > ---
> >  gcc/config/i386/i386-features.cc |  3 ++-
> >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> >
> > diff --git a/gcc/config/i386/i386-features.cc 
> > b/gcc/config/i386/i386-features.cc
> > index 6fe41c3c24f..f57281e672f 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -1911,7 +1911,8 @@ public:
> >virtual bool gate (function *)
> >  {
> >return ((!timode_p || TARGET_64BIT)
> > - && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > + && TARGET_STV && TARGET_SSE2 && optimize > 1
> > + && !optimize_size);
>
> optimize_function_for_speed ()?
>
> >  }
> >
> >virtual unsigned int execute (function *)
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c 
> > b/gcc/testsuite/gcc.target/i386/pr105034.c
> > new file mode 100644
> > index 000..d997e26e9ed
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > @@ -0,0 +1,23 @@
> > +/* PR target/105034 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -msse4.1" } */
> > +
> > +#define max(a,b) (((a) > (b))? (a) : (b))
> > +#define min(a,b) (((a) < (b))? (a) : (b))
> > +
> > +int foo(int x)
> > +{
> > +  return max(x,0);
> > +}
> > +
> > +int bar(int x)
> > +{
> > +  return min(x,0);
> > +}
> > +
> > +unsigned int baz(unsigned int x)
> > +{
> > +  return min(x,1);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "xmm" } } */
> > --
> > 2.18.1
> >


Re: GCC 11.2.1 Status Report (2022-04-13), branch frozen for release

2022-04-14 Thread Andreas Krebbel via Gcc-patches
On 4/13/22 09:30, Richard Biener via Gcc wrote:
> 
> Status
> ==
> 
> The gcc-11 branch is now frozen in preparation for a GCC 11.3 release
> candidate and the GCC 11.3 release next week.  All changes now require
> release manager approval.

Hi,

I would like to push:

https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593103.html

to GCC 11 branch before 11.3 release. Ok?

Bye,

Andreas


Re: [PATCH] fortran: use fpu-glibc on powerpc*-unknown-freebsd

2022-04-14 Thread FX via Gcc-patches
Hi,

> can you check the following patch?

Why restrict it to powerpc-freebsd only, and not all freebsd? Do they differ?
Otherwise it looks ok to me, but probably should be tested on a glibc non-x86 
target.

In any case, this will be for the new branch, when stage 1 reopens.

FX

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Richard Biener via Gcc-patches
On Thu, Apr 14, 2022 at 3:18 AM Hongyu Wang via Gcc-patches wrote:
>
> Hi,
>
> From the -Os point of view, STV converts scalar registers to vector mode,
> which introduces extra register conversions and increases instruction size.
> Disabling STV under optimize_size avoids that code size increase without
> touching ix86_size_cost, which has not been tuned for a long time.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,},
>
> Ok for master?
>
> gcc/ChangeLog:
>
> PR target/105034
> * config/i386/i386-features.cc (pass_stv::gate()): Block out
> optimize_size.
>
> gcc/testsuite/ChangeLog:
>
> PR target/105034
> * gcc.target/i386/pr105034.c: New test.
> ---
>  gcc/config/i386/i386-features.cc |  3 ++-
>  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
>
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index 6fe41c3c24f..f57281e672f 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -1911,7 +1911,8 @@ public:
>virtual bool gate (function *)
>  {
>return ((!timode_p || TARGET_64BIT)
> - && TARGET_STV && TARGET_SSE2 && optimize > 1);
> + && TARGET_STV && TARGET_SSE2 && optimize > 1
> + && !optimize_size);

optimize_function_for_speed ()?

>  }
>
>virtual unsigned int execute (function *)
> diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c 
> b/gcc/testsuite/gcc.target/i386/pr105034.c
> new file mode 100644
> index 000..d997e26e9ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> @@ -0,0 +1,23 @@
> +/* PR target/105034 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -msse4.1" } */
> +
> +#define max(a,b) (((a) > (b))? (a) : (b))
> +#define min(a,b) (((a) < (b))? (a) : (b))
> +
> +int foo(int x)
> +{
> +  return max(x,0);
> +}
> +
> +int bar(int x)
> +{
> +  return min(x,0);
> +}
> +
> +unsigned int baz(unsigned int x)
> +{
> +  return min(x,1);
> +}
> +
> +/* { dg-final { scan-assembler-not "xmm" } } */
> --
> 2.18.1
>


Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]

2022-04-14 Thread Richard Biener via Gcc-patches
On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao  wrote:
>
> Hi, Richard,
>
> Thanks a lot for taking a look at this issue (and sorry that I haven't fixed
> this one yet; I was distracted by other tasks and then forgot about it).
>
> > On Apr 13, 2022, at 3:41 AM, Richard Biener  
> > wrote:
> >
> > On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches
> >  wrote:
> >>
> >>
> >>
> >>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek  wrote:
> >>>
> >>> Hi!
> >>>
> >>> For IBM double double I've added in PR95450 and PR99648 verification that
> >>> when we at the tree/GIMPLE or RTL level interpret target bytes as a 
> >>> REAL_CST
> >>> or CONST_DOUBLE constant, we try to encode it back to target bytes and
> >>> verify it is the same.
> >>> This is because our real.c support isn't able to represent all valid 
> >>> values
> >>> of IBM double double which has variable precision.
> >>> In PR104522, it has been noted that we have similar problem with the
> >>> Intel/Motorola extended XFmode formats, our internal representation isn't
> >>> able to record pseudo denormals, pseudo infinities, pseudo NaNs and 
> >>> unnormal
> >>> values.
> >>> So, the following patch is an attempt to extend that verification to all
> >>> floats.
> >>> Unfortunately, it wasn't that straightforward, because the
> >>> __builtin_clear_padding code exactly for the XFmode long doubles needs to
> >>> discover what bits are padding and does that by interpreting memory of
> >>> all 1s.  That is actually a valid supported value, a qNaN with negative
> >>> sign with all mantissa bits set, but the verification includes also the
> >>> padding bits (exactly what __builtin_clear_padding wants to figure out)
> >>> and so fails the comparison check and so we ICE.
> >>> The patch fixes that case by moving that verification from
> >>> native_interpret_real to its caller, so that clear_padding_type can
> >>> call native_interpret_real and avoid that extra check.
> >>>
> >>> With this, the only thing that regresses in the testsuite is
> >>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times 
> >>> long\\t-16843010 5
> >>> because it decides to use a pattern that has non-zero bits in the padding
> >>> bits of the long double, so the simplify-rtx.cc change prevents folding
> >>> a SUBREG into a constant.  We emit (the testcase is -O0 but we emit worse
> >>> code at all opt levels) something like:
> >>>   movabsq $-72340172838076674, %rax
> >>>   movabsq $-72340172838076674, %rdx
> >>>   movq    %rax, -48(%rbp)
> >>>   movq    %rdx, -40(%rbp)
> >>>   fldt    -48(%rbp)
> >>>   fstpt   -32(%rbp)
> >>> instead of
> >>>   fldt    .LC2(%rip)
> >>>   fstpt   -32(%rbp)
> >>> ...
> >>> .LC2:
> >>>   .long   -16843010
> >>>   .long   -16843010
> >>>   .long   65278
> >>>   .long   0
> >>> Note, neither of those sequences actually stores the padding bits, fstpt
> >>> simply doesn't touch them.
> >>> For vars with clear_padding_real_needs_padding_p types that are allocated
> >>> to memory at expansion time, I'd say much better would be to do the stores
> >>> using integral modes rather than XFmode, so do that:
> >>>   movabsq $-72340172838076674, %rax
> >>>   movq    %rax, -32(%rbp)
> >>>   movq    %rax, -24(%rbp)
> >>> directly.  That is the only way to ensure the padding bits are initialized
> >>> (or expand __builtin_clear_padding, but then you initialize separately the
> >>> value bits and padding bits).
> >>>
> >>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as mentioned
> >>> above, the gcc.target/i386/auto-init-4.c case is unresolved.
> >>
> >> Thanks, I will try to fix this testing case in a later patch.
> >
> > I've looked at this FAIL now and really wonder whether "pattern init" as
> > implemented makes any sense for non-integral types.
> > We end up with
> > initializing a register (SSA name) with
> >
> >  VIEW_CONVERT_EXPR(0xfefefefefefefefefefefefefefefefe)
> >
> > as we go building a TImode constant (we verified we have a TImode SET!)
> > but then
> >
> >  /* Pun the LHS to make sure its type has constant size
> > unless it is an SSA name where that's already known.  */
> >  if (TREE_CODE (lhs) != SSA_NAME)
> >lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
> >  else
> >init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
> > ...
> >  expand_assignment (lhs, init, false);
> >
> > and generally registers do not have any padding.  This weird expansion
> > then causes us to spill the TImode constant and reload the XFmode value,
> > which is definitely not optimal here.
> >
> > One approach to avoid the worse code generation would be to use mode
> > specific patterns for registers (like using a NaN or a target specific
> > value that
> > can be loaded cheaply),
>
> You mean that using “mode specific patterns” ONLY for registers?
> Can we use “mode specific patterns” consistently for both registers and 
> 

Re: [PATCH] s390: Add scheduler description for z16

2022-04-14 Thread Andreas Krebbel via Gcc-patches
On 4/13/22 12:23, Robin Dapp wrote:
> Hi,
> 
> this patch adds the scheduler description for z16.  Bootstrapped and
> regtested with --with-arch=z16.
> 
> Is it OK?
> 
> Regards
>  Robin
> 
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.cc (s390_get_sched_attrmask): Add z16.
>   (s390_get_unit_mask): Likewise.
>   (s390_is_fpd): Likewise.
>   (s390_is_fxd): Likewise.
>   * config/s390/s390.md 
> (z900,z990,z9_109,z9_ec,z10,z196,zEC12,z13,z14,z15):
>   Add z16.
>   (z900,z990,z9_109,z9_ec,z10,z196,zEC12,z13,z14,z15,z16):
>   Likewise.
>   * config/s390/3931.md: New file.

Ok. Thanks!

Andreas




Re: [PATCH] testsuite/s390: Silence warning in pr80725.c

2022-04-14 Thread Andreas Krebbel via Gcc-patches
On 4/13/22 09:35, Robin Dapp wrote:
> Hi,
> 
> this test case checks that we do not ICE but FAILs because of
> -Wint-to-pointer-cast.  Silence this warning.
> 
> Is it OK?

Ok. Thanks!

Andreas



Re: [PATCH] testsuite: Skip pr105250.c for powerpc and s390 [PR105266]

2022-04-14 Thread Andreas Krebbel via Gcc-patches
On 4/14/22 05:10, Kewen.Lin wrote:
> Hi,
> 
> The test case pr105250.c is like its related pr105140.c, which
> suffers the error with message like "{AltiVec,vector} argument
> passed to unprototyped" on powerpc and s390.  So like commits
> r12-8025 and r12-8039, this fix is to add the dg-skip-if for
> powerpc*-*-* and s390*-*-*.
> 
> Tested on powerpc64le-linux-gnu P9; it should also work on s390,
> as with the similar PR105147.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
> 
> gcc/testsuite/ChangeLog:
> 
>   PR testsuite/105266
>   * gcc.dg/pr105250.c: Skip for powerpc*-*-* and s390*-*-*.

Ok for s390. Thanks!

Andreas